Multimodal models


Updated at May 13, 2025

Foundation Models provides access to large vision language models that generate text from a text prompt combined with an image. These models are available in batch mode.

To access a model, use the standard model URI: `gpt://<folder_ID>/<model_name>/<branch>`.

| Model | License | URI | Context window, tokens |
|---|---|---|---|
| Qwen2 VL 7B | Apache 2.0 | `gpt://<folder_ID>/qwen2-vl-7b-instruct/` | 4096 |
| Qwen2.5 VL 7B | Apache 2.0 | `gpt://<folder_ID>/qwen2.5-vl-7b-instruct/` | 4096 |
| Qwen 2.5 VL 32B Instruct | Apache 2.0 | `gpt://<folder_ID>/qwen2.5-vl-32b-instruct/` | 4096 |
| DeepSeek 2 VL | DeepSeek license | `gpt://<folder_ID>/deepseek-vl2/` | 4096 |
| DeepSeek 2 VL Tiny | DeepSeek license | `gpt://<folder_ID>/deepseek-vl2-tiny/` | 4096 |
| Gemma3 4B it | Gemma Terms of Use | `gpt://<folder_ID>/gemma-3-4b-it/` | 4096 |
| Gemma3 12B it | Gemma Terms of Use | `gpt://<folder_ID>/gemma-3-12b-it/` | 4096 |
| Gemma3 27B it | Gemma Terms of Use | `gpt://<folder_ID>/gemma-3-27b-it/` | 4096 |
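The model URI template `gpt://<folder_ID>/<model_name>/<branch>` can also be assembled and checked in code. The following is a minimal Python sketch, not part of any Foundation Models SDK: `build_model_uri`, `parse_model_uri`, and `ModelURI` are hypothetical helpers, `b1gexample` is a placeholder folder ID, and the sketch assumes that no URI component contains a `/`. Leaving the branch empty reproduces the trailing-slash form used in the model table.

```python
# Minimal sketch: build and split URIs of the form gpt://<folder_ID>/<model_name>/<branch>.
# All names here are hypothetical helpers, not a Foundation Models API.
# Assumption: folder ID, model name, and branch never contain "/".
from typing import NamedTuple

SCHEME = "gpt://"


class ModelURI(NamedTuple):
    folder_id: str
    model_name: str
    branch: str  # empty string when no branch is given


def build_model_uri(folder_id: str, model_name: str, branch: str = "") -> str:
    """Assemble a model URI; an empty branch leaves a trailing slash."""
    if not folder_id or not model_name:
        raise ValueError("folder_id and model_name are required")
    return f"{SCHEME}{folder_id}/{model_name}/{branch}"


def parse_model_uri(uri: str) -> ModelURI:
    """Split a model URI back into folder ID, model name, and branch."""
    if not uri.startswith(SCHEME):
        raise ValueError(f"expected a URI starting with {SCHEME!r}: {uri!r}")
    parts = uri[len(SCHEME):].split("/")
    # A well-formed URI has exactly three segments; the branch may be empty.
    if len(parts) != 3 or not parts[0] or not parts[1]:
        raise ValueError(f"expected gpt://<folder_ID>/<model_name>/<branch>: {uri!r}")
    return ModelURI(*parts)


if __name__ == "__main__":
    # "b1gexample" is a placeholder, not a real folder ID.
    uri = build_model_uri("b1gexample", "qwen2.5-vl-7b-instruct")
    print(uri)  # gpt://b1gexample/qwen2.5-vl-7b-instruct/
    print(parse_model_uri(uri))
```

Splitting on `/` and requiring exactly three segments keeps the check strict: a URI missing the trailing slash, or one with extra path segments, is rejected rather than silently misread.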

