Multimodal models
Updated at May 13, 2025
Foundation Models provides access to large vision language models that let you generate text from a text prompt and an image. These models are available in batch mode.
To access a model, use the standard model URI: `gpt://<folder_ID>/<model_name>/<branch>`.
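As a minimal sketch, a model URI can be assembled from its three parts with plain string formatting. The helper name `model_uri` and the folder ID `b1gexample` are hypothetical placeholders. The default of an empty branch, which produces a trailing slash, is an assumption based on the URIs listed in the table and is not documented behavior.

```python
def model_uri(folder_id: str, model_name: str, branch: str = "") -> str:
    """Build a URI of the form gpt://<folder_ID>/<model_name>/<branch>.

    Hypothetical helper. An empty branch yields a trailing slash,
    mirroring the URIs in the model table (an assumption).
    """
    if not folder_id or not model_name:
        raise ValueError("folder_id and model_name must be non-empty")
    return f"gpt://{folder_id}/{model_name}/{branch}"


# Usage with a placeholder folder ID (not a real folder)
print(model_uri("b1gexample", "qwen2.5-vl-7b-instruct"))
# → gpt://b1gexample/qwen2.5-vl-7b-instruct/
```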
Model | URI | Context (tokens) |
---|---|---|
Qwen2 VL 7B | `gpt://<folder_ID>/qwen2-vl-7b-instruct/` | 4096 |
Qwen2.5 VL 7B | `gpt://<folder_ID>/qwen2.5-vl-7b-instruct/` | 4096 |
Qwen 2.5 VL 32B Instruct | `gpt://<folder_ID>/qwen2.5-vl-32b-instruct/` | 4096 |
DeepSeek 2 VL | `gpt://<folder_ID>/deepseek-vl2/` | 4096 |
DeepSeek 2 VL Tiny | `gpt://<folder_ID>/deepseek-vl2-tiny/` | 4096 |
Gemma3 4B it | `gpt://<folder_ID>/gemma-3-4b-it/` | 4096 |
Gemma3 12B it | `gpt://<folder_ID>/gemma-3-12b-it/` | 4096 |
Gemma3 27B it | `gpt://<folder_ID>/gemma-3-27b-it/` | 4096 |
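Client code can keep the models listed in the table in a small lookup structure and check a request against the context length before sending it. This is a hypothetical sketch. The names `ModelInfo`, `MODELS`, `uri_for`, and `fits_context` are illustrative and are not part of any Foundation Models SDK. The check assumes that the context limit covers the prompt tokens (including any tokens used by the image) plus the requested output tokens combined. The caller must supply the token counts, because no tokenizer is shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    slug: str            # model name segment of the URI
    context_tokens: int  # context length from the table


# Model slugs and context lengths copied from the table.
MODELS = {
    "Qwen2 VL 7B": ModelInfo("qwen2-vl-7b-instruct", 4096),
    "Qwen2.5 VL 7B": ModelInfo("qwen2.5-vl-7b-instruct", 4096),
    "Qwen 2.5 VL 32B Instruct": ModelInfo("qwen2.5-vl-32b-instruct", 4096),
    "DeepSeek 2 VL": ModelInfo("deepseek-vl2", 4096),
    "DeepSeek 2 VL Tiny": ModelInfo("deepseek-vl2-tiny", 4096),
    "Gemma3 4B it": ModelInfo("gemma-3-4b-it", 4096),
    "Gemma3 12B it": ModelInfo("gemma-3-12b-it", 4096),
    "Gemma3 27B it": ModelInfo("gemma-3-27b-it", 4096),
}


def uri_for(display_name: str, folder_id: str) -> str:
    """Return the model URI for a table entry (empty branch, as in the table)."""
    info = MODELS[display_name]
    return f"gpt://{folder_id}/{info.slug}/"


def fits_context(display_name: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """Return True if prompt plus requested output fits the context length.

    Assumption: the limit applies to prompt and output tokens combined.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    limit = MODELS[display_name].context_tokens
    return prompt_tokens + max_output_tokens <= limit


print(uri_for("Gemma3 12B it", "b1gexample"))
# → gpt://b1gexample/gemma-3-12b-it/
print(fits_context("DeepSeek 2 VL Tiny", 3500, 1000))
# → False
```

Keeping the slugs in one dictionary means the URI and the limit for a model come from the same entry, so they cannot drift apart in client code.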