© 2025 Direct Cursus Technology L.L.C.

Launching the DeepSeek-R1 language model in a Yandex Compute Cloud GPU cluster

Written by Yandex Cloud
Updated on May 7, 2025

Note

Currently, GPU clusters are only available in the ru-central1-a and ru-central1-d availability zones. A VM can only be added to a GPU cluster located in the same availability zone as the VM.

In this tutorial, you will create a GPU cluster with two VMs to run the DeepSeek-R1 language model.

To run a language model in a GPU cluster:

  1. Get your cloud ready.
  2. Create a GPU cluster with two VMs.
  3. Test cluster state.
  4. Run the language model.
  5. Test the language model performance.

If you no longer need the resources you created, delete them.

Get your cloud ready

Sign up for Yandex Cloud and create a billing account:

  1. Navigate to the management console and log in to Yandex Cloud or register a new account.
  2. On the Yandex Cloud Billing page, make sure a billing account is linked and has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure to operate in.

Learn more about clouds and folders.

Required paid resources

The infrastructure support costs include:

  • Fee for continuously running VMs and disks (see Yandex Compute Cloud pricing).

Create a GPU cluster with two VMs

Create a GPU cluster

Management console
  1. In the management console, select the folder where you want to create a cluster.
  2. From the list of services, select Compute Cloud.
  3. In the left-hand panel, select GPU clusters.
  4. Click Create a cluster.
  5. In the Name field, enter the cluster name: test-gpu-cluster.
  6. In the Availability zone field, select the ru-central1-d availability zone.
  7. Click Save.

Add two VMs to the cluster

  1. Create the first VM:

    Management console
    1. In the left-hand panel, select Virtual machines.

    2. Click Create virtual machine.

    3. Under Boot disk image, select the Ubuntu 20.04 LTS Secure Boot CUDA 12.2 public image.

    4. In the Availability zone field, select the ru-central1-d availability zone.

    5. Under Disks and file storages, select the SSD disk type and specify its size: 800 GB.

    6. Under Computing resources, navigate to the Custom tab and specify the platform, number of GPUs, and cluster:

      • Platform: AMD Epyc 9474F with Gen2.
      • GPU: 8.
      • GPU cluster: Select the previously created test-gpu-cluster cluster.
    7. Under Access, select SSH key and specify the VM access credentials:

      • In the Login field, enter a username, e.g., ubuntu. Do not use root or other names reserved by the OS. To perform operations requiring root privileges, use the sudo command.
      • In the SSH key field, select the SSH key saved in your organization user profile.

        If there are no saved SSH keys in your profile, or you want to add a new key:

        • Click Add key.
        • Enter a name for the SSH key.
        • Upload or paste the contents of the public key file. You must create the SSH key pair for connecting to the VM yourself.
        • Click Add.

        The SSH key will be added to your organization user profile.

        If users cannot add SSH keys to their profiles in the organization, the added public SSH key will only be saved to the user profile of the VM being created.

    8. Click Create VM.

  2. Similarly, create the second VM.
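
As a rough sanity check on the sizes chosen above, the sketch below totals the GPUs and disk space across the two VMs and compares the per-VM disk with an estimate of the model weights. The parameter count (about 671 billion) and the one-byte-per-parameter FP8 figure are assumptions taken from the public DeepSeek-R1 model description, not from this tutorial, so treat the result as an approximation.

```python
# Values taken from the VM settings in this tutorial.
VM_COUNT = 2
GPUS_PER_VM = 8
DISK_GB_PER_VM = 800

# Assumptions (not from this tutorial): DeepSeek-R1 has about 671 billion
# parameters and ships FP8 weights, i.e. roughly one byte per parameter.
PARAMS_BILLION = 671
BYTES_PER_PARAM = 1


def cluster_summary():
    """Return total GPUs, total disk, and an approximate weights size in GB."""
    total_gpus = VM_COUNT * GPUS_PER_VM
    total_disk_gb = VM_COUNT * DISK_GB_PER_VM
    # One billion parameters at one byte each is about 1 GB (decimal).
    weights_gb = PARAMS_BILLION * BYTES_PER_PARAM
    return {
        "total_gpus": total_gpus,
        "total_disk_gb": total_disk_gb,
        "approx_weights_gb": weights_gb,
        "fits_on_one_disk": weights_gb < DISK_GB_PER_VM,
    }


if __name__ == "__main__":
    for key, value in cluster_summary().items():
        print(f"{key}: {value}")
```

Each VM keeps its own model cache, so each disk likely needs room for a full copy of the weights in addition to the OS and the container image.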

Test cluster state

Optionally, you can:

  • Test cluster physical state.
  • Run parallel jobs in the cluster.
  • Test InfiniBand throughput.
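
Before running these optional tests, a quick per-VM check can confirm that the driver sees all eight GPUs and that the InfiniBand device directory exists. The following is a minimal sketch and not one of the tests listed above; it assumes nvidia-smi is on the PATH (the CUDA image should provide it) and that nvidia-smi -L prints one line per GPU beginning with "GPU <index>:". The function names are hypothetical.

```python
import os
import re
import subprocess

EXPECTED_GPUS = 8                  # GPUs per VM, as configured in this tutorial
IB_DEVICE_DIR = "/dev/infiniband"  # device path later passed to docker run


def count_gpus(listing: str) -> int:
    """Count lines of the form 'GPU <n>: ...' in the output of nvidia-smi -L."""
    return sum(1 for line in listing.splitlines() if re.match(r"GPU \d+:", line))


def check_node() -> bool:
    """Return True if this VM shows the expected GPUs and an InfiniBand directory."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    gpus = count_gpus(result.stdout)
    has_ib = os.path.isdir(IB_DEVICE_DIR)
    print(f"GPUs visible: {gpus}/{EXPECTED_GPUS}, {IB_DEVICE_DIR} present: {has_ib}")
    return gpus == EXPECTED_GPUS and has_ib
```

Calling check_node() on each VM gives a pass/fail result. It only checks that the devices are visible and says nothing about interconnect throughput.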

Run the language model

  1. Connect to both VMs over SSH.

  2. On both VMs, add the current user to the docker group:

    sudo groupadd docker
    sudo usermod -aG docker $USER
    newgrp docker
    
  3. Pull the SGLang image on both VMs:

    docker pull lmsysorg/sglang:latest
    
  4. Run this command on the first VM:

    docker run --gpus all --device=/dev/infiniband \
      --ulimit memlock=-1 --ulimit stack=67108864 --shm-size 32g \
      --network=host -v ~/.cache/huggingface:/root/.cache/huggingface \
      --name sglang_multinode1 -e GLOO_SOCKET_IFNAME=eth0 \
      -it --rm --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
      --model-path deepseek-ai/DeepSeek-R1 --tp 16 \
      --nccl-init-addr <IP_address_1>:30000 --nnodes 2 --node-rank 0 \
      --trust-remote-code --host 0.0.0.0 --port 30001 \
      --disable-radix --max-prefill-tokens 126000
    

    Where <IP_address_1> is the internal IP address of the first VM.

  5. Run this command on the second VM:

    docker run --gpus all --device=/dev/infiniband \
      --ulimit memlock=-1 --ulimit stack=67108864 --shm-size 32g \
      --network=host -v ~/.cache/huggingface:/root/.cache/huggingface \
      --name sglang_multinode2 -e GLOO_SOCKET_IFNAME=eth0 \
      -it --rm --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
      --model-path deepseek-ai/DeepSeek-R1 --tp 16 \
      --nccl-init-addr <IP_address_1>:30000 --nnodes 2 --node-rank 1 \
      --trust-remote-code --host 0.0.0.0 --port 30001 \
      --disable-radix --max-prefill-tokens 126000
    

    Where <IP_address_1> is the internal IP address of the first VM (the same address as in the previous step).

  6. Wait for the server to start. When it is ready, the log shows this message:

    The server is fired up and ready to roll!
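
The two docker run commands above differ only in the container name and the --node-rank value, and the --tp 16 setting equals the total GPU count (2 VMs × 8 GPUs). The sketch below generates both commands from those parameters to make the relationship explicit. It uses only the flags shown above; the function name build_command and the sample address 10.0.0.10 are hypothetical.

```python
NNODES = 2          # VMs in the GPU cluster
GPUS_PER_NODE = 8   # GPUs per VM
TP_SIZE = NNODES * GPUS_PER_NODE  # 16, the value passed to --tp


def build_command(node_rank: int, head_ip: str) -> str:
    """Build the docker run command for one node (rank 0 is the first VM)."""
    if not 0 <= node_rank < NNODES:
        raise ValueError(f"node_rank must be in 0..{NNODES - 1}")
    args = [
        "docker", "run", "--gpus", "all",
        "--device=/dev/infiniband",
        "--ulimit", "memlock=-1", "--ulimit", "stack=67108864",
        "--shm-size", "32g", "--network=host",
        "-v", "~/.cache/huggingface:/root/.cache/huggingface",
        "--name", f"sglang_multinode{node_rank + 1}",
        "-e", "GLOO_SOCKET_IFNAME=eth0",
        "-it", "--rm", "--ipc=host",
        "lmsysorg/sglang:latest",
        "python3", "-m", "sglang.launch_server",
        "--model-path", "deepseek-ai/DeepSeek-R1",
        "--tp", str(TP_SIZE),
        "--nccl-init-addr", f"{head_ip}:30000",
        "--nnodes", str(NNODES),
        "--node-rank", str(node_rank),
        "--trust-remote-code",
        "--host", "0.0.0.0", "--port", "30001",
        "--disable-radix",
        "--max-prefill-tokens", "126000",
    ]
    # Joined with plain spaces on purpose: no argument contains whitespace,
    # and shell quoting would stop the shell from expanding the leading "~".
    return " ".join(args)


if __name__ == "__main__":
    # 10.0.0.10 is a placeholder for the first VM's internal IP address.
    for rank in range(NNODES):
        print(build_command(rank, "10.0.0.10"))
```

Both nodes receive the same --nccl-init-addr, which always points at the first VM, while the rank tells each process which half of the 16-way tensor-parallel group it hosts.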
    

Test the language model performance

  1. In a new session, connect to the first VM over SSH.

  2. Install the openai package:

    sudo apt update
    sudo apt install python3-pip
    pip install openai
    
  3. Create a test_model.py script with the following contents:

    import openai

    # The server was started without an API key, so a placeholder value is passed.
    client = openai.Client(base_url="http://127.0.0.1:30001/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="default",
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant"},
            {"role": "user", "content": "List 3 countries and their capitals."},
        ],
        temperature=0.3,
        max_tokens=1024,
    )
    # Print the text of the first returned completion.
    print(response.choices[0].message.content)
    
  4. Run the script:

    python3 test_model.py
    

    Model response example:

    Here are three countries and their capitals:
    
    1. **France** - Paris
    2. **Japan** - Tokyo
    3. **Brazil** - Brasília
    
    Let me know if you'd like more examples! 😊
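
The script above confirms that the model responds but does not measure speed. As a rough sketch of a performance check, the code below times a single request to the same endpoint and derives tokens per second from the usage field of the response. It relies only on the Python standard library; the helper names are hypothetical, and it assumes the server returns an OpenAI-style usage object with a completion_tokens value.

```python
import json
import time
import urllib.request

# Same server and port as in test_model.py.
URL = "http://127.0.0.1:30001/v1/chat/completions"


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Generated tokens divided by wall-clock time for the whole request."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s


def timed_request(prompt: str, max_tokens: int = 1024) -> float:
    """Send one chat request and return the measured tokens per second."""
    payload = {
        "model": "default",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.3,
        "max_tokens": max_tokens,
    }
    request = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    elapsed = time.perf_counter() - start
    # Assumption: the response carries an OpenAI-style "usage" object.
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)
```

For example, calling timed_request("List 3 countries and their capitals.") on the first VM returns a single tokens-per-second figure. The measurement includes prompt processing time, so it understates pure generation speed; averaging several runs gives a steadier number.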
    

How to delete the resources you created

To stop paying for the resources you created:

  1. Delete the VM instances in Compute Cloud.
  2. Delete the GPU cluster in Compute Cloud.
