© 2026 Direct Cursus Technology L.L.C.

Running the DeepSeek-R1 language model in a Yandex Compute Cloud GPU cluster

Written by
Yandex Cloud
Updated at June 15, 2026

Note

Currently, GPU clusters are only available in the ru-central1-a and ru-central1-d availability zones. You can only add a virtual machine (VM) to a GPU cluster from the same availability zone.

In this tutorial, you will create a GPU cluster with two VMs to run the DeepSeek-R1 language model.

To run a language model in a cluster:

  1. Get your cloud ready.
  2. Create a GPU cluster with two VMs.
  3. Optionally, check the cluster state.
  4. Run the language model.
  5. Test the model.

If you no longer need the resources you created, delete them.

Get your cloud ready

Sign up for Yandex Cloud and create a billing account:

  1. Navigate to the management console and log in to Yandex Cloud or create a new account.
  2. On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.

To learn more about clouds and folders, see the Yandex Cloud documentation.

Make sure the cloud has enough quotas for the total number of GPU clusters, total number of Gen2 GPUs, amount of RAM, number of vCPUs, and the SSD size to create the VMs. To check your quotas, use Yandex Cloud Quota Manager.
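
As a rough guide to the quota check, the sketch below totals the values this tutorial states explicitly: two VMs, each with 8 GPUs and an 800 GB SSD, in one GPU cluster. The VmSpec class and required_quota function are hypothetical helpers, not part of any Yandex Cloud tool. vCPU and RAM totals are left out because they depend on the Gen2 configuration you select and are not stated here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VmSpec:
    """Per-VM resources taken from the VM settings in this tutorial."""
    gpus: int
    ssd_gb: int


def required_quota(vm: VmSpec, vm_count: int, clusters: int = 1) -> Dict[str, int]:
    """Return the minimum quota values needed for identical VMs in one cluster."""
    return {
        "gpu_clusters": clusters,
        "gen2_gpus": vm.gpus * vm_count,
        "ssd_gb": vm.ssd_gb * vm_count,
    }


# Values from this tutorial: two VMs with 8 GPUs and an 800 GB SSD each.
TUTORIAL_QUOTA = required_quota(VmSpec(gpus=8, ssd_gb=800), vm_count=2)
print(TUTORIAL_QUOTA)
```

Compare these totals with the limits shown for your cloud before creating the VMs.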

Required paid resources

The infrastructure support cost includes a fee for VM computing resources and disks, as well as for the GPU cluster (see Yandex Compute Cloud pricing).

Create a GPU cluster with two VMs

Create a GPU cluster

Management console
  1. In the management console, select a folder to create your cluster in.
  2. Go to Compute Cloud.
  3. In the left-hand panel, select GPU clusters.
  4. Click Create a cluster.
  5. In the Name field, specify test-gpu-cluster.
  6. In the Availability zone field, select ru-central1-d.
  7. Click Save.

Add your VM to a cluster

  1. Create your first VM:

    Management console
    1. In the left-hand panel, select Virtual machines and click Create virtual machine.

    2. Under Boot disk image, select the Ubuntu 20.04 LTS Secure Boot CUDA 12.2 public image.

    3. In the Availability zone field, select ru-central1-d.

    4. Under Disks and file storages, select the SSD disk type and specify its size: 800 GB.

    5. Under Computing resources, navigate to the Custom tab and specify:

      • Platform: Gen2.
      • GPU: 8.
      • GPU cluster: Select the test-gpu-cluster cluster you created earlier.
    6. Under Access, select SSH key and specify the access credentials:

      • Login: ubuntu.
      • In the SSH key field, select the SSH key saved in your organization user profile.

        If there are no SSH keys in your profile or you want to add a new key:

        1. Click Add key.

        2. Enter a name for the SSH key.

        3. Select one of the following:

          • Enter manually: Paste the contents of the public SSH key. You need to create an SSH key pair on your own.

          • Load from file: Upload the public part of the SSH key. You need to create an SSH key pair on your own.

          • Generate key: Automatically create an SSH key pair.

            When adding a new SSH key, an archive containing the key pair will be created and downloaded. On Linux or macOS, unpack the archive to the /home/<user_name>/.ssh directory. On Windows, unpack it to the C:\Users\<user_name>\.ssh directory. You do not need to additionally enter the public key in the management console.

        4. Click Add.

        The system will add the SSH key to your organization user profile. If the organization has disabled the ability for users to add SSH keys to their profiles, the added public SSH key will only be saved in the user profile inside the newly created resource.

    7. Click Create VM.

  2. Similarly, create a second VM with the same settings.
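
Both VMs must be in the same availability zone as the GPU cluster, and that zone must be one where GPU clusters are available. The sketch below shows one way to check a planned layout before creating anything. The Vm class, the placement_errors function, and the VM names gpu-vm-1 and gpu-vm-2 are hypothetical; only the zone names come from this tutorial.

```python
from dataclasses import dataclass
from typing import List

# Zones where GPU clusters are available, per the note at the top of this tutorial.
GPU_CLUSTER_ZONES = ("ru-central1-a", "ru-central1-d")


@dataclass(frozen=True)
class Vm:
    name: str
    zone: str


def placement_errors(cluster_zone: str, vms: List[Vm]) -> List[str]:
    """Return human-readable problems with the planned placement (empty if none)."""
    errors = []
    if cluster_zone not in GPU_CLUSTER_ZONES:
        errors.append(f"zone {cluster_zone} does not offer GPU clusters")
    for vm in vms:
        # A VM can only join a GPU cluster located in its own zone.
        if vm.zone != cluster_zone:
            errors.append(
                f"{vm.name} is in {vm.zone}, but the cluster is in {cluster_zone}"
            )
    return errors


planned = [Vm("gpu-vm-1", "ru-central1-d"), Vm("gpu-vm-2", "ru-central1-d")]
print(placement_errors("ru-central1-d", planned))
```

An empty list means the planned VMs can be added to the cluster as far as zone placement is concerned.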

Optionally, check the cluster state

Before running the model, you can optionally:

  • Test the cluster physical state.
  • Run parallel jobs.
  • Test InfiniBand throughput.

Run the language model

  1. Connect to both VMs over SSH.

  2. Add the ubuntu user to the docker group by running these commands on both VMs:

    sudo groupadd docker
    sudo usermod -aG docker $USER
    newgrp docker
    
  3. Pull the SGLang image to both VMs:

    docker pull lmsysorg/sglang:latest
    
  4. On the first VM, run the server start command (replace <IP_address_1> with the first VM's internal IP):

    docker run --gpus all \
      --device=/dev/infiniband \
      --ulimit memlock=-1 \
      --ulimit stack=67108864 \
      --shm-size 32g \
      --network=host \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --name sglang_multinode1 \
      -e GLOO_SOCKET_IFNAME=eth0 \
      -it --rm --ipc=host lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
      --model-path deepseek-ai/DeepSeek-R1 \
      --tp 16 \
      --nccl-init-addr <IP_address_1>:30000 \
      --nnodes 2 \
      --node-rank 0 \
      --trust-remote-code \
      --host 0.0.0.0 \
      --port 30001 \
      --disable-radix \
      --max-prefill-tokens 126000
    
  5. On the second VM, run the same command with --node-rank set to 1 and a different container name. Keep <IP_address_1> set to the first VM's internal IP:

    docker run --gpus all \
      --device=/dev/infiniband \
      --ulimit memlock=-1 \
      --ulimit stack=67108864 \
      --shm-size 32g \
      --network=host \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --name sglang_multinode2 \
      -e GLOO_SOCKET_IFNAME=eth0 \
      -it --rm --ipc=host lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
      --model-path deepseek-ai/DeepSeek-R1 \
      --tp 16 \
      --nccl-init-addr <IP_address_1>:30000 \
      --nnodes 2 \
      --node-rank 1 \
      --trust-remote-code \
      --host 0.0.0.0 \
      --port 30001 \
      --disable-radix \
      --max-prefill-tokens 126000
    
  6. Wait for the message confirming a successful start: The server is fired up and ready to roll!
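
The server arguments in the two commands above are identical except for --node-rank, and the --tp value of 16 matches the total GPU count (2 VMs × 8 GPUs each). The sketch below makes that relationship explicit by generating the argument list for each node. The sglang_server_args function and the sample address 10.0.0.5 are hypothetical; all flags and values are copied from the commands above.

```python
from typing import List

GPUS_PER_VM = 8  # GPU count selected for each VM in this tutorial
NODE_COUNT = 2   # number of VMs in the GPU cluster


def sglang_server_args(node_rank: int, head_ip: str) -> List[str]:
    """Build the sglang.launch_server argument list for one node.

    head_ip is the internal IP of the first VM (node rank 0).
    """
    if not 0 <= node_rank < NODE_COUNT:
        raise ValueError(f"node_rank must be in 0..{NODE_COUNT - 1}")
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", "deepseek-ai/DeepSeek-R1",
        # Tensor parallelism spans every GPU on every node: 2 * 8 = 16.
        "--tp", str(GPUS_PER_VM * NODE_COUNT),
        "--nccl-init-addr", f"{head_ip}:30000",
        "--nnodes", str(NODE_COUNT),
        "--node-rank", str(node_rank),
        "--trust-remote-code",
        "--host", "0.0.0.0",
        "--port", "30001",
        "--disable-radix",
        "--max-prefill-tokens", "126000",
    ]


# 10.0.0.5 is a placeholder; substitute the first VM's internal IP.
for rank in range(NODE_COUNT):
    print(" ".join(sglang_server_args(rank, "10.0.0.5")))
```

If you change the number of VMs or GPUs per VM, the --tp and --nnodes values would need to change together, which is the main reason to derive them from shared constants.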

Test the language model performance

  1. Open a new SSH session to the first VM.

  2. Install the OpenAI library:

    sudo apt update
    sudo apt install python3-pip -y
    pip install openai
    
  3. Create a script named test_model.py with this code:

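    import openai
    
    # The SGLang server started above exposes an OpenAI-compatible API on port 30001.
    client = openai.Client(
        base_url="http://127.0.0.1:30001/v1",
        api_key="EMPTY",  # placeholder: the launch command above sets no API key
    )
    
    response = client.chat.completions.create(
        model="default",
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant"},
            {"role": "user", "content": "List 3 countries and their capitals."},
        ],
        temperature=0.3,  # low temperature for more deterministic output
        max_tokens=1024,  # upper bound on the length of the generated reply
    )
    
    # Print only the text of the first returned choice.
    print(response.choices[0].message.content)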
    
  4. Run the script:

    python3 test_model.py
    

    Sample response:

    Here are three countries and their capitals:
    
    1. **France** - Paris
    2. **Japan** - Tokyo
    3. **Brazil** - Brasília
    
    Let me know if you'd like more examples! 😊
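
The test_model.py script confirms that the model answers, but it does not report how fast. A minimal sketch of a timing wrapper follows. The measure_generation function is hypothetical, and the figure it reports is end-to-end (prompt processing plus generation), not pure decoding speed.

```python
import time
from typing import Callable, Dict


def measure_generation(send_request: Callable[[], int]) -> Dict[str, float]:
    """Time one request and report generated tokens per second.

    send_request must perform a single completion call and return the
    number of tokens the model generated.
    """
    started = time.perf_counter()
    completion_tokens = send_request()
    elapsed = time.perf_counter() - started
    return {
        "completion_tokens": float(completion_tokens),
        "seconds": elapsed,
        # Guard against division by zero for an instantaneous fake request.
        "tokens_per_second": completion_tokens / elapsed if elapsed > 0 else 0.0,
    }
```

With the client from test_model.py, send_request could wrap the client.chat.completions.create call and return response.usage.completion_tokens, assuming the server populates the usage field in its responses.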
    

How to delete the resources you created

To stop paying for the resources you created, in Compute Cloud:

  1. Delete the VMs you created.
  2. Delete the GPU cluster you created.

See also

Questions about GPUs
