Yandex Managed Service for Apache Spark™

Connecting to an Apache Spark™ cluster

Written by Yandex Cloud
Updated on December 3, 2025

This section describes how to connect to a Yandex Managed Service for Apache Spark™ cluster via Spark Connect.

Connecting via Spark Connect

  1. Create an IAM token and save it to an environment variable:

    export TOKEN=$(yc iam create-token)
    
  2. In your cluster, create a SparkConnect job without specifying any parameters.

  3. Copy the Spark Connect Server endpoint of the new connection job.

    You can find the endpoint in the job information: it is shown in the Connection URL field in the management console and in the connect_url field in the CLI and API output.

  4. Install the pyspark package and the relevant dependencies in your environment using the pip package manager, as shown in the example after the note below.

    Note

    Currently, only connections with PySpark 3.5.6 are supported.
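
    For example, in a typical Python environment you can install the client and its Spark Connect dependencies with pip. This is a minimal sketch, assuming the standard pyspark distribution from PyPI; the connect extra pulls in the gRPC client libraries that Spark Connect requires:

    pip install "pyspark[connect]==3.5.6"
    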

  5. Run the following code to connect to the cluster:

    import os
    from pyspark.sql import SparkSession
    
    # Spark Connect endpoint copied from the connection job (Connection URL / connect_url)
    url_spark = "<cluster_connection_endpoint>"
    # IAM token saved to the environment variable earlier
    TOKEN = os.environ.get("TOKEN")
    
    # Connect to the cluster via Spark Connect, authenticating with the IAM token
    spark = SparkSession.builder.remote(f"{url_spark}/;use_ssl=true;token={TOKEN}").getOrCreate()
    
    # Create a small test DataFrame and print it to verify the connection
    df = spark.createDataFrame([(1, "Sarah"), (2, "Maria")]).toDF(*["id", "name"])
    df.show()
    

    Result:

    +---+-----+
    | id| name|
    +---+-----+
    |  1|Sarah|
    |  2|Maria|
    +---+-----+
    

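Once connected, the session can be used like a regular PySpark session. As a minimal sketch (not specific to Yandex Cloud), you can run an SQL query over the same connection and release the session when you are done:

    # Run an arbitrary SQL query over the Spark Connect session
    spark.sql("SELECT 1 AS ok").show()
    
    # Close the Spark Connect session when finished
    spark.stop()
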