Connecting to an Apache Spark™ cluster
Updated at December 3, 2025
This section describes how to connect to a Yandex Managed Service for Apache Spark™ cluster via Spark Connect.
Connecting via Spark Connect
-
Create an IAM token and save it to the TOKEN environment variable:
    export TOKEN=$(yc iam create-token)
-
In your cluster, create a SparkConnect job without specifying any parameters.
-
Copy the Spark Connect Server endpoint of the new connection job.
You can get the endpoint from the job information: it is shown in the Connection URL field in the management console and in the connect_url field in the CLI and API.
-
Install the pyspark package and its dependencies in your environment using the pip package manager.
Note: Currently, only connections with PySpark 3.5.6 are supported.
-
Run the following code to connect to the cluster:
    import os

    from pyspark.sql import SparkSession

    # Spark Connect Server endpoint copied from the job information
    url_spark = "<cluster_connection_endpoint>"
    # IAM token saved to the TOKEN environment variable
    TOKEN = os.environ.get("TOKEN")

    spark = SparkSession.builder.remote(f"{url_spark}/;use_ssl=true;token={TOKEN}").getOrCreate()

    df = spark.createDataFrame([(1, "Sarah"), (2, "Maria")]).toDF(*["id", "name"])
    df.show()
Result:
    +---+-----+
    | id| name|
    +---+-----+
    |  1|Sarah|
    |  2|Maria|
    +---+-----+
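The connection string from the last step can be assembled and checked before a session is opened. The following is a minimal sketch, not part of the service's API: the helper names `build_remote_url` and `installed_pyspark_version` are hypothetical, and the endpoint in the usage comment is a made-up placeholder. It assumes the endpoint is passed as copied from the job information and that the IAM token was exported as `TOKEN`.

```python
import os
from importlib import metadata
from typing import Optional

# PySpark version named in the note above as the only supported one
SUPPORTED_PYSPARK = "3.5.6"


def build_remote_url(endpoint: str, token: str) -> str:
    """Append the use_ssl and token parameters to a Spark Connect endpoint.

    Hypothetical helper: it only reproduces the string format used in the
    connection code above and fails early on empty input.
    """
    if not endpoint:
        raise ValueError("Spark Connect endpoint is empty")
    if not token:
        raise ValueError("IAM token is empty; check that TOKEN is exported")
    # Drop a trailing slash so the result never contains "//;"
    return f"{endpoint.rstrip('/')}/;use_ssl=true;token={token}"


def installed_pyspark_version() -> Optional[str]:
    """Return the installed pyspark version, or None if it is not installed."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


# Usage (not executed here; the endpoint below is a placeholder assumption):
#
#   if installed_pyspark_version() != SUPPORTED_PYSPARK:
#       print(f"Expected pyspark {SUPPORTED_PYSPARK}")
#   url = build_remote_url("sc://<host>:<port>", os.environ.get("TOKEN", ""))
#   spark = SparkSession.builder.remote(url).getOrCreate()
```

Failing on an empty token before calling `getOrCreate()` makes a missing `export TOKEN=...` step easier to spot than an authentication error returned by the server.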