Connecting to an Apache Spark™ cluster

Updated on December 3, 2025

This section describes how to connect to a Yandex Managed Service for Apache Spark™ cluster via Spark Connect.

Connecting via Spark Connect

Create an IAM token and save it to the environment variable:
```
export TOKEN=$(yc iam create-token)
```
In your cluster, create a SparkConnect job without specifying any parameters.
Copy the Spark Connect Server endpoint of the new job.

The endpoint is part of the job information: it appears in the Connection URL field in the management console and in the connect_url field in the CLI and API.
Install the pyspark package and relevant dependencies in your environment using the pip package manager.

Note

Currently, only connections from PySpark 3.5.6 are supported.

Run the following code to connect to the cluster:

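import os
from pyspark.sql import SparkSession

# Spark Connect Server endpoint copied from the job information
url_spark = "<cluster_connection_endpoint>"
# IAM token exported to the TOKEN environment variable earlier
TOKEN = os.environ.get("TOKEN")

# Open a remote session over SSL, authenticating with the IAM token
spark = SparkSession.builder.remote(f"{url_spark}/;use_ssl=true;token={TOKEN}").getOrCreate()
# Build a small test DataFrame and print it
df = spark.createDataFrame([(1, "Sarah"), (2, "Maria")]).toDF("id", "name")
df.show()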

Result:

+---+-----+
| id| name|
+---+-----+
|  1|Sarah|
|  2|Maria|
+---+-----+
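
Before opening the session, it may help to check the two things the steps above depend on: the pinned PySpark version and a non-empty token. The following is a minimal sketch, not part of the service's tooling. The helper names `installed_pyspark_version`, `pyspark_version_ok`, and `build_remote_url` are hypothetical, and the connection string simply mirrors the one used in the code above.

```python
from importlib import metadata
from typing import Optional

# The only PySpark version listed as supported in the note above.
SUPPORTED_PYSPARK = "3.5.6"


def installed_pyspark_version() -> Optional[str]:
    """Return the installed pyspark version, or None if the package is absent."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


def pyspark_version_ok(installed: Optional[str],
                       supported: str = SUPPORTED_PYSPARK) -> bool:
    """Return True only when the installed version equals the supported one."""
    return installed == supported


def build_remote_url(endpoint: str, token: Optional[str]) -> str:
    """Assemble the connection string in the same shape as the example above.

    Hypothetical helper: it only validates its inputs and formats a string.
    """
    if not token:
        raise ValueError("TOKEN is empty; export it before connecting")
    if not endpoint or endpoint.startswith("<"):
        raise ValueError("replace the placeholder with the job's connection URL")
    # Drop a trailing slash so the result does not contain "//;".
    return f"{endpoint.rstrip('/')}/;use_ssl=true;token={token}"
```

With these helpers, `pyspark_version_ok(installed_pyspark_version())` can be checked first, and `build_remote_url(url_spark, TOKEN)` can be passed to `SparkSession.builder.remote(...)` in place of the inline f-string. A missing token then fails early with a clear message instead of producing a connection string that ends in `token=None`.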
