Connecting to a Yandex Data Processing host from graphical IDEs
You can connect to a Yandex Data Processing cluster using graphical IDEs.
Before connecting:
Connect using graphical IDEs
Connections were tested in the following environment:
- Ubuntu 20.04, DBeaver: 22.2.4
- MacOS Monterey 12.7:
  - JetBrains DataGrip: 2023.3.4
  - DBeaver Community: 24.0.0
JetBrains DataGrip
To use graphical IDEs, save the SSL certificate.
- Create a data source:
  - Select File → New → Data Source → Apache Hive.
    Note
    Select the data source depending on the Yandex Data Processing component you are connecting to:
    - Hive: Select Apache Hive.
    - Spark: Select Apache Spark.
    The connection settings are the same in both cases.
  - Specify the connection settings on the General tab:
    - Host: FQDN of the cluster master host or its public IP address.
    - If you are connecting for the first time, click Download to download the connection driver.
  - On the SSH/SSL tab:
    - Enable the Use SSL setting and specify the SSL connection settings:
      - CA file: Downloaded SSL certificate for the connection.
      - Client key file, Client key password: Private key file required to connect to the Yandex Data Processing cluster, and its password.
    - Optionally, to connect via a jump host VM, configure the SSH tunnel settings:
      - Select Use SSH tunnel, create an SSH configuration, and specify these settings:
        - Host: VM IP address.
        - User name: VM username.
        - Private key file, Passphrase: Private key file required to connect to the VM, and its password.
      - Click Test Connection to test the connection to the VM from DataGrip.
      - Click OK to save the configuration.
  - Click Test Connection. If the connection is successful, you will see the OK status along with information about the DBMS and driver.
  - Click OK to save the data source.
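Before setting up the data source, it may help to confirm from a terminal that the master host presents a certificate signed by the CA certificate you saved. The Python sketch below uses only the standard `ssl` and `socket` modules. It is a sketch under assumptions, not part of the official procedure: port 10000 is the HiveServer2 default and is assumed here, the function names are hypothetical, and the probe checks only the server certificate (it does not present the client key).

```python
import socket
import ssl

# Assumption: HiveServer2 default port; your cluster may use a different one.
DEFAULT_PORT = 10000


def build_tls_context(ca_file=None):
    """Return a client TLS context that trusts ca_file (a PEM CA certificate).

    With ca_file=None, the system's default trust store is used instead.
    The context requires a valid server certificate and checks the host name.
    """
    return ssl.create_default_context(cafile=ca_file)


def probe_tls(host, ca_file, port=DEFAULT_PORT, timeout=5.0):
    """Open a TLS connection to host:port and return the negotiated TLS version.

    Raises ssl.SSLError (for example, ssl.SSLCertVerificationError) if the
    server certificate is not trusted or does not match host, and OSError
    if the host cannot be reached.
    """
    context = build_tls_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.version()
```

For example, `probe_tls("<master host FQDN>", "<path to CA file>")` returns a protocol string such as `TLSv1.3` on success. Host name verification is enabled, so probing by public IP address may fail unless the certificate lists that address.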
DBeaver
- Download the SSH key for connecting to the Yandex Data Processing cluster to your local machine or VM.
- Create a new DB connection:
  - From the Database menu, select New connection.
  - Select a data source from the DB list depending on the configuration of the Yandex Data Processing cluster you are connecting to:
    - If the cluster uses Hive, select Apache Hive.
    - If only Spark is enabled in the cluster and the Thrift server is running, select Apache Spark.
    The connection settings are the same regardless of the data source you select.
  - Click Next.
  - On the SSH tab, enable the Use SSH tunnel setting and specify these settings:
    - Host/IP: FQDN of the master host (when connecting via a jump host VM) or its public IP address.
    - Username: Enter the username:
      - For version 2.0: ubuntu.
      - For version 1.4: root.
    - Authentication method: Public key.
    - Secret key: Path to the cluster’s private key file.
    - Passphrase: Private key password.
    - Optionally, to connect via a jump host VM, enable the Use jump server setting and specify these settings:
      - Host/IP: Public IP address of the VM.
      - Username: Username for connecting to the VM.
      - Authentication method: Public key.
      - Secret key: Path to the VM’s private key file.
      - Passphrase: Private key password.
  - Click the Test Connection ... button. If the connection is successful, you will see the connection status and information about the DBMS and driver.
  - Click Finish to save the database connection settings.
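The SSH tab settings amount to an SSH port forward to the master host, optionally routed through a jump host VM. When a connection from DBeaver fails, reproducing the same path with the OpenSSH client can help narrow down the problem. The Python sketch below builds such a command. The user names per version come from the settings above; the forwarded port 10000 (the HiveServer2 default), the loopback target on the master host, and the function name `tunnel_command` are assumptions for illustration.

```python
import shlex

# Master host user names by Yandex Data Processing version (from the settings above).
MASTER_USER_BY_VERSION = {"2.0": "ubuntu", "1.4": "root"}

# Assumption: the Hive/Spark Thrift endpoint uses the HiveServer2 default port
# and accepts connections on the master host's loopback interface.
REMOTE_PORT = 10000


def tunnel_command(master_host, version, cluster_key, local_port=10000,
                   jump_host=None, jump_user=None, jump_key=None):
    """Build an OpenSSH argv that forwards local_port to REMOTE_PORT on the master host.

    master_host: master host FQDN (via a jump host) or its public IP address.
    cluster_key: path to the cluster's private key file.
    jump_host, jump_user, jump_key: optional jump host VM address, user name,
    and private key file.
    """
    user = MASTER_USER_BY_VERSION[version]  # KeyError for versions not listed above
    argv = [
        "ssh", "-N",  # -N: only forward ports, do not run a remote command
        "-i", cluster_key,
        "-L", f"{local_port}:localhost:{REMOTE_PORT}",
    ]
    if jump_host is not None:
        if jump_user is None or jump_key is None:
            raise ValueError("jump_user and jump_key are required with jump_host")
        # Reach the master host through the jump VM using the VM's own key;
        # ssh expands %h and %p to the master host and port.
        hop = ["ssh", "-i", jump_key, "-W", "%h:%p", f"{jump_user}@{jump_host}"]
        argv += ["-o", "ProxyCommand=" + shlex.join(hop)]
    argv.append(f"{user}@{master_host}")
    return argv
```

You could run the result with `subprocess.run(tunnel_command(...))` and, while it is running, point a client at `localhost` and the chosen local port. A `ProxyCommand` with `ssh -W` is used here so that the jump host connection can be given its own private key file on the command line.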