Working with jobs from Visual Studio Code
The DataSphere Jobs Toolkit extension allows you to work with DataSphere jobs from within the Visual Studio Code editor.
The DataSphere Jobs Toolkit uses the DataSphere CLI utility to analyze the environment and identify dependencies. The CLI is automatically installed in the current virtual environment when you run your first job.
Note
To use the extension, make sure you have Visual Studio Code installed.
When you run a job, the datasphere library analyzes the environment, collects code dependencies, and can provide them to DataSphere for deploying the environment on a cloud VM. To avoid unnecessary system dependencies that can affect a job's performance, we recommend using a virtual environment, such as venv.
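For example, you can prepare an isolated environment like this (a minimal sketch: it assumes Python 3.10 is available on your machine, matching the default job runtime in the note below, and that the DataSphere CLI is published as the datasphere package, which is otherwise installed automatically on the first run):

```bash
# Create and activate an isolated environment with Python 3.10
# to match the default job runtime.
python3.10 -m venv .venv
source .venv/bin/activate

# Install only the packages the job actually needs. The datasphere CLI
# is installed into this environment automatically on the first run,
# but you can also install it explicitly:
pip install datasphere
```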
Note
By default, DataSphere uses the conda package manager with pre-installed Python 3.10 to run jobs. To reduce environment migration time, use the same Python version for your jobs.
Install the DataSphere Jobs Toolkit extension
- Open the extension page on the marketplace.
- Click Install.
- The browser will prompt you to open Visual Studio Code. Click Open app.
- On the DataSphere Jobs Toolkit page that opens in Visual Studio Code, click Install.
Once the extension is successfully installed, you will see This extension is enabled globally on the extension page.
Authenticate in DataSphere Jobs
To work with DataSphere Jobs, authenticate with a Yandex or federated account.
- Get an OAuth token. An OAuth token is valid for 12 months; after that, you need to get a new one and authenticate again.
- In Visual Studio Code, click Search at the top of the window and enter this command:
> DataSphere Jobs: Set OAuth token
In the window that opens, click Open to get an OAuth token. If you already have an OAuth token, click Cancel.
- In the Input OAuth-token field that opens, enter the OAuth token.
To delete the OAuth token, use this command:
> DataSphere Jobs: Remove OAuth token
To authenticate with a federated account, you need to install and configure the YC CLI. If the YC CLI is installed in the default folder or the path to it is specified in the Path environment variable, the extension will detect it automatically. Otherwise, specify the path to the YC CLI executable in the extension settings:
- In the left-hand panel, select DataSphere Jobs and click Settings.
- In the Datasphere: Yandex Cloud Console Path field, enter the path to the YC CLI executable, for example:
  C:\Users\<username>\yandex-cloud\bin\yc.exe
Run the job
- In the left-hand panel, select DataSphere Jobs and click Settings.
- In the Datasphere: Project field, enter the project ID. To get the project ID, click the ID icon on your project page in the DataSphere interface.
- Open the Python file using DataSphere Jobs Toolkit. To do this, in the Visual Studio Code Explorer, right-click the file you need and select Run File in DataSphere Jobs.
  You can also open the file with the job code in Visual Studio Code first, click Run and Debug in the top-right corner of the editor window, and select Run File in DataSphere Jobs.
  When testing the extension, you can use a simple job file (see the sketch after this procedure). You can find other job run examples in the repository on GitHub.
- In the DataSphere Job Configuration window that opens, set the following configuration:
BASIC

Configure the main run parameters that match the values in the config.yaml file (see the sample config after this procedure):
- Working directory: Working directory containing the files required to run the job.
- Inputs: Files with input data in <path>=<variable_name> format. Specify each value pair on a separate line.
- Outputs: Files with output data in <path>=<variable_name> format. Specify each value pair on a separate line.
ADVANCED

Configure additional parameters:
- Project Identifier: DataSphere project ID.
- Configuration file path: Path to the completed config.yaml configuration file.
- Instance Type: Configuration parameters of the VM on which the job will run.
- S3 mounts: IDs of S3 connectors. If there are multiple S3 connectors, specify each ID on a separate line.
- Datasets: Datasets in <dataset>=<variable_name> format. Specify each value pair on a separate line.
ENVIRONMENT

Configure the Docker image:
- Variables: Variables required to run the code, in <name>:<value> format. Specify each value pair on a separate line.
- Docker: Docker image parameters:
  - Image: Docker image URL.
  - User: System account with the password or the ID of the secret containing the authorized key.
PYTHON

Configure your Python working environment:
- Environment dependency build method: Select whether to define environment dependencies manually or automatically.
- Extra root paths: Additional root folders. Specify each value on a separate line.
- Extra index urls: Additional index URLs.
- No Dependencies: Enable this option if you choose not to define environment dependencies.
To save your current settings to the launch.json file for use in future jobs, click Save. To export the job configuration to the config.yaml file, click Export. You can also upload a previously saved, ready-to-use job configuration by clicking Load.
- To run the configuration, click Invoke.
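For reference, below is a minimal sketch of a job file and a matching config.yaml. All specific names here (train.py, input.csv, result.txt, the INPUT and RESULT variables) are illustrative assumptions rather than values required by the extension; the config fields mirror the BASIC tab settings described above, so check the DataSphere Jobs documentation for the authoritative config.yaml schema.

```python
# train.py -- a minimal job: read the input file, write a result file.
# The paths correspond to the inputs and outputs declared in config.yaml.
with open("input.csv") as f:
    rows = f.readlines()

with open("result.txt", "w") as f:
    f.write(f"processed {len(rows)} rows\n")
```

```yaml
# config.yaml -- mirrors the BASIC tab: run command, input and output files
# bound to variable names (an assumed sample, not a normative schema).
name: sample-job
cmd: python train.py
inputs:
  - input.csv: INPUT
outputs:
  - result.txt: RESULT
```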
When you run the job successfully, the DataSphere Jobs Toolkit extension opens the DEBUG CONSOLE window, where you can see a link to the job in DataSphere:
creating job ...
uploading 37 files (129.7MB) ...
files are uploaded
created job `bt19qb2pb0ji********`
executing job ...
job link: https://datasphere.yandex.ru/communities/bt11e3m29qti********/projects/bt1eq06id8kv********/job/bt19qb2pb0ji********
The OUTPUT window also provides tabs with the following information:
- DataSphere Jobs Invocation: User code run results
- DataSphere Jobs Toolkit: Job start, end, or error messages
- DataSphere Jobs Toolkit Logs: Service information about the extension's operation
View the job history
The DataSphere Jobs Toolkit extension allows you to view the job history and manage your jobs.
To see the DataSphere Jobs job history, select DataSphere Jobs in the left-hand panel. This will open the DATASPHERE JOBS: LAUNCH HISTORY panel, where you will see your jobs sorted by start time.

You can perform the following actions from the job history panel:
- Cancel: Stop the job.
- Attach: Connect to the job.
- Copy job ID: Copy the job ID.
- Open job: Open the job in a browser.
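The same actions are also available outside the IDE through the DataSphere CLI that the extension installs. A sketch, assuming the datasphere project job command group from the CLI documentation (the IDs are placeholders):

```bash
# List jobs in a project
datasphere project job list -p <project_id>

# Reattach to a running job
datasphere project job attach --id <job_id>

# Stop a running job
datasphere project job cancel --id <job_id>
```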