Working with jobs from Visual Studio Code
DataSphere Jobs Toolkit enables you to work with DataSphere jobs from within Visual Studio Code.
DataSphere Jobs Toolkit uses the DataSphere CLI to analyze the environment and identify dependencies. The CLI is automatically installed in the current virtual environment when you run your first job.
Note
To use DataSphere Jobs Toolkit, make sure you have Visual Studio Code installed.
When you run a job, the datasphere library analyzes the environment, collects code dependencies, and can provide them to DataSphere for deploying the environment on a cloud VM. To avoid unnecessary system dependencies that can affect job performance, we recommend using a virtual environment, such as venv (created with python -m venv .venv, for example).
Note
To run DataSphere jobs, use Python venv.
Install DataSphere Jobs Toolkit
- Open the extension page on Visual Studio Marketplace.
- Click Install.
- The browser will prompt you to open Visual Studio Code. Click Open app.
- On the DataSphere Jobs Toolkit page that opens in Visual Studio Code, click Install.
Once the installation completes successfully, the extension page will show This extension is enabled globally.
Get authenticated in DataSphere Jobs
To start working with DataSphere Jobs, get authenticated using a Yandex or federated account.
- Get an OAuth token. An OAuth token is valid for 12 months; after that, you need to get a new one and authenticate again.
- In Visual Studio Code, click Search at the top of the window and enter this command: > DataSphere Jobs: Set OAuth token. In the window that opens, click Open to get an OAuth token. If you already have an OAuth token, click Cancel.
- In the Input OAuth token field that opens, enter your OAuth token.

To delete the OAuth token, use this command: > DataSphere Jobs: Remove OAuth token.
To authenticate a federated or local account, you need to install and configure the CLI. If the CLI is installed in the default folder or the path to it is specified in the Path environment variable, the toolkit will detect it automatically. If not, specify the CLI executable path in the extension settings:
- In the left-hand panel, select DataSphere Jobs and click Settings.
- In the Datasphere: Yandex Cloud Console Path field, enter the path to the CLI executable, such as the following:
  C:\Users\<username>\yandex-cloud\bin\yc.exe
Run a job
- In the left-hand panel, select DataSphere Jobs and click Settings.
- In the Datasphere: Project field, enter the project ID. To get the project ID, click the ID icon on your project page in the DataSphere interface.
- Open a Python file using DataSphere Jobs Toolkit. To do this, in the Visual Studio Code Explorer, right-click the file you need and select Run File in DataSphere Jobs.
  You can also pre-open the file with the job code in Visual Studio Code, click Run and Debug in the top-right corner of the edit window, and select Run File in DataSphere Jobs.
  You can use this sample code when testing the extension; a minimal sketch also follows at the end of this section. For more job run examples, see this GitHub repository.
- In the DataSphere Job Configuration window that opens, set the following configuration:
BASIC
Configure the main run parameters that match the values in the config.yaml file:
- Working directory: Working directory containing the files required to run your job.
- Inputs: Files with input data in <path>=<variable_name> format, for example, data/input.csv=INPUT. Specify each value pair on a separate line.
- Outputs: Files with output data in <path>=<variable_name> format, for example, results/output.csv=OUTPUT. Specify each value pair on a separate line.
ADVANCED
Configure additional settings:
- Project Identifier: DataSphere project ID.
- Configuration file path: Path to the completed config.yaml configuration file.
- Instance Type: Configuration settings of the VM you will use to run your job.
- S3 mounts: IDs of S3 connectors. In case of multiple S3 connectors, specify each ID on a separate line.
- Datasets: Datasets in <dataset>=<variable_name> format. Specify each value pair on a separate line.
ENVIRONMENT
Configure the Docker image:
- Variables: Variables required to run the code, in <name>:<value> format. Specify each value pair on a separate line.
- Docker: Docker image settings:
  - Image: Docker image URL.
  - User: System account with either a password or ID of the secret that holds the authorized key.
PYTHON
Set up your Python working environment:
- Environment dependency build method: Select whether to define environment dependencies manually or automatically.
- Extra root paths: Additional root folders. Specify each value on a separate line.
- Extra index urls: Additional index URLs.
- If you choose not to define environment dependencies, enable No Dependencies.
To save your current settings to launch.json for use in future jobs, click Save. To export the job configuration to config.yaml, click Export. You can also load a previously saved, ready-to-use job configuration by clicking Load.
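For orientation, here is a minimal sketch of what an exported config.yaml might contain. The field names and values below (job name, command, input/output pairs, automatic Python environment) are illustrative assumptions rather than a verbatim export; treat the file produced by Export as the authoritative reference:

```yaml
# Hypothetical config.yaml sketch; the file produced by Export is authoritative.
name: my-first-job                      # assumed job name
cmd: python main.py                     # assumed entry point
inputs:
  - data/input.csv: INPUT               # pair from the Inputs field (assumption)
outputs:
  - results/output.csv: OUTPUT          # pair from the Outputs field (assumption)
env:
  python: auto                          # assumed: dependencies collected automatically
```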
- To run the configuration, click Invoke.
After you successfully run the job from within DataSphere Jobs Toolkit, the DEBUG CONSOLE window will open. It will contain a link to the job in DataSphere:
creating job ...
uploading 37 files (129.7MB) ...
files are uploaded
created job `bt19qb2pb0ji********`
executing job ...
job link: https://datasphere.yandex.ru/communities/bt11e3m29qti********/projects/bt1eq06id8kv********/job/bt19qb2pb0ji********

The OUTPUT window also features tabs with the following information:
- DataSphere Jobs Invocation: User code execution results.
- DataSphere Jobs Toolkit: Job start, end, or error messages.
- DataSphere Jobs Toolkit Logs: Information about the extension performance.
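If you need a quick file to test the run flow with, a minimal job script along the following lines will do. This is a hypothetical sketch, not the sample linked above; it assumes the Inputs and Outputs fields declare the pairs data/input.csv=INPUT and results/output.csv=OUTPUT, so adjust the paths to your own configuration:

```python
# Minimal hypothetical DataSphere job: reads a file declared in Inputs and
# writes a result file declared in Outputs.
import pathlib

INPUT_PATH = pathlib.Path("data/input.csv")       # assumed Inputs pair: data/input.csv=INPUT
OUTPUT_PATH = pathlib.Path("results/output.csv")  # assumed Outputs pair: results/output.csv=OUTPUT


def main() -> None:
    lines = INPUT_PATH.read_text().splitlines()
    OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    # Trivial "processing" step so the job produces a visible artifact.
    OUTPUT_PATH.write_text(f"line count: {len(lines)}\n")
    print(f"processed {len(lines)} lines from {INPUT_PATH}")


if __name__ == "__main__":
    main()
```

Anything the script prints appears on the DataSphere Jobs Invocation tab described above.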
View the job history
DataSphere Jobs Toolkit allows you to view the job history and manage your jobs.
To see the job history in DataSphere Jobs, select DataSphere Jobs in the left-hand panel. This will open the DATASPHERE JOBS: LAUNCH HISTORY panel with jobs sorted by start time.
You can perform the following actions from the DataSphere Jobs history panel:
- Cancel: Stop the job.
- Attach: Connect to the job.
- Copy job ID: Copy the job ID.
- Open job: Open the job in a browser.