Managing Hive jobs

Updated on June 9, 2025

Create a job
Cancel a job
Get a list of jobs
Get general information about the job
Get job execution logs

In a Yandex Data Processing cluster, you can manage jobs and get their execution logs. For job examples, see Working with jobs.

Create a job

Management console

Go to the folder page and select Yandex Data Processing.
Click the cluster name and open the Jobs tab.
Click Submit job.
(Optional) Enter a name for the job.
In the Job type field, select Hive.
(Optional) In the Properties field, specify component properties as key-value pairs.

If an argument, variable, or property consists of several space-separated parts, specify each part separately, preserving the order in which the arguments, variables, and properties are declared.

For example, the -mapper mapper.py argument must be split into two arguments, -mapper and mapper.py, in that order.
(Optional) Enable the Continue on failure setting.
Specify Script variables as key:value pairs.

(Optional) Specify the paths to the JAR files, if any.

File location	Path format
Instance file system	`file:///<path_to_file>`
Distributed cluster file system	`hdfs:///<path_to_file>`
Object Storage bucket	`s3a://<bucket_name>/<path_to_file>`
Internet	`http://<path_to_file>` or `https://<path_to_file>`

Archives in standard Linux formats, such as zip, gz, xz, and bz2, are supported.

The cluster service account needs read access to all the files in the bucket. For step-by-step instructions on setting up access to Object Storage, see Editing a bucket ACL.

Select one of the driver types and specify what to use to run the job:
- List of queries to be executed.
- Path to the file with the queries to be executed.
Click Submit job.
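
The rule about splitting space-separated values in the steps above can be sketched in shell. This is only an illustration: `split_parts` is a hypothetical helper, not part of any Yandex Cloud tool. It prints each part on its own line in the original order, which is how the parts would be entered as separate values.

```shell
# Hypothetical helper (not part of the yc CLI): print each space-separated
# part of a string on its own line, preserving the original order.
split_parts() (
  set -f                 # disable globbing so characters like * stay literal
  for part in $1; do     # unquoted on purpose: rely on shell word splitting
    printf '%s\n' "$part"
  done
)

# "-mapper mapper.py" becomes two separate values: -mapper, then mapper.py
split_parts "-mapper mapper.py"
```

The function body runs in a subshell, so the `set -f` setting does not leak into the calling shell.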

CLI

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.

To create a job:

View the description of the CLI create command for Hive jobs:
```
yc dataproc job create-hive --help
```

Create a job (the example does not show all the available parameters):

```
yc dataproc job create-hive \
   --cluster-name=<cluster_name> \
   --name=<job_name> \
   --query-file-uri=<query_file_URI> \
   --script-variables=<list_of_values>
```

Where --script-variables is a comma-separated list of variable values.

Provide the paths to the files required for the job in the following format:

File location	Path format
Instance file system	`file:///<path_to_file>`
Distributed cluster file system	`hdfs:///<path_to_file>`
Object Storage bucket	`s3a://<bucket_name>/<path_to_file>`
Internet	`http://<path_to_file>` or `https://<path_to_file>`

Archives in standard Linux formats, such as zip, gz, xz, and bz2, are supported.

The cluster service account needs read access to all the files in the bucket. For step-by-step instructions on setting up access to Object Storage, see Editing a bucket ACL.
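
The path formats in the table can be checked before a job is submitted. The sketch below is an illustration only: `is_supported_uri` and `submit_hive_job` are hypothetical helper names, the pattern check is a rough approximation of the table rather than a full URI validator, and the `yc` flags are limited to those shown in the create-hive example.

```shell
# Hypothetical helper: succeed only if the URI looks like one of the path
# formats from the table (file, hdfs, s3a, http, https).
is_supported_uri() {
  case "$1" in
    file:///?*|hdfs:///?*|s3a://?*/?*|http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: check the query file URI, then submit the job with
# the flags shown in the create-hive example.
submit_hive_job() {
  # $1 = cluster name, $2 = job name, $3 = query file URI
  if ! is_supported_uri "$3"; then
    echo "Unsupported path format: $3" >&2
    return 1
  fi
  yc dataproc job create-hive \
    --cluster-name="$1" \
    --name="$2" \
    --query-file-uri="$3"
}
```

With placeholder values, a call might look like `submit_hive_job <cluster_name> <job_name> s3a://<bucket_name>/<path_to_file>`. Nothing is sent to the cluster if the path does not match one of the supported formats.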

You can get the cluster ID and name from the list of clusters in the folder.

API

Use the create API method and include the following in the request:

- Cluster ID in the clusterId parameter. You can get it from the list of clusters in the folder.
- Job name in the name parameter.
- Job properties in the hiveJob parameter.
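
As a rough sketch of what such a request might carry, the helper below prints a JSON body with the name and hiveJob parameters. The helper name `hive_job_body` is hypothetical, and the `queryFileUri` field inside hiveJob is an assumption modeled on the --query-file-uri CLI flag; check the API reference for the exact field names and for how clusterId is passed.

```shell
# Hypothetical helper: print a JSON request body for the create method.
# "name" and "hiveJob" come from the parameter list above;
# "queryFileUri" inside hiveJob is an assumed field name.
hive_job_body() {
  # $1 = job name, $2 = query file URI (values must not contain quotes)
  printf '{"name": "%s", "hiveJob": {"queryFileUri": "%s"}}\n' "$1" "$2"
}

# Example values are placeholders, not real resources
hive_job_body "my-hive-job" "s3a://my-bucket/queries/report.sql"
```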

Cancel a job

Note

You cannot cancel jobs with the ERROR, DONE, or CANCELLED status. To find out a job's status, retrieve a list of jobs in the cluster.
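
The status rule in this note can be expressed as a small check. This is a minimal sketch: `can_cancel` is a hypothetical helper, and it assumes that any status other than the three listed allows cancellation. RUNNING is used here only as an example value.

```shell
# Hypothetical helper: succeed if a job in the given status can still be
# cancelled, based on the three statuses named in the note.
can_cancel() {
  case "$1" in
    ERROR|DONE|CANCELLED) return 1 ;;   # these jobs cannot be cancelled
    *) return 0 ;;                      # assumption: everything else can
  esac
}

for status in RUNNING DONE; do
  if can_cancel "$status"; then
    echo "$status: can be cancelled"
  else
    echo "$status: cannot be cancelled"
  fi
done
```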

Management console

Go to the folder page and select Yandex Data Processing.
Click the cluster name and open the Jobs tab.
Click the job name.
Click Cancel in the top-right corner of the page.
In the window that opens, select Cancel.

CLI

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

To cancel a job, run the command below:

```
yc dataproc job cancel <job_name_or_ID> \
  --cluster-name=<cluster_name>
```

You can get the job name or ID from the list of cluster jobs, and the cluster name from the list of clusters in the folder.

API

Use the cancel API method and include the following in the request:

- Cluster ID in the clusterId parameter.
- Job ID in the jobId parameter.

You can get the cluster ID from the list of clusters in the folder, and the job ID from the list of cluster jobs.

Get a list of jobs

Management console

Go to the folder page and select Yandex Data Processing.
Click the cluster name and open the Jobs tab.

CLI

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

To get a list of jobs, run the following command:

```
yc dataproc job list --cluster-name=<cluster_name>
```

You can get the cluster ID and name from the list of clusters in the folder.
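
For scripting, it can be convenient to request the job list in a machine-readable form. The sketch below wraps the list command in a hypothetical function, `list_jobs_json`, and adds the general `--format json` CLI flag; the exact structure of the output is not shown here and should be checked against a real cluster.

```shell
# Hypothetical wrapper: list the jobs of a cluster as JSON.
# --format is a general yc CLI flag, not specific to Hive jobs.
list_jobs_json() {
  # $1 = cluster name
  yc dataproc job list --cluster-name="$1" --format json
}
```

A call such as `list_jobs_json <cluster_name>` prints the list to standard output, where it can be redirected to a file or passed to another tool.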

API

Use the list API method and provide the cluster ID in the clusterId request parameter.

You can get the cluster ID from the list of clusters in the folder.

Get general information about the job

Management console

Go to the folder page and select Yandex Data Processing.
Click the cluster name and open the Jobs tab.
Click the job name.

CLI

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

To get general information about the job, run the command:

```
yc dataproc job get \
   --cluster-name=<cluster_name> \
   --name=<job_name>
```

You can get the cluster ID and name from the list of clusters in the folder.

API

Use the get API method and include the following in the request:

- Cluster ID in the clusterId parameter. You can get it from the list of clusters in the folder.
- Job ID in the jobId parameter. You can get it from the list of cluster jobs.

Get job execution logs

Note

You can view and search job logs using Yandex Cloud Logging. For more information, see Working with logs.

Management console

Go to the folder page and select Yandex Data Processing.
Click the cluster name and open the Jobs tab.
Click the job name.

CLI

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

To get job execution logs, run the following command:

```
yc dataproc job log \
   --cluster-name=<cluster_name> \
   --name=<job_name>
```

You can get the cluster ID and name from the list of clusters in the folder.
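
Logs are often easier to inspect once saved locally. The sketch below is one possible way to do that, using a hypothetical wrapper named `save_job_log` around the log command shown above; it assumes the command writes the log to standard output.

```shell
# Hypothetical wrapper: write the execution log of a job to a local file.
save_job_log() {
  # $1 = cluster name, $2 = job name, $3 = output file
  yc dataproc job log \
    --cluster-name="$1" \
    --name="$2" > "$3"
}
```

For example, `save_job_log <cluster_name> <job_name> job.log` would store the log in job.log in the current directory.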

API

Use the listLog API method and include the following in the request:

- Cluster ID in the clusterId parameter. You can get it from the list of clusters in the folder.
- Job ID in the jobId parameter. You can get it from the list of cluster jobs.
