

Managing Hive jobs

Written by
Yandex Cloud
Updated at May 5, 2025
  • Create a job
  • Cancel a job
  • Get a list of jobs
  • Get general information about the job
  • Get job execution logs

In a Yandex Data Processing cluster, you can manage jobs and receive execution logs for them. For examples of jobs, see Working with jobs.

Create a job

Management console
CLI
API
  1. Go to the folder page and select Yandex Data Processing.

  2. Click the cluster name and open the Jobs tab.

  3. Click Submit job.

  4. (Optional) Enter a name for the job.

  5. In the Job type field, select Hive.

  6. (Optional) In the Properties field, specify component properties as key-value pairs.

    If an argument, variable, or property consists of several space-separated parts, specify each part separately, preserving the order in which the parts are declared.

    The -mapper mapper.py argument, for instance, must be converted into two arguments, -mapper and mapper.py, in that order.

  7. (Optional) Enable the Continue on failure setting.

  8. Specify Script variables as key:value pairs.

  9. (Optional) Specify the paths to the JAR files, if any.

    File location and path format:

    • Instance file system: file:///<path_to_file>
    • Distributed cluster file system: hdfs:///<path_to_file>
    • Object Storage bucket: s3a://<bucket_name>/<path_to_file>
    • Internet: http://<path_to_file> or https://<path_to_file>

    Archives in standard Linux formats, such as zip, gz, xz, bz2, etc., are supported.

    The cluster service account needs read access to all the files in the bucket. Step-by-step guides on how to set up access to Object Storage are provided in Editing a bucket ACL.

  10. Select one of the driver types and specify the corresponding input:

    • List of queries to be executed.
    • Path to the file with the queries to be executed.
  11. Click Submit job.
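The argument-splitting rule from step 6 can be sketched in Python: a space-separated argument string such as -mapper mapper.py must be broken into ordered parts before each part is passed as a separate field. This is only an illustration of the rule; shlex.split is used here because it also respects quoting.

```python
import shlex

def split_job_argument(arg: str) -> list[str]:
    """Split a space-separated argument string into ordered parts.

    Each part must be passed to the job as a separate argument,
    in the same order as declared in the job settings.
    """
    return shlex.split(arg)

# The -mapper mapper.py argument becomes two arguments, in order:
print(split_job_argument("-mapper mapper.py"))  # ['-mapper', 'mapper.py']
```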

If you do not have the Yandex Cloud CLI yet, install and initialize it.

The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.

To create a job:

  1. View the description of the CLI create command for Hive jobs:

    yc dataproc job create-hive --help
    
  2. Create a job (the example does not show all the available parameters):

    yc dataproc job create-hive \
       --cluster-name=<cluster_name> \
       --name=<job_name> \
       --query-file-uri=<query_file_URI> \
       --script-variables=<list_of_values>
    

    Where --script-variables is a comma-separated list of variable values.

    Provide the paths to the files required for the job in the following format:

    File location and path format:

    • Instance file system: file:///<path_to_file>
    • Distributed cluster file system: hdfs:///<path_to_file>
    • Object Storage bucket: s3a://<bucket_name>/<path_to_file>
    • Internet: http://<path_to_file> or https://<path_to_file>

    Archives in standard Linux formats, such as zip, gz, xz, bz2, etc., are supported.

    The cluster service account needs read access to all the files in the bucket. Step-by-step guides on how to set up access to Object Storage are provided in Editing a bucket ACL.

You can get the cluster ID and name with the list of clusters in the folder.
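The path formats from the table above can be checked locally before submitting a job. The sketch below mirrors the scheme list from the table; the function name and scheme set are illustrative, not part of the CLI.

```python
from urllib.parse import urlparse

# Schemes supported for job files, per the path-format table above
SUPPORTED_SCHEMES = {"file", "hdfs", "s3a", "http", "https"}

def is_supported_job_uri(uri: str) -> bool:
    """Return True if the URI uses one of the supported path formats."""
    return urlparse(uri).scheme in SUPPORTED_SCHEMES

print(is_supported_job_uri("s3a://my-bucket/scripts/query.sql"))  # True
print(is_supported_job_uri("ftp://example.com/query.sql"))        # False
```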

Use the create API method and include the following information in the request:

  • Cluster ID in the clusterId parameter. You can get it with the list of clusters in the folder.
  • Job name in the name parameter.
  • Job properties in the hiveJob parameter.
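A request body for the create method can be assembled as follows. This is a minimal sketch: the top-level fields mirror the parameters listed above (clusterId, name, hiveJob), while the inner hiveJob field names (queryFileUri, scriptVariables) are assumed from the CLI parameters and should be verified against the API reference.

```python
import json

def build_create_job_request(cluster_id: str, name: str,
                             query_file_uri: str,
                             script_variables: dict[str, str]) -> dict:
    """Assemble a request body for the create method (sketch).

    Top-level fields follow the parameters listed above; the exact
    hiveJob schema should be checked against the API reference.
    """
    return {
        "clusterId": cluster_id,
        "name": name,
        "hiveJob": {
            "queryFileUri": query_file_uri,
            "scriptVariables": script_variables,
        },
    }

body = build_create_job_request("example-cluster-id", "my-hive-job",
                                "s3a://my-bucket/query.sql",
                                {"CITY": "moscow"})
print(json.dumps(body, indent=2))
```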

Cancel a job

Note

You cannot cancel jobs with the ERROR, DONE, or CANCELLED status. To find out a job's status, retrieve a list of jobs in the cluster.
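The rule from the note can be expressed as a small helper before issuing a cancel request. The statuses are the ones listed in the note; the helper itself is illustrative.

```python
# Jobs in these terminal statuses can no longer be cancelled (see note above)
TERMINAL_STATUSES = {"ERROR", "DONE", "CANCELLED"}

def can_cancel(status: str) -> bool:
    """Return True if a job with the given status can still be cancelled."""
    return status.upper() not in TERMINAL_STATUSES

print(can_cancel("RUNNING"))  # True
print(can_cancel("DONE"))     # False
```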

Management console
CLI
API
  1. Go to the folder page and select Yandex Data Processing.
  2. Click the cluster name and open the Jobs tab.
  3. Click the job name.
  4. Click Cancel in the top-right corner of the page.
  5. In the window that opens, select Cancel.

If you do not have the Yandex Cloud CLI yet, install and initialize it.

The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.

To cancel a job, run the command below:

yc dataproc job cancel <job_name_or_ID> \
  --cluster-name=<cluster_name>

You can get the job name or ID with the list of cluster jobs, and the cluster name, with the list of folder clusters.

Use the API cancel method and include the following in the request:

  • Cluster ID in the clusterId parameter.
  • Job ID in the jobId parameter.

You can get the cluster ID with the list of folder clusters, and the job ID, with the list of cluster jobs.

Get a list of jobs

Management console
CLI
API
  1. Go to the folder page and select Yandex Data Processing.
  2. Click the cluster name and open the Jobs tab.

If you do not have the Yandex Cloud CLI yet, install and initialize it.

The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.

To get a list of jobs, run the following command:

yc dataproc job list --cluster-name=<cluster_name>

You can get the cluster ID and name with a list of clusters in the folder.

Use the list API method and provide the cluster ID in the clusterId request parameter.

You can get the cluster ID with a list of clusters in the folder.
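If you capture the list as JSON (the yc CLI accepts a --format json flag), the jobs can be filtered locally. The record fields below (id, name, status) are illustrative sample data, not a guaranteed output schema.

```python
import json

# Sample data shaped like JSON output of a job-list command
# (field names are illustrative)
sample = json.loads("""
[
  {"id": "job-1", "name": "nightly-etl", "status": "DONE"},
  {"id": "job-2", "name": "ad-hoc-query", "status": "RUNNING"}
]
""")

def jobs_with_status(jobs: list[dict], status: str) -> list[str]:
    """Return names of jobs that currently have the given status."""
    return [j["name"] for j in jobs if j["status"] == status]

print(jobs_with_status(sample, "RUNNING"))  # ['ad-hoc-query']
```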

Get general information about the job

Management console
CLI
API
  1. Go to the folder page and select Yandex Data Processing.
  2. Click the cluster name and open the Jobs tab.
  3. Click the job name.

If you do not have the Yandex Cloud CLI yet, install and initialize it.

The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.

To get general information about the job, run the command:

yc dataproc job get \
   --cluster-name=<cluster_name> \
   --name=<job_name>

You can get the cluster ID and name with a list of clusters in the folder.

Use the get API method and include the following in the request:

  • Cluster ID in the clusterId parameter. You can get it together with a list of clusters in the folder.
  • Job ID in the jobId parameter. You can get it with the list of cluster jobs.
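General job information is commonly used to poll for completion. The sketch below takes an injected fetch function, so fetch_status is a stand-in for whatever call retrieves the current status (CLI or API); the terminal status names follow the cancellation note above.

```python
import time
from typing import Callable

# Statuses after which the job will not change state
TERMINAL_STATUSES = {"DONE", "ERROR", "CANCELLED"}

def wait_for_job(fetch_status: Callable[[], str],
                 poll_interval: float = 5.0,
                 timeout: float = 600.0) -> str:
    """Poll a job's status until it reaches a terminal state.

    fetch_status is a stand-in for a real status lookup.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish within the timeout")

# Usage with a fake fetcher that finishes on the third poll
statuses = iter(["PENDING", "RUNNING", "DONE"])
print(wait_for_job(lambda: next(statuses), poll_interval=0.01))  # DONE
```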

Get job execution logs

Note

You can view the job logs and search data in them using Yandex Cloud Logging. For more information, see Working with logs.

Management console
CLI
API
  1. Go to the folder page and select Yandex Data Processing.
  2. Click the cluster name and open the Jobs tab.
  3. Click the job name.

If you do not have the Yandex Cloud CLI yet, install and initialize it.

The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.

To get job execution logs, run the following command:

yc dataproc job log \
   --cluster-name=<cluster_name> \
   --name=<job_name>

You can get the cluster ID and name with the list of clusters in the folder.

Use the API listLog method and include the following in the request:

  • Cluster ID in the clusterId parameter. You can get it with the list of clusters in the folder.
  • Job ID in the jobId parameter. You can get it with the list of cluster jobs.

© 2025 Direct Cursus Technology L.L.C.