Managing Hive jobs
In a Yandex Data Processing cluster, you can manage jobs and receive execution logs for them. For examples of jobs, see Working with jobs.
Create a job
- Go to the folder page and select Yandex Data Processing.
- Click the cluster name and open the Jobs tab.
- Click Submit job.
- (Optional) Enter a name for the job.
- In the Job type field, select Hive.
- (Optional) In the Properties field, specify component properties as key-value pairs.
  If an argument, variable, or property consists of several space-separated parts, specify each part separately, preserving the order in which the arguments, variables, and properties are declared.
  For instance, the -mapper mapper.py argument must be converted into two arguments, -mapper and mapper.py, in that order.
- (Optional) Enable the Continue on failure setting.
- Specify Script variables as key:value pairs.
- (Optional) Specify the paths to the JAR files, if any.

  File location                      Path format
  Instance file system               file:///<path_to_file>
  Distributed cluster file system    hdfs:///<path_to_file>
  Object Storage bucket              s3a://<bucket_name>/<path_to_file>
  Internet                           http://<path_to_file> or https://<path_to_file>

  Archives in standard Linux formats, such as zip, gz, xz, and bz2, are supported.
  The cluster service account needs read access to all the files in the bucket. For step-by-step instructions on setting up access to Object Storage, see Editing a bucket's ACL.
- Select a driver type and specify what to use to start the job:
  - List of queries to be executed.
  - Path to the file with the queries to be executed.
- Click Submit job.
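The rule about space-separated parts in the Properties step can be illustrated with Python's standard shlex module. This is only a sketch: split_parts is a hypothetical helper name, not part of Yandex Data Processing, and it simply shows how one string becomes an ordered list of separate parts.

```python
import shlex
from typing import List


def split_parts(value: str) -> List[str]:
    """Split a space-separated argument into its parts, preserving order."""
    # shlex.split honors shell-style quoting, so a quoted value that
    # contains spaces stays a single part.
    return shlex.split(value)


print(split_parts("-mapper mapper.py"))  # ['-mapper', 'mapper.py']
```

Each element of the resulting list is then entered as a separate value, in the same order.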
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To create a job:
- View the description of the CLI create command for Hive jobs:

  yc dataproc job create-hive --help

- Create a job (the example does not show all the available parameters):

  yc dataproc job create-hive \
    --cluster-name=<cluster_name> \
    --name=<job_name> \
    --query-file-uri=<query_file_URI> \
    --script-variables=<list_of_values>

  Where --script-variables is a comma-separated list of variable values.

  Provide the paths to the files required for the job in the following format:

  File location                      Path format
  Instance file system               file:///<path_to_file>
  Distributed cluster file system    hdfs:///<path_to_file>
  Object Storage bucket              s3a://<bucket_name>/<path_to_file>
  Internet                           http://<path_to_file> or https://<path_to_file>

  Archives in standard Linux formats, such as zip, gz, xz, and bz2, are supported.
  The cluster service account needs read access to all the files in the bucket. For step-by-step instructions on setting up access to Object Storage, see Editing a bucket's ACL.
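The path formats listed above can be captured in a small helper. This is a minimal sketch: build_file_uri and its location labels ("instance", "hdfs", "bucket", "http", "https") are hypothetical names chosen for illustration, and only the URI prefixes come from the path format table.

```python
from typing import Optional


def build_file_uri(location: str, path: str, bucket: Optional[str] = None) -> str:
    """Return a file URI in the format listed for the given file location."""
    path = path.lstrip("/")  # the prefixes below already end with a slash
    if location == "instance":
        return f"file:///{path}"
    if location == "hdfs":
        return f"hdfs:///{path}"
    if location == "bucket":
        if not bucket:
            raise ValueError("a bucket name is required for Object Storage paths")
        return f"s3a://{bucket}/{path}"
    if location in ("http", "https"):
        return f"{location}://{path}"
    raise ValueError(f"unknown file location: {location!r}")


print(build_file_uri("bucket", "scripts/query.sql", bucket="my-bucket"))
# s3a://my-bucket/scripts/query.sql
```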
You can get the cluster ID and name with a list of clusters in the folder.
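To submit jobs from a script, the create-hive command shown above can be assembled as an argument list. This is a sketch under assumptions: create_hive_argv is a hypothetical helper, it uses only the flags from the example above, and it passes the script variables through verbatim as the comma-separated list the CLI expects. The cluster, job, and bucket names are placeholders. The command is printed rather than run.

```python
import shlex
from typing import List, Optional


def create_hive_argv(cluster_name: str, job_name: str, query_file_uri: str,
                     script_variables: Optional[str] = None) -> List[str]:
    """Build the argument list for `yc dataproc job create-hive`."""
    argv = [
        "yc", "dataproc", "job", "create-hive",
        f"--cluster-name={cluster_name}",
        f"--name={job_name}",
        f"--query-file-uri={query_file_uri}",
    ]
    # The flag is added only when a value is supplied.
    if script_variables:
        argv.append(f"--script-variables={script_variables}")
    return argv


argv = create_hive_argv("my-cluster", "my-job", "s3a://my-bucket/query.sql")
print(shlex.join(argv))  # print the command instead of executing it
```

Passing the same list to subprocess.run would execute the command, assuming the yc CLI is installed and initialized.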
Use the create API method and include the following information in the request:
- Cluster ID in the clusterId parameter. You can get it with a list of clusters in the folder.
- Job name in the name parameter.
- Job properties in the hiveJob parameter.
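As an illustration, the parameters for the create method might be collected like this. Only clusterId, name, and hiveJob come from the description above. The fields inside hiveJob (queryFileUri, scriptVariables, jarFileUris, continueOnFailure) are assumptions that mirror the console form fields and should be checked against the API reference. The sketch only builds and prints JSON and sends nothing.

```python
import json
from typing import Dict, List, Optional


def hive_job_request(cluster_id: str, name: str, query_file_uri: str,
                     script_variables: Optional[Dict[str, str]] = None,
                     jar_file_uris: Optional[List[str]] = None,
                     continue_on_failure: bool = False) -> dict:
    """Collect the parameters for a Hive job create request."""
    return {
        "clusterId": cluster_id,
        "name": name,
        "hiveJob": {
            # Field names below are assumptions; verify them in the API reference.
            "queryFileUri": query_file_uri,
            "scriptVariables": script_variables or {},
            "jarFileUris": jar_file_uris or [],
            "continueOnFailure": continue_on_failure,
        },
    }


request = hive_job_request("<cluster_ID>", "my-job", "s3a://<bucket_name>/query.sql")
print(json.dumps(request, indent=2))
```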
Cancel a job
Note
You cannot cancel jobs with the ERROR, DONE, or CANCELLED status. To find out a job's status, retrieve a list of jobs in the cluster.
- Go to the folder page and select Yandex Data Processing.
- Click the cluster name and open the Jobs tab.
- Click the job name.
- Click Cancel in the top-right corner of the page.
- In the window that opens, select Cancel.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To cancel a job, run the command below:
yc dataproc job cancel <job_name_or_ID> \
  --cluster-name=<cluster_name>
You can get the job name or ID with the list of cluster jobs and the cluster name with the list of folder clusters.
Use the API cancel method and include the following in the request:
- Cluster ID in the clusterId parameter.
- Job ID in the jobId parameter.
You can get the cluster ID with the list of folder clusters and the job ID with the list of cluster jobs.
Get a list of jobs
- Go to the folder page and select Yandex Data Processing.
- Click the cluster name and open the Jobs tab.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get a list of jobs, run the following command:
yc dataproc job list --cluster-name=<cluster_name>
You can get the cluster ID and name with a list of clusters in the folder.
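In a script, the job list can be narrowed down by status. The sketch below assumes the list was requested in JSON with the yc --format json flag and that every job entry has name and status keys. Both are assumptions to verify against real output, and the sample data is made up for illustration.

```python
import json
from typing import List


def job_names_with_status(list_output: str, status: str) -> List[str]:
    """Return the names of jobs whose status matches, from JSON list output."""
    jobs = json.loads(list_output)
    return [job["name"] for job in jobs if job.get("status") == status]


# Hypothetical sample, not real command output.
sample = '[{"name": "job-a", "status": "DONE"}, {"name": "job-b", "status": "ERROR"}]'
print(job_names_with_status(sample, "DONE"))  # ['job-a']
```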
Use the list API method and provide the cluster ID in the clusterId request parameter.
You can get the cluster ID with a list of clusters in the folder.
Get general information about the job
- Go to the folder page and select Yandex Data Processing.
- Click the cluster name and open the Jobs tab.
- Click the job name.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get general information about the job, run the command:
yc dataproc job get \
  --cluster-name=<cluster_name> \
  --name=<job_name>
You can get the cluster ID and name with a list of clusters in the folder.
Use the get API method and include the following in the request:
- Cluster ID in the clusterId parameter. You can get it with a list of clusters in the folder.
- Job ID in the jobId parameter. You can get it with the list of cluster jobs.
Get job execution logs
Note
You can view the job logs and search data in them using Yandex Cloud Logging. For more information, see Working with logs.
- Go to the folder page and select Yandex Data Processing.
- Click the cluster name and open the Jobs tab.
- Click the job name.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get job execution logs, run the following command:
yc dataproc job log \
  --cluster-name=<cluster_name> \
  --name=<job_name>
You can get the cluster ID and name with a list of clusters in the folder.
Use the API listLog method and include the following in the request:
- Cluster ID in the clusterId parameter. You can get it with a list of clusters in the folder.
- Job ID in the jobId parameter. You can get it with the list of cluster jobs.