Data Proc API, gRPC: JobService
Updated on December 13, 2022
A set of methods for managing Data Proc jobs.
Call | Description |
---|---|
List | Retrieves a list of jobs for a cluster. |
Create | Creates a job for a cluster. |
Get | Returns the specified job. |
ListLog | Returns a log for the specified job. |
Cancel | Cancels the specified Data Proc job. |
JobService calls
List
Retrieves a list of jobs for a cluster.
rpc List (ListJobsRequest) returns (ListJobsResponse)
ListJobsRequest
Field | Description |
---|---|
cluster_id | string Required. ID of the cluster to list jobs for. The maximum string length in characters is 50. |
page_size | int64 The maximum number of results per page to return. If the number of available results is larger than page_size , the service returns a ListJobsResponse.next_page_token that can be used to get the next page of results in subsequent list requests. Default value: 100. The maximum value is 1000. |
page_token | string Page token. To get the next page of results, set page_token to the ListJobsResponse.next_page_token returned by a previous list request. The maximum string length in characters is 100. |
filter | string A filter expression that filters jobs listed in the response. The expression must specify: 1. The field name. Currently you can only use filtering with the Job.name field. 2. An operator. Can be either = or != for single values, IN or NOT IN for lists of values. 3. The value. Must be 3-63 characters long and match the regular expression ^[a-z][-a-z0-9]{1,61}[a-z0-9]$. Example of a filter: name=my-job . The maximum string length in characters is 1000. |
ListJobsResponse
Field | Description |
---|---|
jobs[] | Job List of jobs for the specified cluster. |
next_page_token | string Token for getting the next page of the list. If the number of results is greater than the specified ListJobsRequest.page_size, use next_page_token as the value for the ListJobsRequest.page_token parameter in the next list request. Each subsequent page will have its own next_page_token to continue paging through the results. |
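The next_page_token contract above lends itself to a simple loop. A minimal sketch in Python, assuming gRPC stubs generated from the Data Proc job_service.proto (the module paths below are illustrative, not authoritative):

```python
# Sketch: collect all jobs in a cluster by following next_page_token.
from yandex.cloud.dataproc.v1 import job_service_pb2, job_service_pb2_grpc

def list_all_jobs(stub: job_service_pb2_grpc.JobServiceStub, cluster_id: str):
    jobs, page_token = [], ""
    while True:
        response = stub.List(job_service_pb2.ListJobsRequest(
            cluster_id=cluster_id,
            page_size=1000,      # the documented per-request maximum
            page_token=page_token,
        ))
        jobs.extend(response.jobs)
        page_token = response.next_page_token
        if not page_token:       # an empty token means the last page was reached
            break
    return jobs
```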
Job
Field | Description |
---|---|
id | string ID of the job. Generated at creation time. |
cluster_id | string ID of the Data Proc cluster that the job belongs to. |
created_at | google.protobuf.Timestamp Creation timestamp. |
started_at | google.protobuf.Timestamp The time when the job was started. |
finished_at | google.protobuf.Timestamp The time when the job was finished. |
name | string Name of the job, specified in the JobService.Create request. |
created_by | string ID of the user who created the job. |
status | enum Status Job status. |
job_spec | oneof: mapreduce_job , spark_job , pyspark_job or hive_job Specification for the job. |
mapreduce_job | MapreduceJob Specification for a MapReduce job. |
spark_job | SparkJob Specification for a Spark job. |
pyspark_job | PysparkJob Specification for a PySpark job. |
hive_job | HiveJob Specification for a Hive job. |
application_info | ApplicationInfo Attributes of the YARN application. |
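Because job_spec is a protobuf oneof, exactly one of the four spec fields is set on any Job. A short sketch of telling them apart with the standard protobuf WhichOneof helper:

```python
# Sketch: inspect which arm of the job_spec oneof a Job carries.
def describe_job(job) -> str:
    kind = job.WhichOneof("job_spec")   # e.g. "spark_job" or "hive_job"
    return f"{job.id} ({job.name}): kind={kind}, status={job.status}"
```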
MapreduceJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and MapReduce. |
driver | oneof: main_jar_file_uri or main_class |
main_jar_file_uri | string HCFS URI of the .jar file containing the driver class. |
main_class | string The name of the driver class. |
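For illustration, a MapreduceJob built around the main_class arm of the driver oneof (the module path, URIs, class name, and property values are placeholders):

```python
# Sketch: a MapReduce spec; set exactly one of main_jar_file_uri / main_class.
from yandex.cloud.dataproc.v1 import job_pb2

mapreduce_spec = job_pb2.MapreduceJob(
    main_class="org.apache.hadoop.streaming.HadoopStreaming",  # placeholder class
    args=["-input", "s3a://bucket/input", "-output", "s3a://bucket/output"],
    properties={"yarn.app.mapreduce.am.resource.mb": "2048"},  # placeholder value
)
```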
SparkJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and Spark. |
main_jar_file_uri | string The HCFS URI of the JAR file containing the main class for the job. |
main_class | string The name of the driver class. |
packages[] | string List of maven coordinates of jars to include on the driver and executor classpaths. |
repositories[] | string List of additional remote repositories to search for the maven coordinates given with --packages. |
exclude_packages[] | string List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts. |
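The packages, repositories, and exclude_packages fields mirror the semantics of spark-submit's --packages, --repositories, and --exclude-packages flags. A sketch with example coordinates:

```python
# Sketch: a Spark spec with Maven-resolved dependencies; URIs are placeholders.
from yandex.cloud.dataproc.v1 import job_pb2

spark_spec = job_pb2.SparkJob(
    main_jar_file_uri="s3a://bucket/jobs/app.jar",        # placeholder URI
    main_class="com.example.Main",                        # placeholder class
    packages=["org.apache.spark:spark-avro_2.12:3.1.2"],
    exclude_packages=["org.slf4j:slf4j-log4j12"],         # avoid a logging conflict
)
```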
PysparkJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and PySpark. |
main_python_file_uri | string URI of the file with the driver code. Must be a .py file. |
python_file_uris[] | string URIs of Python files to pass to the PySpark framework. |
packages[] | string List of maven coordinates of jars to include on the driver and executor classpaths. |
repositories[] | string List of additional remote repositories to search for the maven coordinates given with --packages. |
exclude_packages[] | string List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts. |
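A PysparkJob differs mainly in taking a .py entry point plus auxiliary Python files. A minimal sketch with placeholder URIs:

```python
# Sketch: a PySpark spec; main_python_file_uri must point at a .py file.
from yandex.cloud.dataproc.v1 import job_pb2

pyspark_spec = job_pb2.PysparkJob(
    main_python_file_uri="s3a://bucket/jobs/main.py",
    python_file_uris=["s3a://bucket/jobs/helpers.py"],  # shipped to the framework
    args=["--date", "2022-12-13"],
)
```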
HiveJob
Field | Description |
---|---|
properties | map<string,string> Property names and values, used to configure Data Proc and Hive. |
continue_on_failure | bool Flag indicating whether a job should continue to run if a query fails. |
script_variables | map<string,string> Query variables and their values. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Hive driver and each task. |
query_type | oneof: query_file_uri or query_list |
query_file_uri | string URI of the script with all the necessary Hive queries. |
query_list | QueryList List of Hive queries to be used in the job. |
QueryList
Field | Description |
---|---|
queries[] | string List of Hive queries. |
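Queries can be supplied inline through the query_list arm of the query_type oneof instead of a script URI. A sketch with placeholder queries:

```python
# Sketch: an inline Hive job; query_list and query_file_uri are mutually exclusive.
from yandex.cloud.dataproc.v1 import job_pb2

hive_spec = job_pb2.HiveJob(
    query_list=job_pb2.QueryList(queries=[
        "CREATE TABLE IF NOT EXISTS logs (line string);",
        "SELECT count(*) FROM logs;",
    ]),
    script_variables={"SOURCE": "s3a://bucket/raw"},  # substituted into the queries
    continue_on_failure=False,                        # stop at the first failed query
)
```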
ApplicationInfo
Field | Description |
---|---|
id | string ID of the YARN application. |
application_attempts[] | ApplicationAttempt YARN application attempts. |
ApplicationAttempt
Field | Description |
---|---|
id | string ID of the YARN application attempt. |
am_container_id | string ID of the YARN Application Master container. |
Create
Creates a job for a cluster.
rpc Create (CreateJobRequest) returns (operation.Operation)
Metadata and response of Operation:
Operation.metadata: CreateJobMetadata
Operation.response: Job
CreateJobRequest
Field | Description |
---|---|
cluster_id | string Required. ID of the cluster to create a job for. The maximum string length in characters is 50. |
name | string Name of the job. Value must match the regular expression |[a-z][-a-z0-9]{1,61}[a-z0-9] (the leading vertical bar makes the name optional). |
job_spec | oneof: mapreduce_job , spark_job , pyspark_job or hive_job Specification for the job. |
mapreduce_job | MapreduceJob Specification for a MapReduce job. |
spark_job | SparkJob Specification for a Spark job. |
pyspark_job | PysparkJob Specification for a PySpark job. |
hive_job | HiveJob Specification for a Hive job. |
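End to end, Create returns a long-running Operation whose response resolves to a Job. A hedged sketch using the yandexcloud Python SDK (the SDK helper names, credentials, and IDs are assumptions for illustration, not part of this API):

```python
# Sketch: submit a Spark job and wait for the resulting long-running operation.
import yandexcloud  # assumed: the official Yandex Cloud Python SDK
from yandex.cloud.dataproc.v1 import job_pb2, job_service_pb2, job_service_pb2_grpc

sdk = yandexcloud.SDK(iam_token="<IAM token>")           # placeholder credentials
client = sdk.client(job_service_pb2_grpc.JobServiceStub)

operation = client.Create(job_service_pb2.CreateJobRequest(
    cluster_id="<cluster ID>",
    name="spark-pi",                                     # must match the name regexp
    spark_job=job_pb2.SparkJob(                          # sets the job_spec oneof
        main_jar_file_uri="s3a://bucket/jobs/spark-examples.jar",
        main_class="org.apache.spark.examples.SparkPi",
        args=["1000"],
    ),
))
# Assumed SDK helper: polls the operation until done, then unpacks response as Job.
result = sdk.wait_operation_and_get_result(operation, response_type=job_pb2.Job)
print(result.response.id, result.response.status)
```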
MapreduceJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and MapReduce. |
driver | oneof: main_jar_file_uri or main_class |
main_jar_file_uri | string HCFS URI of the .jar file containing the driver class. |
main_class | string The name of the driver class. |
SparkJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and Spark. |
main_jar_file_uri | string The HCFS URI of the JAR file containing the main class for the job. |
main_class | string The name of the driver class. |
packages[] | string List of maven coordinates of jars to include on the driver and executor classpaths. |
repositories[] | string List of additional remote repositories to search for the maven coordinates given with --packages. |
exclude_packages[] | string List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts. |
PysparkJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and PySpark. |
main_python_file_uri | string URI of the file with the driver code. Must be a .py file. |
python_file_uris[] | string URIs of Python files to pass to the PySpark framework. |
packages[] | string List of maven coordinates of jars to include on the driver and executor classpaths. |
repositories[] | string List of additional remote repositories to search for the maven coordinates given with --packages. |
exclude_packages[] | string List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts. |
HiveJob
Field | Description |
---|---|
properties | map<string,string> Property names and values, used to configure Data Proc and Hive. |
continue_on_failure | bool Flag indicating whether a job should continue to run if a query fails. |
script_variables | map<string,string> Query variables and their values. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Hive driver and each task. |
query_type | oneof: query_file_uri or query_list |
query_file_uri | string URI of the script with all the necessary Hive queries. |
query_list | QueryList List of Hive queries to be used in the job. |
QueryList
Field | Description |
---|---|
queries[] | string List of Hive queries. |
Operation
Field | Description |
---|---|
id | string ID of the operation. |
description | string Description of the operation. 0-256 characters long. |
created_at | google.protobuf.Timestamp Creation timestamp. |
created_by | string ID of the user or service account who initiated the operation. |
modified_at | google.protobuf.Timestamp The time when the Operation resource was last modified. |
done | bool If the value is false , it means the operation is still in progress. If true , the operation is completed, and either error or response is available. |
metadata | google.protobuf.Any Service-specific metadata associated with the operation. It typically contains the ID of the target resource that the operation is performed on. Any method that returns a long-running operation should document the metadata type, if any. |
result | oneof: error or response The operation result. If done == false and there was no failure detected, neither error nor response is set. If done == false and there was a failure detected, error is set. If done == true , exactly one of error or response is set. |
error | google.rpc.Status The error result of the operation in case of failure or cancellation. |
response | google.protobuf.Any The result of the operation, set if the operation finished successfully. |
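When polling by hand instead of through an SDK helper, the result oneof resolves as follows; the response Any is unpacked into the type the call documents (Job here). A sketch:

```python
# Sketch: resolve the result oneof of a (possibly finished) Operation.
from yandex.cloud.dataproc.v1 import job_pb2

def operation_result(operation):
    if not operation.done:
        return None            # still in progress: neither error nor response is set
    if operation.HasField("error"):
        raise RuntimeError(f"operation {operation.id} failed: {operation.error.message}")
    job = job_pb2.Job()
    operation.response.Unpack(job)   # response is documented above as Job
    return job
```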
CreateJobMetadata
Field | Description |
---|---|
cluster_id | string Required. ID of the cluster that the job is being created for. The maximum string length in characters is 50. |
job_id | string ID of the job being created. The maximum string length in characters is 50. |
Job
Field | Description |
---|---|
id | string ID of the job. Generated at creation time. |
cluster_id | string ID of the Data Proc cluster that the job belongs to. |
created_at | google.protobuf.Timestamp Creation timestamp. |
started_at | google.protobuf.Timestamp The time when the job was started. |
finished_at | google.protobuf.Timestamp The time when the job was finished. |
name | string Name of the job, specified in the JobService.Create request. |
created_by | string ID of the user who created the job. |
status | enum Status Job status. |
job_spec | oneof: mapreduce_job , spark_job , pyspark_job or hive_job Specification for the job. |
mapreduce_job | MapreduceJob Specification for a MapReduce job. |
spark_job | SparkJob Specification for a Spark job. |
pyspark_job | PysparkJob Specification for a PySpark job. |
hive_job | HiveJob Specification for a Hive job. |
application_info | ApplicationInfo Attributes of the YARN application. |
ApplicationInfo
Field | Description |
---|---|
id | string ID of the YARN application. |
application_attempts[] | ApplicationAttempt YARN application attempts. |
ApplicationAttempt
Field | Description |
---|---|
id | string ID of the YARN application attempt. |
am_container_id | string ID of the YARN Application Master container. |
Get
Returns the specified job.
rpc Get (GetJobRequest) returns (Job)
GetJobRequest
Field | Description |
---|---|
cluster_id | string Required. ID of the cluster to request a job from. The maximum string length in characters is 50. |
job_id | string Required. ID of the job to return. To get a job ID make a JobService.List request. The maximum string length in characters is 50. |
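A minimal Get sketch, reusing the client built in the Create example above (the IDs are placeholders):

```python
# Sketch: fetch one job; both identifiers are required fields.
from yandex.cloud.dataproc.v1 import job_service_pb2

job = client.Get(job_service_pb2.GetJobRequest(  # client as built in the Create sketch
    cluster_id="<cluster ID>",
    job_id="<job ID>",            # discover job IDs with JobService.List
))
print(job.name, job.status, job.created_at.ToDatetime())
```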
Job
Field | Description |
---|---|
id | string ID of the job. Generated at creation time. |
cluster_id | string ID of the Data Proc cluster that the job belongs to. |
created_at | google.protobuf.Timestamp Creation timestamp. |
started_at | google.protobuf.Timestamp The time when the job was started. |
finished_at | google.protobuf.Timestamp The time when the job was finished. |
name | string Name of the job, specified in the JobService.Create request. |
created_by | string ID of the user who created the job. |
status | enum Status Job status. |
job_spec | oneof: mapreduce_job , spark_job , pyspark_job or hive_job Specification for the job. |
mapreduce_job | MapreduceJob Specification for a MapReduce job. |
spark_job | SparkJob Specification for a Spark job. |
pyspark_job | PysparkJob Specification for a PySpark job. |
hive_job | HiveJob Specification for a Hive job. |
application_info | ApplicationInfo Attributes of the YARN application. |
MapreduceJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and MapReduce. |
driver | oneof: main_jar_file_uri or main_class |
main_jar_file_uri | string HCFS URI of the .jar file containing the driver class. |
main_class | string The name of the driver class. |
SparkJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and Spark. |
main_jar_file_uri | string The HCFS URI of the JAR file containing the main class for the job. |
main_class | string The name of the driver class. |
packages[] | string List of maven coordinates of jars to include on the driver and executor classpaths. |
repositories[] | string List of additional remote repositories to search for the maven coordinates given with --packages. |
exclude_packages[] | string List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts. |
PysparkJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and PySpark. |
main_python_file_uri | string URI of the file with the driver code. Must be a .py file. |
python_file_uris[] | string URIs of Python files to pass to the PySpark framework. |
packages[] | string List of maven coordinates of jars to include on the driver and executor classpaths. |
repositories[] | string List of additional remote repositories to search for the maven coordinates given with --packages. |
exclude_packages[] | string List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts. |
HiveJob
Field | Description |
---|---|
properties | map<string,string> Property names and values, used to configure Data Proc and Hive. |
continue_on_failure | bool Flag indicating whether a job should continue to run if a query fails. |
script_variables | map<string,string> Query variables and their values. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Hive driver and each task. |
query_type | oneof: query_file_uri or query_list |
query_file_uri | string URI of the script with all the necessary Hive queries. |
query_list | QueryList List of Hive queries to be used in the job. |
QueryList
Field | Description |
---|---|
queries[] | string List of Hive queries. |
ApplicationInfo
Field | Description |
---|---|
id | string ID of the YARN application. |
application_attempts[] | ApplicationAttempt YARN application attempts. |
ApplicationAttempt
Field | Description |
---|---|
id | string ID of the YARN application attempt. |
am_container_id | string ID of the YARN Application Master container. |
ListLog
Returns a log for the specified job.
rpc ListLog (ListJobLogRequest) returns (ListJobLogResponse)
ListJobLogRequest
Field | Description |
---|---|
cluster_id | string Required. ID of the cluster that the job belongs to. The maximum string length in characters is 50. |
job_id | string ID of the job to return a log for. The maximum string length in characters is 50. |
page_size | int64 The maximum number of bytes of the job log to return per response. If the number of available bytes is larger than page_size , the service returns a ListJobLogResponse.next_page_token that can be used to get the next page of output in subsequent list requests. Default value: 1048576. The maximum value is 1048576. |
page_token | string Page token. To get the next page of results, set page_token to the ListJobLogResponse.next_page_token returned by a previous list request. The maximum string length in characters is 100. |
ListJobLogResponse
Field | Description |
---|---|
content | string Requested part of Data Proc Job log. |
next_page_token | string This token allows you to get the next page of results for ListLog requests, if the number of results is larger than page_size specified in the request. To get the next page, specify the value of next_page_token as a value for the page_token parameter in the next ListLog request. Subsequent ListLog requests will have their own next_page_token to continue paging through the results. |
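Since pages are sized in bytes rather than entries, reading a complete log is the same token-following loop as List. A sketch:

```python
# Sketch: read the whole job log by following next_page_token across byte pages.
from yandex.cloud.dataproc.v1 import job_service_pb2

def read_full_log(client, cluster_id: str, job_id: str) -> str:
    chunks, page_token = [], ""
    while True:
        response = client.ListLog(job_service_pb2.ListJobLogRequest(
            cluster_id=cluster_id,
            job_id=job_id,
            page_size=1048576,           # 1 MiB, the documented maximum
            page_token=page_token,
        ))
        chunks.append(response.content)
        page_token = response.next_page_token
        if not page_token:
            break
    return "".join(chunks)
```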
Cancel
Cancels the specified Data Proc job.
rpc Cancel (CancelJobRequest) returns (operation.Operation)
Metadata and response of Operation:
Operation.metadata: CreateJobMetadata
Operation.response: Job
CancelJobRequest
Field | Description |
---|---|
cluster_id | string Required. ID of the Data Proc cluster. The maximum string length in characters is 50. |
job_id | string Required. ID of the Data Proc job to cancel. The maximum string length in characters is 50. |
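Cancel is also a long-running operation whose response resolves to the (cancelled) Job. A sketch, again assuming the client and SDK helper from the Create example:

```python
# Sketch: cancel a running job and wait for the operation to settle.
operation = client.Cancel(job_service_pb2.CancelJobRequest(
    cluster_id="<cluster ID>",
    job_id="<job ID>",
))
result = sdk.wait_operation_and_get_result(operation, response_type=job_pb2.Job)
print(result.response.status)    # the job's terminal status after cancellation
```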
Operation
Field | Description |
---|---|
id | string ID of the operation. |
description | string Description of the operation. 0-256 characters long. |
created_at | google.protobuf.Timestamp Creation timestamp. |
created_by | string ID of the user or service account who initiated the operation. |
modified_at | google.protobuf.Timestamp The time when the Operation resource was last modified. |
done | bool If the value is false , it means the operation is still in progress. If true , the operation is completed, and either error or response is available. |
metadata | google.protobuf.Any Service-specific metadata associated with the operation. It typically contains the ID of the target resource that the operation is performed on. Any method that returns a long-running operation should document the metadata type, if any. |
result | oneof: error or response The operation result. If done == false and there was no failure detected, neither error nor response is set. If done == false and there was a failure detected, error is set. If done == true , exactly one of error or response is set. |
error | google.rpc.Status The error result of the operation in case of failure or cancellation. |
response | google.protobuf.Any The result of the operation, set if the operation finished successfully. |
CreateJobMetadata
Field | Description |
---|---|
cluster_id | string Required. ID of the cluster that the job is being created for. The maximum string length in characters is 50. |
job_id | string ID of the job being created. The maximum string length in characters is 50. |
Job
Field | Description |
---|---|
id | string ID of the job. Generated at creation time. |
cluster_id | string ID of the Data Proc cluster that the job belongs to. |
created_at | google.protobuf.Timestamp Creation timestamp. |
started_at | google.protobuf.Timestamp The time when the job was started. |
finished_at | google.protobuf.Timestamp The time when the job was finished. |
name | string Name of the job, specified in the JobService.Create request. |
created_by | string ID of the user who created the job. |
status | enum Status Job status. |
job_spec | oneof: mapreduce_job , spark_job , pyspark_job or hive_job Specification for the job. |
mapreduce_job | MapreduceJob Specification for a MapReduce job. |
spark_job | SparkJob Specification for a Spark job. |
pyspark_job | PysparkJob Specification for a PySpark job. |
hive_job | HiveJob Specification for a Hive job. |
application_info | ApplicationInfo Attributes of the YARN application. |
MapreduceJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and MapReduce. |
driver | oneof: main_jar_file_uri or main_class |
main_jar_file_uri | string HCFS URI of the .jar file containing the driver class. |
main_class | string The name of the driver class. |
SparkJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and Spark. |
main_jar_file_uri | string The HCFS URI of the JAR file containing the main class for the job. |
main_class | string The name of the driver class. |
packages[] | string List of maven coordinates of jars to include on the driver and executor classpaths. |
repositories[] | string List of additional remote repositories to search for the maven coordinates given with --packages. |
exclude_packages[] | string List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts. |
PysparkJob
Field | Description |
---|---|
args[] | string Optional arguments to pass to the driver. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task. |
file_uris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
archive_uris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
properties | map<string,string> Property names and values, used to configure Data Proc and PySpark. |
main_python_file_uri | string URI of the file with the driver code. Must be a .py file. |
python_file_uris[] | string URIs of Python files to pass to the PySpark framework. |
packages[] | string List of maven coordinates of jars to include on the driver and executor classpaths. |
repositories[] | string List of additional remote repositories to search for the maven coordinates given with --packages. |
exclude_packages[] | string List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts. |
HiveJob
Field | Description |
---|---|
properties | map<string,string> Property names and values, used to configure Data Proc and Hive. |
continue_on_failure | bool Flag indicating whether a job should continue to run if a query fails. |
script_variables | map<string,string> Query variables and their values. |
jar_file_uris[] | string JAR file URIs to add to CLASSPATH of the Hive driver and each task. |
query_type | oneof: query_file_uri or query_list |
query_file_uri | string URI of the script with all the necessary Hive queries. |
query_list | QueryList List of Hive queries to be used in the job. |
QueryList
Field | Description |
---|---|
queries[] | string List of Hive queries. |
ApplicationInfo
Field | Description |
---|---|
id | string ID of the YARN application. |
application_attempts[] | ApplicationAttempt YARN application attempts. |
ApplicationAttempt
Field | Description |
---|---|
id | string ID of the YARN application attempt. |
am_container_id | string ID of the YARN Application Master container. |