Data Proc API, gRPC: JobService.Create
Creates a job for a cluster.
gRPC request
rpc Create (CreateJobRequest) returns (operation.Operation)
CreateJobRequest
{
"clusterId": "string",
"name": "string",
// Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`
"mapreduceJob": {
"args": [
"string"
],
"jarFileUris": [
"string"
],
"fileUris": [
"string"
],
"archiveUris": [
"string"
],
"properties": "string",
// Includes only one of the fields `mainJarFileUri`, `mainClass`
"mainJarFileUri": "string",
"mainClass": "string"
// end of the list of possible fields
},
"sparkJob": {
"args": [
"string"
],
"jarFileUris": [
"string"
],
"fileUris": [
"string"
],
"archiveUris": [
"string"
],
"properties": "string",
"mainJarFileUri": "string",
"mainClass": "string",
"packages": [
"string"
],
"repositories": [
"string"
],
"excludePackages": [
"string"
]
},
"pysparkJob": {
"args": [
"string"
],
"jarFileUris": [
"string"
],
"fileUris": [
"string"
],
"archiveUris": [
"string"
],
"properties": "string",
"mainPythonFileUri": "string",
"pythonFileUris": [
"string"
],
"packages": [
"string"
],
"repositories": [
"string"
],
"excludePackages": [
"string"
]
},
"hiveJob": {
"properties": "string",
"continueOnFailure": "bool",
"scriptVariables": "string",
"jarFileUris": [
"string"
],
// Includes only one of the fields `queryFileUri`, `queryList`
"queryFileUri": "string",
"queryList": {
"queries": [
"string"
]
}
// end of the list of possible fields
}
// end of the list of possible fields
}
Field | Description
clusterId | string. Required field. ID of the cluster to create a job for.
name | string. Name of the job.
mapreduceJob | MapreduceJob. Specification for a MapReduce job. Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
sparkJob | SparkJob. Specification for a Spark job. Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
pysparkJob | PysparkJob. Specification for a PySpark job. Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
hiveJob | HiveJob. Specification for a Hive job. Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
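The request body above can be assembled as a plain dict. The sketch below enforces the one-of constraint on the job-specification fields; the cluster ID, bucket paths, and class name are placeholders, not real resources.

```python
# Build a CreateJobRequest body as a plain dict, mirroring the schema above.

def make_create_job_request(cluster_id, name, **job_specs):
    """Assemble a request body, enforcing the one-of constraint on
    `mapreduceJob` / `sparkJob` / `pysparkJob` / `hiveJob`."""
    allowed = {"mapreduceJob", "sparkJob", "pysparkJob", "hiveJob"}
    given = {k: v for k, v in job_specs.items() if v is not None}
    if set(given) - allowed:
        raise ValueError(f"unknown job spec field(s): {set(given) - allowed}")
    if len(given) != 1:
        raise ValueError("exactly one job specification must be set")
    return {"clusterId": cluster_id, "name": name, **given}

request = make_create_job_request(
    "my-cluster-id",  # placeholder cluster ID
    "word-count",
    sparkJob={
        "mainJarFileUri": "s3a://my-bucket/jobs/word-count.jar",
        "mainClass": "com.example.WordCount",
        "args": ["s3a://my-bucket/input", "s3a://my-bucket/output"],
    },
)
```

Passing zero or more than one job specification raises an error, matching the "includes only one of the fields" rule above.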
MapreduceJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and MapReduce.
mainJarFileUri | string. HCFS URI of the .jar file containing the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
mainClass | string. The name of the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
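A MapReduce driver is selected either by JAR or by class name, never both. The two specs below illustrate each variant of the one-of, with a small check for the constraint; all paths and class names are illustrative.

```python
# Two valid mapreduceJob specs: driver selected by JAR, or by class name.

by_jar = {
    "mainJarFileUri": "s3a://my-bucket/jobs/grep.jar",
    "args": ["input/", "output/", "ERROR"],
}
by_class = {
    "mainClass": "org.apache.hadoop.examples.Grep",
    "jarFileUris": ["s3a://my-bucket/libs/examples.jar"],
    "args": ["input/", "output/", "ERROR"],
}

def driver_oneof_ok(spec):
    """True if exactly one of the driver one-of fields is present."""
    return ("mainJarFileUri" in spec) != ("mainClass" in spec)
```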
SparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and Spark.
mainJarFileUri | string. The HCFS URI of the JAR file containing the driver class.
mainClass | string. The name of the driver class.
packages[] | string. List of Maven coordinates of JARs to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of `groupId:artifactId` pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
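A sketch of how `excludePackages` interacts with `packages`: coordinates have the form `groupId:artifactId:version`, while exclusions name only `groupId:artifactId`. The package names below are illustrative, not a tested dependency set.

```python
# Filter a package list the way --packages / exclude-packages resolution
# is documented to work: match on the groupId:artifactId prefix.

def effective_packages(packages, exclude_packages):
    excluded = set(exclude_packages)
    return [p for p in packages
            if ":".join(p.split(":")[:2]) not in excluded]

pkgs = ["org.apache.spark:spark-avro_2.12:3.0.1",
        "com.fasterxml.jackson.core:jackson-databind:2.11.0"]
kept = effective_packages(
    pkgs, ["com.fasterxml.jackson.core:jackson-databind"])
# only the spark-avro coordinate survives the exclusion
```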
PysparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and PySpark.
mainPythonFileUri | string. URI of the file with the driver code. Must be a .py file.
pythonFileUris[] | string. URIs of Python files to pass to the PySpark framework.
packages[] | string. List of Maven coordinates of JARs to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of `groupId:artifactId` pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
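A `pysparkJob` spec built from the fields above, plus a check for the documented constraint that the driver file must be a .py file. All bucket paths are placeholders.

```python
# A pysparkJob spec: `mainPythonFileUri` is the driver, `pythonFileUris`
# ships helper modules alongside it.

pyspark_job = {
    "mainPythonFileUri": "s3a://my-bucket/jobs/etl.py",
    "pythonFileUris": ["s3a://my-bucket/jobs/helpers.py"],
    "args": ["--date", "2024-01-01"],
}

def main_file_ok(spec):
    """Check the documented constraint: the driver must be a .py file."""
    return spec.get("mainPythonFileUri", "").endswith(".py")
```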
HiveJob
Field | Description
properties | string. Property names and values, used to configure Data Proc and Hive.
continueOnFailure | bool. Flag indicating whether a job should continue to run if a query fails.
scriptVariables | string. Query variables and their values.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Hive driver and each task.
queryFileUri | string. URI of the script with all the necessary Hive queries. Includes only one of the fields `queryFileUri`, `queryList`.
queryList | QueryList. List of Hive queries to be used in the job. Includes only one of the fields `queryFileUri`, `queryList`.
QueryList
Field | Description
queries[] | string. List of Hive queries.
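A `hiveJob` spec using the inline `queryList` variant, with a sketch of how `scriptVariables` substitute into Hive-style `${name}` references. The table and variable names are illustrative, and `scriptVariables` is shown here as a mapping for readability.

```python
# A hiveJob spec with inline queries and query variables.

hive_job = {
    "scriptVariables": {"target_date": "2024-01-01"},
    "continueOnFailure": False,
    "queryList": {"queries": [
        "SELECT COUNT(*) FROM events WHERE dt = '${target_date}';",
    ]},
}

def render(query, variables):
    """Substitute ${name} references with their values."""
    for name, value in variables.items():
        query = query.replace("${%s}" % name, value)
    return query

rendered = render(hive_job["queryList"]["queries"][0],
                  hive_job["scriptVariables"])
```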
operation.Operation
{
"id": "string",
"description": "string",
"createdAt": "google.protobuf.Timestamp",
"createdBy": "string",
"modifiedAt": "google.protobuf.Timestamp",
"done": "bool",
"metadata": {
"clusterId": "string",
"jobId": "string"
},
// Includes only one of the fields `error`, `response`
"error": "google.rpc.Status",
"response": {
"id": "string",
"clusterId": "string",
"createdAt": "google.protobuf.Timestamp",
"startedAt": "google.protobuf.Timestamp",
"finishedAt": "google.protobuf.Timestamp",
"name": "string",
"createdBy": "string",
"status": "Status",
// Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`
"mapreduceJob": {
"args": [
"string"
],
"jarFileUris": [
"string"
],
"fileUris": [
"string"
],
"archiveUris": [
"string"
],
"properties": "string",
// Includes only one of the fields `mainJarFileUri`, `mainClass`
"mainJarFileUri": "string",
"mainClass": "string"
// end of the list of possible fields
},
"sparkJob": {
"args": [
"string"
],
"jarFileUris": [
"string"
],
"fileUris": [
"string"
],
"archiveUris": [
"string"
],
"properties": "string",
"mainJarFileUri": "string",
"mainClass": "string",
"packages": [
"string"
],
"repositories": [
"string"
],
"excludePackages": [
"string"
]
},
"pysparkJob": {
"args": [
"string"
],
"jarFileUris": [
"string"
],
"fileUris": [
"string"
],
"archiveUris": [
"string"
],
"properties": "string",
"mainPythonFileUri": "string",
"pythonFileUris": [
"string"
],
"packages": [
"string"
],
"repositories": [
"string"
],
"excludePackages": [
"string"
]
},
"hiveJob": {
"properties": "string",
"continueOnFailure": "bool",
"scriptVariables": "string",
"jarFileUris": [
"string"
],
// Includes only one of the fields `queryFileUri`, `queryList`
"queryFileUri": "string",
"queryList": {
"queries": [
"string"
]
}
// end of the list of possible fields
},
// end of the list of possible fields
"applicationInfo": {
"id": "string",
"applicationAttempts": [
{
"id": "string",
"amContainerId": "string"
}
]
}
}
// end of the list of possible fields
}
An Operation resource. For more information, see Operation.
Field | Description
id | string. ID of the operation.
description | string. Description of the operation. 0-256 characters long.
createdAt | google.protobuf.Timestamp. Creation timestamp.
createdBy | string. ID of the user or service account who initiated the operation.
modifiedAt | google.protobuf.Timestamp. The time when the Operation resource was last modified.
done | bool. If the value is false, the operation is still in progress. If true, the operation is completed, and either `error` or `response` is available.
metadata | CreateJobMetadata. Service-specific metadata associated with the operation.
error | google.rpc.Status. The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`, `response`: the operation result.
response | Job. The normal response of the operation in case of success. Includes only one of the fields `error`, `response`: the operation result.
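Unpacking the returned Operation follows directly from the table above: once `done` is true, exactly one of `error` / `response` is set. The operation dict below is a hand-built stand-in, not a live API response.

```python
# Classify an Operation dict into pending / error / ok, per the done +
# error/response one-of semantics described above.

def operation_result(op):
    if not op.get("done"):
        return ("pending", None)
    if "error" in op:
        return ("error", op["error"])
    return ("ok", op["response"])

op = {
    "id": "op-123",  # placeholder operation ID
    "done": True,
    "metadata": {"clusterId": "my-cluster-id", "jobId": "job-456"},
    "response": {"id": "job-456", "status": "PROVISIONING"},
}
```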
CreateJobMetadata
Field | Description
clusterId | string. Required field. ID of the cluster that the job is being created for.
jobId | string. ID of the job being created.
Job
A Data Proc job. For details about the concept, see the documentation.
Field | Description
id | string. ID of the job. Generated at creation time.
clusterId | string. ID of the Data Proc cluster that the job belongs to.
createdAt | google.protobuf.Timestamp. Creation timestamp.
startedAt | google.protobuf.Timestamp. The time when the job was started.
finishedAt | google.protobuf.Timestamp. The time when the job was finished.
name | string. Name of the job, specified in the JobService.Create request.
createdBy | string. ID of the user who created the job.
status | enum Status. Job status.
mapreduceJob | MapreduceJob. Specification for a MapReduce job. Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
sparkJob | SparkJob. Specification for a Spark job. Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
pysparkJob | PysparkJob. Specification for a PySpark job. Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
hiveJob | HiveJob. Specification for a Hive job. Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
applicationInfo | ApplicationInfo. Attributes of the YARN application.
MapreduceJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and MapReduce.
mainJarFileUri | string. HCFS URI of the .jar file containing the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
mainClass | string. The name of the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
SparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and Spark.
mainJarFileUri | string. The HCFS URI of the JAR file containing the driver class.
mainClass | string. The name of the driver class.
packages[] | string. List of Maven coordinates of JARs to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of `groupId:artifactId` pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
PysparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and PySpark.
mainPythonFileUri | string. URI of the file with the driver code. Must be a .py file.
pythonFileUris[] | string. URIs of Python files to pass to the PySpark framework.
packages[] | string. List of Maven coordinates of JARs to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of `groupId:artifactId` pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
HiveJob
Field | Description
properties | string. Property names and values, used to configure Data Proc and Hive.
continueOnFailure | bool. Flag indicating whether a job should continue to run if a query fails.
scriptVariables | string. Query variables and their values.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Hive driver and each task.
queryFileUri | string. URI of the script with all the necessary Hive queries. Includes only one of the fields `queryFileUri`, `queryList`.
queryList | QueryList. List of Hive queries to be used in the job. Includes only one of the fields `queryFileUri`, `queryList`.
QueryList
Field | Description
queries[] | string. List of Hive queries.
ApplicationInfo
Field | Description
id | string. ID of the YARN application.
applicationAttempts[] | ApplicationAttempt. YARN application attempts.
ApplicationAttempt
Field | Description
id | string. ID of the YARN application attempt.
amContainerId | string. ID of the YARN Application Master container.
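After the operation completes, the `applicationInfo` block of the returned Job links it to its YARN application. The sketch below walks that block to list attempt and Application Master container IDs; the IDs shown are made up, following the usual YARN naming pattern.

```python
# Extract (attempt ID, AM container ID) pairs from a Job dict's
# applicationInfo block.

def list_attempts(job):
    info = job.get("applicationInfo") or {}
    return [(a["id"], a["amContainerId"])
            for a in info.get("applicationAttempts", [])]

job = {
    "applicationInfo": {
        "id": "application_1600000000000_0001",
        "applicationAttempts": [
            {"id": "appattempt_1600000000000_0001_000001",
             "amContainerId": "container_1600000000000_0001_01_000001"},
        ],
    },
}
```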