Data Proc API, REST: Job methods
A set of methods for managing Data Proc jobs.
JSON Representation
{
  "id": "string",
  "clusterId": "string",
  "createdAt": "string",
  "startedAt": "string",
  "finishedAt": "string",
  "name": "string",
  "createdBy": "string",
  "status": "string",
  "applicationInfo": {
    "id": "string",
    "applicationAttempts": [
      {
        "id": "string",
        "amContainerId": "string"
      }
    ]
  },
  // includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`
  "mapreduceJob": {
    "args": [
      "string"
    ],
    "jarFileUris": [
      "string"
    ],
    "fileUris": [
      "string"
    ],
    "archiveUris": [
      "string"
    ],
    "properties": "object",
    // `mapreduceJob` includes only one of the fields `mainJarFileUri`, `mainClass`
    "mainJarFileUri": "string",
    "mainClass": "string",
    // end of the list of possible fields `mapreduceJob`
  },
  "sparkJob": {
    "args": [
      "string"
    ],
    "jarFileUris": [
      "string"
    ],
    "fileUris": [
      "string"
    ],
    "archiveUris": [
      "string"
    ],
    "properties": "object",
    "mainJarFileUri": "string",
    "mainClass": "string",
    "packages": [
      "string"
    ],
    "repositories": [
      "string"
    ],
    "excludePackages": [
      "string"
    ]
  },
  "pysparkJob": {
    "args": [
      "string"
    ],
    "jarFileUris": [
      "string"
    ],
    "fileUris": [
      "string"
    ],
    "archiveUris": [
      "string"
    ],
    "properties": "object",
    "mainPythonFileUri": "string",
    "pythonFileUris": [
      "string"
    ],
    "packages": [
      "string"
    ],
    "repositories": [
      "string"
    ],
    "excludePackages": [
      "string"
    ]
  },
  "hiveJob": {
    "properties": "object",
    "continueOnFailure": true,
    "scriptVariables": "object",
    "jarFileUris": [
      "string"
    ],
    // `hiveJob` includes only one of the fields `queryFileUri`, `queryList`
    "queryFileUri": "string",
    "queryList": {
      "queries": [
        "string"
      ]
    },
    // end of the list of possible fields `hiveJob`
  },
  // end of the list of possible fields
}
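As a concrete illustration, a response describing a finished Spark job could look like the sketch below. All IDs, timestamps, URIs, and the status value are hypothetical placeholders invented for this example, not values from a real cluster:

{
  "id": "c9q1a2b3c4d5e6f7g8h9",
  "clusterId": "c9qb2bis0gq4dhh6bkmv",
  "createdAt": "2022-05-12T09:00:00.000000Z",
  "startedAt": "2022-05-12T09:00:12.000000Z",
  "finishedAt": "2022-05-12T09:05:47.000000Z",
  "name": "spark-pi",
  "createdBy": "ajeuexampleuserid",
  "status": "DONE",
  "applicationInfo": {
    "id": "application_1652346123456_0001",
    "applicationAttempts": [
      {
        "id": "appattempt_1652346123456_0001_000001",
        "amContainerId": "container_1652346123456_0001_01_000001"
      }
    ]
  },
  "sparkJob": {
    "args": [
      "1000"
    ],
    "jarFileUris": [],
    "fileUris": [],
    "archiveUris": [],
    "properties": {
      "spark.executor.memory": "2g"
    },
    "mainJarFileUri": "s3a://my-bucket/jobs/spark-examples.jar",
    "mainClass": "org.apache.spark.examples.SparkPi"
  }
}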
Field | Description |
---|---|
id | string ID of the job. Generated at creation time. |
clusterId | string ID of the Data Proc cluster that the job belongs to. |
createdAt | string (date-time) Creation timestamp. String in RFC3339 text format. The range of possible values is from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits). |
startedAt | string (date-time) The time when the job was started. String in RFC3339 text format. The range of possible values is from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits). |
finishedAt | string (date-time) The time when the job was finished. String in RFC3339 text format. The range of possible values is from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits). |
name | string Name of the job, specified in the create request. |
createdBy | string ID of the user who created the job. |
status | string Job status. |
applicationInfo | object Attributes of the YARN application. |
applicationInfo.id | string ID of the YARN application. |
applicationInfo.applicationAttempts[] | object YARN application attempts. |
applicationInfo.applicationAttempts[].id | string ID of the YARN application attempt. |
applicationInfo.applicationAttempts[].amContainerId | string ID of the YARN Application Master container. |
mapreduceJob | object Specification for a MapReduce job. The job includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob. |
mapreduceJob.args[] | string Optional arguments to pass to the driver. |
mapreduceJob.jarFileUris[] | string JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task. |
mapreduceJob.fileUris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
mapreduceJob.archiveUris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
mapreduceJob.properties | object Property names and values, used to configure Data Proc and MapReduce. |
mapreduceJob.mainJarFileUri | string HCFS URI of the .jar file containing the driver class. mapreduceJob includes only one of the fields mainJarFileUri, mainClass. |
mapreduceJob.mainClass | string The name of the driver class. mapreduceJob includes only one of the fields mainJarFileUri, mainClass. |
sparkJob | object Specification for a Spark job. The job includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob. |
sparkJob.args[] | string Optional arguments to pass to the driver. |
sparkJob.jarFileUris[] | string JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task. |
sparkJob.fileUris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
sparkJob.archiveUris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
sparkJob.properties | object Property names and values, used to configure Data Proc and Spark. |
sparkJob.mainJarFileUri | string The HCFS URI of the JAR file containing the main class for the job. |
sparkJob.mainClass | string The name of the driver class. |
sparkJob.packages[] | string List of Maven coordinates of JAR files to include on the driver and executor classpaths. |
sparkJob.repositories[] | string List of additional remote repositories to search for the Maven coordinates given with --packages. |
sparkJob.excludePackages[] | string List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts. |
pysparkJob | object Specification for a PySpark job. The job includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob. |
pysparkJob.args[] | string Optional arguments to pass to the driver. |
pysparkJob.jarFileUris[] | string JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task. |
pysparkJob.fileUris[] | string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
pysparkJob.archiveUris[] | string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
pysparkJob.properties | object Property names and values, used to configure Data Proc and PySpark. |
pysparkJob.mainPythonFileUri | string URI of the file with the driver code. Must be a .py file. |
pysparkJob.pythonFileUris[] | string URIs of Python files to pass to the PySpark framework. |
pysparkJob.packages[] | string List of Maven coordinates of JAR files to include on the driver and executor classpaths. |
pysparkJob.repositories[] | string List of additional remote repositories to search for the Maven coordinates given with --packages. |
pysparkJob.excludePackages[] | string List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts. |
hiveJob | object Specification for a Hive job. The job includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob. |
hiveJob.properties | object Property names and values, used to configure Data Proc and Hive. |
hiveJob.continueOnFailure | boolean Flag indicating whether a job should continue to run if a query fails. |
hiveJob.scriptVariables | object Query variables and their values. |
hiveJob.jarFileUris[] | string JAR file URIs to add to the CLASSPATH of the Hive driver and each task. |
hiveJob.queryFileUri | string URI of the script with all the necessary Hive queries. hiveJob includes only one of the fields queryFileUri, queryList. |
hiveJob.queryList | object List of Hive queries to be used in the job. hiveJob includes only one of the fields queryFileUri, queryList. |
hiveJob.queryList.queries[] | string List of Hive queries. |
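To make the hiveJob variants concrete, here is a minimal sketch of a specification that uses an inline queryList instead of queryFileUri. The script variable, table name, and query are assumptions made up for this example:

{
  "hiveJob": {
    "continueOnFailure": false,
    "scriptVariables": {
      "DT": "2022-05-12"
    },
    "jarFileUris": [],
    // `queryList` is set, so `queryFileUri` must be omitted
    "queryList": {
      "queries": [
        "SELECT COUNT(*) FROM logs WHERE dt = '${DT}';"
      ]
    }
  }
}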
Methods
Method | Description |
---|---|
cancel | Cancels the specified Data Proc job. |
create | Creates a job for a cluster. |
get | Returns the specified job. |
list | Retrieves a list of jobs for a cluster. |
listLog | Returns a log for the specified job. |
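For example, a create call carries one of the job specifications described above in its request body. A minimal sketch for submitting a PySpark job follows; the job name, URIs, and property values are hypothetical, and the exact endpoint and required parameters are documented on the create method page:

{
  "name": "daily-aggregation",
  "pysparkJob": {
    "mainPythonFileUri": "s3a://my-bucket/jobs/aggregate.py",
    "args": [
      "--date", "2022-05-12"
    ],
    "properties": {
      "spark.submit.deployMode": "cluster"
    }
  }
}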