Data Proc API, REST: Job.create
Creates a job for a cluster.
HTTP request
POST https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters/{clusterId}/jobs
Path parameters
Parameter | Description |
---|---|
clusterId | Required. ID of the cluster to create a job for. The maximum string length in characters is 50. |
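For illustration, here is how the path parameter slots into the request URL; the cluster ID below is a hypothetical placeholder, not a real resource.

```python
# Hypothetical cluster ID, for illustration only.
cluster_id = "your-cluster-id"
url = f"https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters/{cluster_id}/jobs"
```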
Body parameters
```json
{
"name": "string",
// includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`
"mapreduceJob": {
"args": [
"string"
],
"jarFileUris": [
"string"
],
"fileUris": [
"string"
],
"archiveUris": [
"string"
],
"properties": "object",
// `mapreduceJob` includes only one of the fields `mainJarFileUri`, `mainClass`
"mainJarFileUri": "string",
"mainClass": "string",
// end of the list of possible fields `mapreduceJob`
},
"sparkJob": {
"args": [
"string"
],
"jarFileUris": [
"string"
],
"fileUris": [
"string"
],
"archiveUris": [
"string"
],
"properties": "object",
"mainJarFileUri": "string",
"mainClass": "string",
"packages": [
"string"
],
"repositories": [
"string"
],
"excludePackages": [
"string"
]
},
"pysparkJob": {
"args": [
"string"
],
"jarFileUris": [
"string"
],
"fileUris": [
"string"
],
"archiveUris": [
"string"
],
"properties": "object",
"mainPythonFileUri": "string",
"pythonFileUris": [
"string"
],
"packages": [
"string"
],
"repositories": [
"string"
],
"excludePackages": [
"string"
]
},
"hiveJob": {
"properties": "object",
"continueOnFailure": true,
"scriptVariables": "object",
"jarFileUris": [
"string"
],
// `hiveJob` includes only one of the fields `queryFileUri`, `queryList`
"queryFileUri": "string",
"queryList": {
"queries": [
"string"
]
},
// end of the list of possible fields `hiveJob`
},
// end of the list of possible fields
}
```
Field | Description |
---|---|
name | string. Name of the job. Value must match the regular expression. |
mapreduceJob | object. Specification for a MapReduce job. Includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob. |
mapreduceJob.args[] | string. Optional arguments to pass to the driver. |
mapreduceJob.jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task. |
mapreduceJob.fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
mapreduceJob.archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
mapreduceJob.properties | object. Property names and values, used to configure Data Proc and MapReduce. |
mapreduceJob.mainJarFileUri | string. HCFS URI of the .jar file containing the driver class. mapreduceJob includes only one of the fields mainJarFileUri, mainClass. |
mapreduceJob.mainClass | string. The name of the driver class. mapreduceJob includes only one of the fields mainJarFileUri, mainClass. |
sparkJob | object. Specification for a Spark job. Includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob. |
sparkJob.args[] | string. Optional arguments to pass to the driver. |
sparkJob.jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task. |
sparkJob.fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
sparkJob.archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
sparkJob.properties | object. Property names and values, used to configure Data Proc and Spark. |
sparkJob.mainJarFileUri | string. The HCFS URI of the JAR file containing the driver class. |
sparkJob.mainClass | string. The name of the driver class. |
sparkJob.packages[] | string. List of Maven coordinates of JARs to include on the driver and executor classpaths. |
sparkJob.repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages. |
sparkJob.excludePackages[] | string. List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts. |
pysparkJob | object. Specification for a PySpark job. Includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob. |
pysparkJob.args[] | string. Optional arguments to pass to the driver. |
pysparkJob.jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task. |
pysparkJob.fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks. |
pysparkJob.archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks. |
pysparkJob.properties | object. Property names and values, used to configure Data Proc and PySpark. |
pysparkJob.mainPythonFileUri | string. URI of the file with the driver code. Must be a .py file. |
pysparkJob.pythonFileUris[] | string. URIs of Python files to pass to the PySpark framework. |
pysparkJob.packages[] | string. List of Maven coordinates of JARs to include on the driver and executor classpaths. |
pysparkJob.repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages. |
pysparkJob.excludePackages[] | string. List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts. |
hiveJob | object. Specification for a Hive job. Includes only one of the fields mapreduceJob, sparkJob, pysparkJob, hiveJob. |
hiveJob.properties | object. Property names and values, used to configure Data Proc and Hive. |
hiveJob.continueOnFailure | boolean. Flag indicating whether a job should continue to run if a query fails. |
hiveJob.scriptVariables | object. Query variables and their values. |
hiveJob.jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Hive driver and each task. |
hiveJob.queryFileUri | string. URI of the script with all the necessary Hive queries. hiveJob includes only one of the fields queryFileUri, queryList. |
hiveJob.queryList | object. List of Hive queries to be used in the job. hiveJob includes only one of the fields queryFileUri, queryList. |
hiveJob.queryList.queries[] | string. List of Hive queries. |
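To show how these fields fit together, here is a minimal sketch that submits a Spark job through this endpoint using Python's `requests` library. The cluster ID, IAM token, bucket URIs, JAR path, and class name are placeholders; this is an illustration, not an official client.

```python
import requests

IAM_TOKEN = "<IAM token>"     # placeholder: obtain one from the IAM service
CLUSTER_ID = "<cluster ID>"   # placeholder: ID of an existing Data Proc cluster

# Request body: exactly one of mapreduceJob, sparkJob, pysparkJob, hiveJob is set.
body = {
    "name": "example-spark-job",
    "sparkJob": {
        "mainJarFileUri": "s3a://<bucket>/jobs/word-count.jar",  # placeholder URI
        "mainClass": "com.example.WordCount",                    # placeholder class
        "args": ["s3a://<bucket>/input", "s3a://<bucket>/output"],
        "properties": {"spark.executor.memory": "2g"},
    },
}

resp = requests.post(
    f"https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters/{CLUSTER_ID}/jobs",
    headers={"Authorization": f"Bearer {IAM_TOKEN}"},
    json=body,
)
resp.raise_for_status()
operation = resp.json()  # the Operation resource described in the Response section
print(operation["id"], operation.get("done"))
```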
Response
HTTP Code: 200 - OK
```json
{
"id": "string",
"description": "string",
"createdAt": "string",
"createdBy": "string",
"modifiedAt": "string",
"done": true,
"metadata": "object",
// includes only one of the fields `error`, `response`
"error": {
"code": "integer",
"message": "string",
"details": [
"object"
]
},
"response": "object",
// end of the list of possible fields
}
```
An Operation resource. For more information, see Operation.
Field | Description |
---|---|
id | string. ID of the operation. |
description | string. Description of the operation. 0-256 characters long. |
createdAt | string (date-time). Creation timestamp. String in RFC3339 text format. The range of possible values is from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits). |
createdBy | string. ID of the user or service account who initiated the operation. |
modifiedAt | string (date-time). The time when the Operation resource was last modified. String in RFC3339 text format. The range of possible values is from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits). |
done | boolean. If the value is false, the operation is still in progress. If true, the operation is completed, and either error or response is available. |
metadata | object. Service-specific metadata associated with the operation. It typically contains the ID of the target resource that the operation is performed on. Any method that returns a long-running operation should document the metadata type, if any. |
error | object. The error result of the operation in case of failure or cancellation. Includes only one of the fields error, response. |
error.code | integer (int32). Error code. An enum value of google.rpc.Code. |
error.message | string. An error message. |
error.details[] | object. A list of messages that carry the error details. |
response | object. The normal response of the operation in case of success. If the original method returns no data on success, such as Delete, the response is google.protobuf.Empty. If the original method is the standard Create/Update, the response should be the target resource of the operation. Any method that returns a long-running operation should document the response type, if any. Includes only one of the fields error, response. |