Data Proc API, REST: Job.Create
Creates a job for a cluster.
HTTP request
POST https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters/{clusterId}/jobs
Path parameters
Field | Description
clusterId | string. Required field. ID of the cluster to create a job for.
Body parameters
{
  "name": "string",
  // Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`
  "mapreduceJob": {
    "args": ["string"],
    "jarFileUris": ["string"],
    "fileUris": ["string"],
    "archiveUris": ["string"],
    "properties": "string",
    // Includes only one of the fields `mainJarFileUri`, `mainClass`
    "mainJarFileUri": "string",
    "mainClass": "string"
    // end of the list of possible fields
  },
  "sparkJob": {
    "args": ["string"],
    "jarFileUris": ["string"],
    "fileUris": ["string"],
    "archiveUris": ["string"],
    "properties": "string",
    "mainJarFileUri": "string",
    "mainClass": "string",
    "packages": ["string"],
    "repositories": ["string"],
    "excludePackages": ["string"]
  },
  "pysparkJob": {
    "args": ["string"],
    "jarFileUris": ["string"],
    "fileUris": ["string"],
    "archiveUris": ["string"],
    "properties": "string",
    "mainPythonFileUri": "string",
    "pythonFileUris": ["string"],
    "packages": ["string"],
    "repositories": ["string"],
    "excludePackages": ["string"]
  },
  "hiveJob": {
    "properties": "string",
    "continueOnFailure": "boolean",
    "scriptVariables": "string",
    "jarFileUris": ["string"],
    // Includes only one of the fields `queryFileUri`, `queryList`
    "queryFileUri": "string",
    "queryList": {
      "queries": ["string"]
    }
    // end of the list of possible fields
  }
  // end of the list of possible fields
}
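As an illustration only, a minimal sketch of sending this request from Python with the requests library is shown below; the cluster ID, IAM token source, and object storage URIs are placeholders, not values from this reference.

import os
import requests

# Placeholders: an IAM token in the IAM_TOKEN environment variable and an existing cluster ID.
iam_token = os.environ["IAM_TOKEN"]
cluster_id = "<your-cluster-id>"

url = f"https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters/{cluster_id}/jobs"

# Minimal body with a sparkJob specification; the bucket paths are hypothetical.
body = {
    "name": "example-spark-job",
    "sparkJob": {
        "mainJarFileUri": "s3a://<your-bucket>/jobs/app.jar",
        "mainClass": "com.example.Main",
        "args": ["--input", "s3a://<your-bucket>/input/"],
    },
}

response = requests.post(url, json=body, headers={"Authorization": f"Bearer {iam_token}"})
response.raise_for_status()
operation = response.json()  # an Operation resource, described under Response below
print(operation["id"], operation["done"])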
Field | Description
name | string. Name of the job.
mapreduceJob | Specification for a MapReduce job (see MapreduceJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
sparkJob | Specification for a Spark job (see SparkJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
pysparkJob | Specification for a PySpark job (see PysparkJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
hiveJob | Specification for a Hive job (see HiveJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
MapreduceJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and MapReduce.
mainJarFileUri | string. HCFS URI of the .jar file containing the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
mainClass | string. The name of the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
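For instance, a mapreduceJob specification could be expressed as the following Python dictionary; only one of mainJarFileUri and mainClass may be set, and the file URIs are hypothetical.

# A sketch of a mapreduceJob specification; values are illustrative, not defaults.
mapreduce_job = {
    "mainClass": "org.apache.hadoop.streaming.HadoopStreaming",  # oneof: mainClass or mainJarFileUri, not both
    "args": ["-mapper", "mapper.py", "-reducer", "reducer.py"],
    "fileUris": [
        "s3a://<your-bucket>/scripts/mapper.py",
        "s3a://<your-bucket>/scripts/reducer.py",
    ],
}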
SparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and Spark.
mainJarFileUri | string. The HCFS URI of the JAR file containing the main class for the job.
mainClass | string. The name of the driver class.
packages[] | string. List of Maven coordinates of JAR files to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
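As a sketch, a sparkJob specification that also pulls extra Maven dependencies might look like this; the coordinates, repository, and URIs are illustrative.

# A sketch of a sparkJob specification with additional Maven dependencies.
spark_job = {
    "mainJarFileUri": "s3a://<your-bucket>/jobs/analytics.jar",
    "mainClass": "com.example.Analytics",
    "args": ["--date", "2024-01-01"],
    "packages": ["org.apache.spark:spark-avro_2.12:3.3.2"],  # resolved like --packages
    "repositories": ["https://repo1.maven.org/maven2"],      # extra repository to search
    "excludePackages": ["org.slf4j:slf4j-log4j12"],          # groupId:artifactId to exclude
}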
PysparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and PySpark.
mainPythonFileUri | string. URI of the file with the driver code. Must be a .py file.
pythonFileUris[] | string. URIs of Python files to pass to the PySpark framework.
packages[] | string. List of Maven coordinates of JAR files to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
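A hedged sketch of a pysparkJob specification, with hypothetical bucket paths, could look like this:

# A sketch of a pysparkJob specification; all URIs are placeholders.
pyspark_job = {
    "mainPythonFileUri": "s3a://<your-bucket>/jobs/job.py",     # driver code, must be a .py file
    "pythonFileUris": ["s3a://<your-bucket>/jobs/helpers.py"],  # extra Python modules for the job
    "archiveUris": ["s3a://<your-bucket>/envs/venv.zip"],       # extracted into the working directory
    "args": ["--mode", "daily"],
}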
HiveJob
Field | Description
properties | string. Property names and values, used to configure Data Proc and Hive.
continueOnFailure | boolean. Flag indicating whether a job should continue to run if a query fails.
scriptVariables | string. Query variables and their values.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Hive driver and each task.
queryFileUri | string. URI of the script with all the necessary Hive queries. Includes only one of the fields `queryFileUri`, `queryList`.
queryList | List of Hive queries to be used in the job (see QueryList below). Includes only one of the fields `queryFileUri`, `queryList`.
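Because queryFileUri and queryList are mutually exclusive, a hiveJob specification takes one of the two forms sketched below; the script URI and query text are illustrative.

# Inline queries via queryList (queryFileUri must then be omitted).
hive_job_inline = {
    "queryList": {"queries": ["SELECT COUNT(*) FROM logs;"]},
    "continueOnFailure": False,
}

# Alternatively, point queryFileUri at a script containing all the queries.
hive_job_from_script = {
    "queryFileUri": "s3a://<your-bucket>/queries/report.hql",
}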
QueryList
Field | Description
queries[] | string. List of Hive queries.
Response
HTTP Code: 200 - OK
{
  "id": "string",
  "description": "string",
  "createdAt": "string",
  "createdBy": "string",
  "modifiedAt": "string",
  "done": "boolean",
  "metadata": {
    "clusterId": "string",
    "jobId": "string"
  },
  // Includes only one of the fields `error`, `response`
  "error": {
    "code": "integer",
    "message": "string",
    "details": ["object"]
  },
  "response": {
    "id": "string",
    "clusterId": "string",
    "createdAt": "string",
    "startedAt": "string",
    "finishedAt": "string",
    "name": "string",
    "createdBy": "string",
    "status": "string",
    // Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`
    "mapreduceJob": {
      "args": ["string"],
      "jarFileUris": ["string"],
      "fileUris": ["string"],
      "archiveUris": ["string"],
      "properties": "string",
      // Includes only one of the fields `mainJarFileUri`, `mainClass`
      "mainJarFileUri": "string",
      "mainClass": "string"
      // end of the list of possible fields
    },
    "sparkJob": {
      "args": ["string"],
      "jarFileUris": ["string"],
      "fileUris": ["string"],
      "archiveUris": ["string"],
      "properties": "string",
      "mainJarFileUri": "string",
      "mainClass": "string",
      "packages": ["string"],
      "repositories": ["string"],
      "excludePackages": ["string"]
    },
    "pysparkJob": {
      "args": ["string"],
      "jarFileUris": ["string"],
      "fileUris": ["string"],
      "archiveUris": ["string"],
      "properties": "string",
      "mainPythonFileUri": "string",
      "pythonFileUris": ["string"],
      "packages": ["string"],
      "repositories": ["string"],
      "excludePackages": ["string"]
    },
    "hiveJob": {
      "properties": "string",
      "continueOnFailure": "boolean",
      "scriptVariables": "string",
      "jarFileUris": ["string"],
      // Includes only one of the fields `queryFileUri`, `queryList`
      "queryFileUri": "string",
      "queryList": {
        "queries": ["string"]
      }
      // end of the list of possible fields
    },
    // end of the list of possible fields
    "applicationInfo": {
      "id": "string",
      "applicationAttempts": [
        {
          "id": "string",
          "amContainerId": "string"
        }
      ]
    }
  }
  // end of the list of possible fields
}
An Operation resource. For more information, see Operation.
Field | Description
id | string. ID of the operation.
description | string. Description of the operation. 0-256 characters long.
createdAt | string (date-time). Creation timestamp. String in RFC3339 text format.
createdBy | string. ID of the user or service account who initiated the operation.
modifiedAt | string (date-time). The time when the Operation resource was last modified. String in RFC3339 text format.
done | boolean. If the value is false, the operation is still in progress. If true, the operation is completed, and either error or response is available.
metadata | Service-specific metadata associated with the operation (see CreateJobMetadata below).
error | The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`, `response`: the operation result.
response | The normal response of the operation in case of success. Includes only one of the fields `error`, `response`: the operation result.
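Since Job.Create returns an Operation, a typical pattern is to poll the operation until done is true and then read either error or response. The sketch below assumes the generic Yandex Cloud operation endpoint https://operation.api.cloud.yandex.net/operations/{operationId} and an IAM token; both are assumptions, not part of this method's reference.

import time
import requests

def wait_for_job_operation(operation, iam_token, interval=5.0):
    # Assumption: the operation can be polled at the generic Operation service endpoint.
    url = f"https://operation.api.cloud.yandex.net/operations/{operation['id']}"
    headers = {"Authorization": f"Bearer {iam_token}"}
    while not operation.get("done"):
        time.sleep(interval)
        reply = requests.get(url, headers=headers)
        reply.raise_for_status()
        operation = reply.json()
    if "error" in operation:
        raise RuntimeError(f"job creation failed: {operation['error']}")
    return operation["response"]  # the created Job resource (see Job below)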
CreateJobMetadata
Field | Description
clusterId | string. Required field. ID of the cluster that the job is being created for.
jobId | string. ID of the job being created.
Status
The error result of the operation in case of failure or cancellation.
Field | Description
code | integer (int32). Error code. An enum value of google.rpc.Code.
message | string. An error message.
details[] | object. A list of messages that carry the error details.
Job
A Data Proc job. For details about the concept, see the documentation.
Field | Description
id | string. ID of the job. Generated at creation time.
clusterId | string. ID of the Data Proc cluster that the job belongs to.
createdAt | string (date-time). Creation timestamp. String in RFC3339 text format.
startedAt | string (date-time). The time when the job was started. String in RFC3339 text format.
finishedAt | string (date-time). The time when the job was finished. String in RFC3339 text format.
name | string. Name of the job, specified in the JobService.Create request.
createdBy | string. ID of the user who created the job.
status | enum (Status). Job status.
mapreduceJob | Specification for a MapReduce job (see MapreduceJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
sparkJob | Specification for a Spark job (see SparkJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
pysparkJob | Specification for a PySpark job (see PysparkJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
hiveJob | Specification for a Hive job (see HiveJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
applicationInfo | Attributes of the YARN application (see ApplicationInfo below).
MapreduceJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and MapReduce.
mainJarFileUri | string. HCFS URI of the .jar file containing the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
mainClass | string. The name of the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
SparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and Spark.
mainJarFileUri | string. The HCFS URI of the JAR file containing the main class for the job.
mainClass | string. The name of the driver class.
packages[] | string. List of Maven coordinates of JAR files to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
PysparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and PySpark.
mainPythonFileUri | string. URI of the file with the driver code. Must be a .py file.
pythonFileUris[] | string. URIs of Python files to pass to the PySpark framework.
packages[] | string. List of Maven coordinates of JAR files to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
HiveJob
Field | Description
properties | string. Property names and values, used to configure Data Proc and Hive.
continueOnFailure | boolean. Flag indicating whether a job should continue to run if a query fails.
scriptVariables | string. Query variables and their values.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Hive driver and each task.
queryFileUri | string. URI of the script with all the necessary Hive queries. Includes only one of the fields `queryFileUri`, `queryList`.
queryList | List of Hive queries to be used in the job (see QueryList below). Includes only one of the fields `queryFileUri`, `queryList`.
QueryList
Field | Description
queries[] | string. List of Hive queries.
ApplicationInfo
Field | Description
id | string. ID of the YARN application.
applicationAttempts[] | YARN application attempts (see ApplicationAttempt below).
ApplicationAttempt
Field | Description
id | string. ID of the YARN application attempt.
amContainerId | string. ID of the YARN Application Master container.
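As a small illustration of the structures above, the helper below walks a Job dictionary (as returned in the operation's response) and collects the YARN application attempt IDs; the field names come from this reference, while the function itself is only a sketch.

def yarn_attempts(job):
    # Collect (attempt ID, Application Master container ID) pairs from a Job resource dict.
    info = job.get("applicationInfo") or {}
    return [(attempt["id"], attempt["amContainerId"])
            for attempt in info.get("applicationAttempts", [])]

# Example (with a job dict obtained from the operation's response):
# for attempt_id, am_container_id in yarn_attempts(job):
#     print(attempt_id, am_container_id)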