Data Proc API, REST: Job.Create
Creates a job for a cluster.
HTTP request
POST https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters/{clusterId}/jobs
Path parameters
Field | Description
clusterId | string. Required field. ID of the cluster to create a job for.
Body parameters
{
  "name": "string",
  // Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`
  "mapreduceJob": {
    "args": ["string"],
    "jarFileUris": ["string"],
    "fileUris": ["string"],
    "archiveUris": ["string"],
    "properties": "string",
    // Includes only one of the fields `mainJarFileUri`, `mainClass`
    "mainJarFileUri": "string",
    "mainClass": "string"
    // end of the list of possible fields
  },
  "sparkJob": {
    "args": ["string"],
    "jarFileUris": ["string"],
    "fileUris": ["string"],
    "archiveUris": ["string"],
    "properties": "string",
    "mainJarFileUri": "string",
    "mainClass": "string",
    "packages": ["string"],
    "repositories": ["string"],
    "excludePackages": ["string"]
  },
  "pysparkJob": {
    "args": ["string"],
    "jarFileUris": ["string"],
    "fileUris": ["string"],
    "archiveUris": ["string"],
    "properties": "string",
    "mainPythonFileUri": "string",
    "pythonFileUris": ["string"],
    "packages": ["string"],
    "repositories": ["string"],
    "excludePackages": ["string"]
  },
  "hiveJob": {
    "properties": "string",
    "continueOnFailure": "boolean",
    "scriptVariables": "string",
    "jarFileUris": ["string"],
    // Includes only one of the fields `queryFileUri`, `queryList`
    "queryFileUri": "string",
    "queryList": {
      "queries": ["string"]
    }
    // end of the list of possible fields
  }
  // end of the list of possible fields
}
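As an illustration only, a minimal sketch of sending this request from Python with the requests library is shown below; the cluster ID, IAM token source, and object storage URIs are placeholders, not values from this reference.

import os
import requests

# Placeholders: an IAM token in the IAM_TOKEN environment variable and an existing cluster ID.
iam_token = os.environ["IAM_TOKEN"]
cluster_id = "<your-cluster-id>"

url = f"https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters/{cluster_id}/jobs"

# Minimal body with a sparkJob specification; the bucket paths are hypothetical.
body = {
    "name": "example-spark-job",
    "sparkJob": {
        "mainJarFileUri": "s3a://<your-bucket>/jobs/app.jar",
        "mainClass": "com.example.Main",
        "args": ["--input", "s3a://<your-bucket>/input/"],
    },
}

response = requests.post(url, json=body, headers={"Authorization": f"Bearer {iam_token}"})
response.raise_for_status()
operation = response.json()  # an Operation resource, described under Response below
print(operation["id"], operation["done"])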
Field | Description
name | string. Name of the job.
mapreduceJob | Specification for a MapReduce job (see MapreduceJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
sparkJob | Specification for a Spark job (see SparkJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
pysparkJob | Specification for a PySpark job (see PysparkJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
hiveJob | Specification for a Hive job (see HiveJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
MapreduceJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and MapReduce.
mainJarFileUri | string. HCFS URI of the .jar file containing the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
mainClass | string. The name of the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
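For instance, a mapreduceJob specification could be expressed as the following Python dictionary; only one of mainJarFileUri and mainClass may be set, and the file URIs are hypothetical.

# A sketch of a mapreduceJob specification; values are illustrative, not defaults.
mapreduce_job = {
    "mainClass": "org.apache.hadoop.streaming.HadoopStreaming",  # oneof: mainClass or mainJarFileUri, not both
    "args": ["-mapper", "mapper.py", "-reducer", "reducer.py"],
    "fileUris": [
        "s3a://<your-bucket>/scripts/mapper.py",
        "s3a://<your-bucket>/scripts/reducer.py",
    ],
}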
SparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and Spark.
mainJarFileUri | string. The HCFS URI of the JAR file containing the main class for the job.
mainClass | string. The name of the driver class.
packages[] | string. List of Maven coordinates of JAR files to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
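As a sketch, a sparkJob specification that also pulls extra Maven dependencies might look like this; the coordinates, repository, and URIs are illustrative.

# A sketch of a sparkJob specification with additional Maven dependencies.
spark_job = {
    "mainJarFileUri": "s3a://<your-bucket>/jobs/analytics.jar",
    "mainClass": "com.example.Analytics",
    "args": ["--date", "2024-01-01"],
    "packages": ["org.apache.spark:spark-avro_2.12:3.3.2"],  # resolved like --packages
    "repositories": ["https://repo1.maven.org/maven2"],      # extra repository to search
    "excludePackages": ["org.slf4j:slf4j-log4j12"],          # groupId:artifactId to exclude
}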
PysparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and PySpark.
mainPythonFileUri | string. URI of the file with the driver code. Must be a .py file.
pythonFileUris[] | string. URIs of Python files to pass to the PySpark framework.
packages[] | string. List of Maven coordinates of JAR files to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
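A hedged sketch of a pysparkJob specification, with hypothetical bucket paths, could look like this:

# A sketch of a pysparkJob specification; all URIs are placeholders.
pyspark_job = {
    "mainPythonFileUri": "s3a://<your-bucket>/jobs/job.py",     # driver code, must be a .py file
    "pythonFileUris": ["s3a://<your-bucket>/jobs/helpers.py"],  # extra Python modules for the job
    "archiveUris": ["s3a://<your-bucket>/envs/venv.zip"],       # extracted into the working directory
    "args": ["--mode", "daily"],
}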
HiveJob
Field | Description
properties | string. Property names and values, used to configure Data Proc and Hive.
continueOnFailure | boolean. Flag indicating whether a job should continue to run if a query fails.
scriptVariables | string. Query variables and their values.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Hive driver and each task.
queryFileUri | string. URI of the script with all the necessary Hive queries. Includes only one of the fields `queryFileUri`, `queryList`.
queryList | List of Hive queries to be used in the job (see QueryList below). Includes only one of the fields `queryFileUri`, `queryList`.
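Because queryFileUri and queryList are mutually exclusive, a hiveJob specification takes one of the two forms sketched below; the script URI and query text are illustrative.

# Inline queries via queryList (queryFileUri must then be omitted).
hive_job_inline = {
    "queryList": {"queries": ["SELECT COUNT(*) FROM logs;"]},
    "continueOnFailure": False,
}

# Alternatively, point queryFileUri at a script containing all the queries.
hive_job_from_script = {
    "queryFileUri": "s3a://<your-bucket>/queries/report.hql",
}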
QueryList
Field | Description
queries[] | string. List of Hive queries.
Response
HTTP Code: 200 - OK
{
  "id": "string",
  "description": "string",
  "createdAt": "string",
  "createdBy": "string",
  "modifiedAt": "string",
  "done": "boolean",
  "metadata": {
    "clusterId": "string",
    "jobId": "string"
  },
  // Includes only one of the fields `error`, `response`
  "error": {
    "code": "integer",
    "message": "string",
    "details": ["object"]
  },
  "response": {
    "id": "string",
    "clusterId": "string",
    "createdAt": "string",
    "startedAt": "string",
    "finishedAt": "string",
    "name": "string",
    "createdBy": "string",
    "status": "string",
    // Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`
    "mapreduceJob": {
      "args": ["string"],
      "jarFileUris": ["string"],
      "fileUris": ["string"],
      "archiveUris": ["string"],
      "properties": "string",
      // Includes only one of the fields `mainJarFileUri`, `mainClass`
      "mainJarFileUri": "string",
      "mainClass": "string"
      // end of the list of possible fields
    },
    "sparkJob": {
      "args": ["string"],
      "jarFileUris": ["string"],
      "fileUris": ["string"],
      "archiveUris": ["string"],
      "properties": "string",
      "mainJarFileUri": "string",
      "mainClass": "string",
      "packages": ["string"],
      "repositories": ["string"],
      "excludePackages": ["string"]
    },
    "pysparkJob": {
      "args": ["string"],
      "jarFileUris": ["string"],
      "fileUris": ["string"],
      "archiveUris": ["string"],
      "properties": "string",
      "mainPythonFileUri": "string",
      "pythonFileUris": ["string"],
      "packages": ["string"],
      "repositories": ["string"],
      "excludePackages": ["string"]
    },
    "hiveJob": {
      "properties": "string",
      "continueOnFailure": "boolean",
      "scriptVariables": "string",
      "jarFileUris": ["string"],
      // Includes only one of the fields `queryFileUri`, `queryList`
      "queryFileUri": "string",
      "queryList": {
        "queries": ["string"]
      }
      // end of the list of possible fields
    },
    // end of the list of possible fields
    "applicationInfo": {
      "id": "string",
      "applicationAttempts": [
        {
          "id": "string",
          "amContainerId": "string"
        }
      ]
    }
  }
  // end of the list of possible fields
}
An Operation resource. For more information, see Operation.
Field | Description
id | string. ID of the operation.
description | string. Description of the operation. 0-256 characters long.
createdAt | string (date-time). Creation timestamp. String in RFC3339 text format.
createdBy | string. ID of the user or service account who initiated the operation.
modifiedAt | string (date-time). The time when the Operation resource was last modified. String in RFC3339 text format.
done | boolean. If the value is false, the operation is still in progress. If true, the operation is completed, and either error or response is available.
metadata | Service-specific metadata associated with the operation (see CreateJobMetadata below).
error | The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`, `response`: the operation result.
response | The normal response of the operation in case of success. Includes only one of the fields `error`, `response`: the operation result.
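Since Job.Create returns an Operation, a typical pattern is to poll the operation until done is true and then read either error or response. The sketch below assumes the generic Yandex Cloud operation endpoint https://operation.api.cloud.yandex.net/operations/{operationId} and an IAM token; both are assumptions, not part of this method's reference.

import time
import requests

def wait_for_job_operation(operation, iam_token, interval=5.0):
    # Assumption: the operation can be polled at the generic Operation service endpoint.
    url = f"https://operation.api.cloud.yandex.net/operations/{operation['id']}"
    headers = {"Authorization": f"Bearer {iam_token}"}
    while not operation.get("done"):
        time.sleep(interval)
        reply = requests.get(url, headers=headers)
        reply.raise_for_status()
        operation = reply.json()
    if "error" in operation:
        raise RuntimeError(f"job creation failed: {operation['error']}")
    return operation["response"]  # the created Job resource (see Job below)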
CreateJobMetadata
Field | Description
clusterId | string. Required field. ID of the cluster that the job is being created for.
jobId | string. ID of the job being created.
Status
The error result of the operation in case of failure or cancellation.
Field | Description
code | integer (int32). Error code. An enum value of google.rpc.Code.
message | string. An error message.
details[] | object. A list of messages that carry the error details.
Job
A Data Proc job. For details about the concept, see the documentation.
Field | Description
id | string. ID of the job. Generated at creation time.
clusterId | string. ID of the Data Proc cluster that the job belongs to.
createdAt | string (date-time). Creation timestamp. String in RFC3339 text format.
startedAt | string (date-time). The time when the job was started. String in RFC3339 text format.
finishedAt | string (date-time). The time when the job was finished. String in RFC3339 text format.
name | string. Name of the job, specified in the JobService.Create request.
createdBy | string. ID of the user who created the job.
status | enum (Status). Job status.
mapreduceJob | Specification for a MapReduce job (see MapreduceJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
sparkJob | Specification for a Spark job (see SparkJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
pysparkJob | Specification for a PySpark job (see PysparkJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
hiveJob | Specification for a Hive job (see HiveJob below). Includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`: the specification for the job.
applicationInfo | Attributes of the YARN application (see ApplicationInfo below).
MapreduceJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and MapReduce.
mainJarFileUri | string. HCFS URI of the .jar file containing the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
mainClass | string. The name of the driver class. Includes only one of the fields `mainJarFileUri`, `mainClass`.
SparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and Spark.
mainJarFileUri | string. The HCFS URI of the JAR file containing the main class for the job.
mainClass | string. The name of the driver class.
packages[] | string. List of Maven coordinates of JAR files to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
PysparkJob
Field | Description
args[] | string. Optional arguments to pass to the driver.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Data Proc driver and each task.
fileUris[] | string. URIs of resource files to be copied to the working directory of Data Proc drivers and tasks.
archiveUris[] | string. URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties | string. Property names and values, used to configure Data Proc and PySpark.
mainPythonFileUri | string. URI of the file with the driver code. Must be a .py file.
pythonFileUris[] | string. URIs of Python files to pass to the PySpark framework.
packages[] | string. List of Maven coordinates of JAR files to include on the driver and executor classpaths.
repositories[] | string. List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages[] | string. List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
HiveJob
Field | Description
properties | string. Property names and values, used to configure Data Proc and Hive.
continueOnFailure | boolean. Flag indicating whether a job should continue to run if a query fails.
scriptVariables | string. Query variables and their values.
jarFileUris[] | string. JAR file URIs to add to the CLASSPATH of the Hive driver and each task.
queryFileUri | string. URI of the script with all the necessary Hive queries. Includes only one of the fields `queryFileUri`, `queryList`.
queryList | List of Hive queries to be used in the job (see QueryList below). Includes only one of the fields `queryFileUri`, `queryList`.
QueryList
Field | Description
queries[] | string. List of Hive queries.
ApplicationInfo
Field | Description
id | string. ID of the YARN application.
applicationAttempts[] | YARN application attempts (see ApplicationAttempt below).
ApplicationAttempt
Field | Description
id | string. ID of the YARN application attempt.
amContainerId | string. ID of the YARN Application Master container.
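As a small illustration of the structures above, the helper below walks a Job dictionary (as returned in the operation's response) and collects the YARN application attempt IDs; the field names come from this reference, while the function itself is only a sketch.

def yarn_attempts(job):
    # Collect (attempt ID, Application Master container ID) pairs from a Job resource dict.
    info = job.get("applicationInfo") or {}
    return [(attempt["id"], attempt["amContainerId"])
            for attempt in info.get("applicationAttempts", [])]

# Example (with a job dict obtained from the operation's response):
# for attempt_id, am_container_id in yarn_attempts(job):
#     print(attempt_id, am_container_id)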