Data Proc API, REST: Cluster.Create
Creates a cluster in the specified folder.
HTTP request
POST https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters
Body parameters
{
  "folderId": "string",
  "name": "string",
  "description": "string",
  "labels": "object",
  "configSpec": {
    "versionId": "string",
    "hadoop": {
      "services": [
        "string"
      ],
      "properties": "object",
      "sshPublicKeys": [
        "string"
      ],
      "initializationActions": [
        {
          "uri": "string",
          "args": [
            "string"
          ],
          "timeout": "string"
        }
      ]
    },
    "subclustersSpec": [
      {
        "name": "string",
        "role": "string",
        "resources": {
          "resourcePresetId": "string",
          "diskTypeId": "string",
          "diskSize": "string"
        },
        "subnetId": "string",
        "hostsCount": "string",
        "assignPublicIp": "boolean",
        "autoscalingConfig": {
          "maxHostsCount": "string",
          "preemptible": "boolean",
          "measurementDuration": "string",
          "warmupDuration": "string",
          "stabilizationDuration": "string",
          "cpuUtilizationTarget": "string",
          "decommissionTimeout": "string"
        }
      }
    ]
  },
  "zoneId": "string",
  "serviceAccountId": "string",
  "bucket": "string",
  "uiProxy": "boolean",
  "securityGroupIds": [
    "string"
  ],
  "hostGroupIds": [
    "string"
  ],
  "deletionProtection": "boolean",
  "logGroupId": "string",
  "environment": "string"
}
Field | Description
folderId | string Required field. ID of the folder to create the cluster in. To get a folder ID, make a yandex.cloud.resourcemanager.v1.FolderService.List request.
name | string Name of the cluster. The name must be unique within the folder.
description | string Description of the cluster.
labels | object (map<string, string>) Cluster labels as key:value pairs.
configSpec | CreateClusterConfigSpec Required field. Configuration and resources for hosts that should be created with the cluster.
zoneId | string Required field. ID of the availability zone where the cluster should be placed. To get the list of available zones, make a yandex.cloud.compute.v1.ZoneService.List request.
serviceAccountId | string Required field. ID of the service account to be used by the Data Proc manager agent.
bucket | string Name of the Object Storage bucket to use for Data Proc jobs.
uiProxy | boolean Enables the UI Proxy feature.
securityGroupIds[] | string User security groups.
hostGroupIds[] | string Host groups to place the cluster's VMs on.
deletionProtection | boolean Deletion protection prevents the cluster from being deleted.
logGroupId | string ID of the Cloud Logging log group to write logs to. If not set, logs are not sent to the logging service.
environment | enum (Environment) Environment of the cluster.
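As a rough illustration of the request described above, the following Python sketch assembles the POST without sending it, using only the standard library. The Authorization: Bearer header and the iam_token parameter are assumptions that this page does not state, the helper name build_create_request is hypothetical, and every ID in the usage is a placeholder.

```python
import json
import urllib.request

API_URL = "https://dataproc.api.cloud.yandex.net/dataproc/v1/clusters"

# Top-level fields that this page marks as "Required field".
REQUIRED_FIELDS = ("folderId", "configSpec", "zoneId", "serviceAccountId")


def build_create_request(body: dict, iam_token: str) -> urllib.request.Request:
    """Return an unsent POST request for Cluster.Create."""
    missing = [field for field in REQUIRED_FIELDS if not body.get(field)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication; not specified on this page.
            "Authorization": f"Bearer {iam_token}",
        },
    )


# Placeholder values, for illustration only.
request = build_create_request(
    {
        "folderId": "<folder ID>",
        "name": "example-cluster",
        "configSpec": {"subclustersSpec": []},
        "zoneId": "<zone ID>",
        "serviceAccountId": "<service account ID>",
    },
    iam_token="<IAM token>",
)
```

Passing the result to urllib.request.urlopen would send it. The required-field check only covers the top level of the body; nested objects such as subclustersSpec have required fields of their own.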
CreateClusterConfigSpec
Field | Description
versionId | string Version of the image for cluster provisioning. All available versions are listed in the documentation.
hadoop | HadoopConfig Data Proc specific options.
subclustersSpec[] | CreateSubclusterConfigSpec Specification for creating subclusters.
HadoopConfig
Hadoop configuration that describes the services installed in a cluster, their properties, and settings.
Field | Description
services[] | enum (Service) Set of services used in the cluster (if empty, the default set is used).
properties | object (map<string, string>) Properties set for all hosts in *-site.xml configurations. The key should indicate the service and the property. For example, use the key 'hdfs:dfs.replication' to set the dfs.replication property in the file /etc/hadoop/conf/hdfs-site.xml.
sshPublicKeys[] | string List of public SSH keys for accessing cluster hosts.
initializationActions[] | InitializationAction Set of initialization actions.
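The properties map uses keys that combine a service name and a property name, as in the 'hdfs:dfs.replication' example. A minimal Python sketch of building such a map from a nested dictionary follows. The helper name flatten_properties and the value 2 are illustrative and not part of the API.

```python
def flatten_properties(nested: dict) -> dict:
    """Turn {"hdfs": {"dfs.replication": 2}} into {"hdfs:dfs.replication": "2"}."""
    flat = {}
    for service, props in nested.items():
        for name, value in props.items():
            # The field is map<string, string>, so values are sent as strings.
            flat[f"{service}:{name}"] = str(value)
    return flat


properties = flatten_properties({"hdfs": {"dfs.replication": 2}})
```

Grouping by service first makes it harder to forget the service prefix on an individual key.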
InitializationAction
Field | Description
uri | string URI of the executable file.
args[] | string Arguments to the initialization action.
timeout | string (int64) Execution timeout.
CreateSubclusterConfigSpec
Field | Description
name | string Name of the subcluster.
role | enum (Role) Required field. Role of the subcluster in the Data Proc cluster.
resources | Resources Required field. Resource configuration for hosts in the subcluster.
subnetId | string Required field. ID of the VPC subnet used for hosts in the subcluster.
hostsCount | string (int64) Number of hosts in the subcluster.
assignPublicIp | boolean Assign public IP addresses to all hosts in the subcluster.
autoscalingConfig | AutoscalingConfig Configuration for instance group based subclusters.
Resources
Field | Description
resourcePresetId | string ID of the resource preset for computational resources available to a host (CPU, memory, etc.).
diskTypeId | string Type of the storage environment for the host.
diskSize | string (int64) Volume of the storage available to a host, in bytes.
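Because diskSize is an int64 expressed in bytes and serialized as a string, a small conversion helper can prevent unit mistakes. The Python sketch below assumes the caller thinks in binary gigabytes (GiB, 2^30 bytes). The function name make_resources and the placeholder IDs are hypothetical.

```python
GIB = 1024 ** 3  # bytes in one binary gigabyte


def make_resources(preset_id: str, disk_type_id: str, disk_gib: int) -> dict:
    """Build a Resources object with diskSize in bytes, encoded as a string."""
    if disk_gib <= 0:
        raise ValueError("disk size must be positive")
    return {
        "resourcePresetId": preset_id,
        "diskTypeId": disk_type_id,
        "diskSize": str(disk_gib * GIB),
    }


resources = make_resources("<preset ID>", "<disk type ID>", 128)
```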
AutoscalingConfig
Field | Description
maxHostsCount | string (int64) Upper limit for the total number of instances in the subcluster.
preemptible | boolean Preemptible instances are stopped at least once every 24 hours, and can be stopped at any time if their resources are needed by Compute.
measurementDuration | string (duration) Required field. Time in seconds allotted for averaging metrics.
warmupDuration | string (duration) The warmup time of the instance in seconds. During this time, traffic is sent to the instance, but instance metrics are not collected.
stabilizationDuration | string (duration) Minimum amount of time in seconds allotted for monitoring before Instance Groups can reduce the number of instances in the group. During this time, the group size doesn't decrease, even if the new metric values indicate that it should.
cpuUtilizationTarget | string Defines an autoscaling rule based on the average CPU utilization of the instance group.
decommissionTimeout | string (int64) Timeout, in seconds, for gracefully decommissioning nodes during downscaling. Default value: 120.
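The autoscaling fields mix three encodings: int64 values sent as strings, durations, and a plain boolean. The Python sketch below builds such an object. It assumes that durations use the protobuf JSON form of seconds with an "s" suffix (for example "60s"), which this page does not confirm. The function names and numeric values are arbitrary, and cpuUtilizationTarget is left out because the page gives no format for it.

```python
def seconds(value: int) -> str:
    """Encode whole seconds as a duration string (assumed protobuf JSON form)."""
    if value < 0:
        raise ValueError("duration must not be negative")
    return f"{value}s"


def make_autoscaling_config(max_hosts: int, measurement_s: int,
                            warmup_s: int = 0, stabilization_s: int = 0,
                            decommission_s: int = 120,
                            preemptible: bool = False) -> dict:
    """Build an AutoscalingConfig object with each field in its wire encoding."""
    return {
        "maxHostsCount": str(max_hosts),  # int64 as string
        "preemptible": preemptible,
        "measurementDuration": seconds(measurement_s),  # required field
        "warmupDuration": seconds(warmup_s),
        "stabilizationDuration": seconds(stabilization_s),
        # string (int64), in seconds; 120 is the documented default.
        "decommissionTimeout": str(decommission_s),
    }


config = make_autoscaling_config(max_hosts=10, measurement_s=60)
```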
Response
HTTP Code: 200 - OK
{
  "id": "string",
  "description": "string",
  "createdAt": "string",
  "createdBy": "string",
  "modifiedAt": "string",
  "done": "boolean",
  "metadata": {
    "clusterId": "string"
  },
  // Includes only one of the fields `error`, `response`
  "error": {
    "code": "integer",
    "message": "string",
    "details": [
      "object"
    ]
  },
  "response": {
    "id": "string",
    "folderId": "string",
    "createdAt": "string",
    "name": "string",
    "description": "string",
    "labels": "object",
    "monitoring": [
      {
        "name": "string",
        "description": "string",
        "link": "string"
      }
    ],
    "config": {
      "versionId": "string",
      "hadoop": {
        "services": [
          "string"
        ],
        "properties": "object",
        "sshPublicKeys": [
          "string"
        ],
        "initializationActions": [
          {
            "uri": "string",
            "args": [
              "string"
            ],
            "timeout": "string"
          }
        ]
      }
    },
    "health": "string",
    "status": "string",
    "zoneId": "string",
    "serviceAccountId": "string",
    "bucket": "string",
    "uiProxy": "boolean",
    "securityGroupIds": [
      "string"
    ],
    "hostGroupIds": [
      "string"
    ],
    "deletionProtection": "boolean",
    "logGroupId": "string",
    "environment": "string"
  }
  // end of the list of possible fields
}
An Operation resource. For more information, see Operation.
Field | Description
id | string ID of the operation.
description | string Description of the operation. 0-256 characters long.
createdAt | string (date-time) Creation timestamp. String in RFC3339 text format. To work with values in this field, use the APIs described in the Protocol Buffers reference.
createdBy | string ID of the user or service account that initiated the operation.
modifiedAt | string (date-time) The time when the Operation resource was last modified. String in RFC3339 text format. To work with values in this field, use the APIs described in the Protocol Buffers reference.
done | boolean If the value is false, the operation is still in progress. If true, the operation is completed, and either error or response is available.
metadata | CreateClusterMetadata Service-specific metadata associated with the operation.
error | Status The error result of the operation in case of failure or cancellation. Includes only one of the fields error, response. The operation result.
response | Cluster The normal response of the operation in case of success. Includes only one of the fields error, response. The operation result.
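Since the reply carries either error or response, client code usually branches on those fields. The Python sketch below shows one way to do that with a decoded JSON reply. It assumes that done set to false means the operation is still running. The names OperationFailed and cluster_from_operation, and the IDs in the sample dictionaries, are hypothetical.

```python
class OperationFailed(Exception):
    """Raised when a finished operation carries an error."""


def cluster_from_operation(operation: dict):
    """Return the Cluster from a finished operation, or None while it runs."""
    if not operation.get("done"):
        # Assumed to mean "still in progress"; metadata.clusterId is
        # the only cluster information available at this point.
        return None
    if "error" in operation:
        err = operation["error"]
        raise OperationFailed(f"{err.get('code')}: {err.get('message')}")
    return operation["response"]


# Hypothetical replies, for illustration only.
pending = {"id": "op-1", "done": False, "metadata": {"clusterId": "c-1"}}
finished = {"id": "op-1", "done": True, "response": {"id": "c-1"}}
failed = {"id": "op-1", "done": True, "error": {"code": 5, "message": "not found"}}
```

Raising an exception for the error branch keeps the success path free of status checks; returning a tagged tuple would work equally well.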
CreateClusterMetadata
Field | Description
clusterId | string ID of the cluster that is being created.
Status
The error result of the operation in case of failure or cancellation.
Field | Description
code | integer (int32) Error code. An enum value of google.rpc.Code.
message | string An error message.
details[] | object A list of messages that carry the error details.
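The code field is a number from google.rpc.Code, so logs are easier to read when it is mapped back to its name. The Python sketch below lists the canonical google.rpc.Code values. The class name RpcCode, the helper describe_status, and the sample message are illustrative only.

```python
from enum import IntEnum


class RpcCode(IntEnum):
    """Canonical google.rpc.Code values."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def describe_status(status: dict) -> str:
    """Format a Status object as '<CODE_NAME>: <message>'."""
    code = status.get("code", 0)
    try:
        name = RpcCode(code).name
    except ValueError:
        # Tolerate codes outside the known range instead of failing.
        name = f"UNKNOWN_CODE_{code}"
    return f"{name}: {status.get('message', '')}"


summary = describe_status({"code": 5, "message": "folder not found"})
```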
Cluster
A Data Proc cluster. For details about the concept, see documentation.
Field | Description
id | string ID of the cluster. Generated at creation time.
folderId | string ID of the folder that the cluster belongs to.
createdAt | string (date-time) Creation timestamp. String in RFC3339 text format. To work with values in this field, use the APIs described in the Protocol Buffers reference.
name | string Name of the cluster. The name is unique within the folder.
description | string Description of the cluster.
labels | object (map<string, string>) Cluster labels as key:value pairs.
monitoring[] | Monitoring Monitoring systems relevant to the cluster.
config | ClusterConfig Configuration of the cluster.
health | enum (Health) Aggregated cluster health.
status | enum (Status) Cluster status.
zoneId | string ID of the availability zone where the cluster resides.
serviceAccountId | string ID of the service account for the Data Proc manager agent.
bucket | string Object Storage bucket to be used for Data Proc jobs that are run in the cluster.
uiProxy | boolean Whether the UI Proxy feature is enabled.
securityGroupIds[] | string User security groups.
hostGroupIds[] | string Host groups hosting VMs of the cluster.
deletionProtection | boolean Deletion protection prevents the cluster from being deleted.
logGroupId | string ID of the Cloud Logging log group to write logs to. If not set, the default log group for the folder is used.
environment | enum (Environment) Environment of the cluster.
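Timestamps such as createdAt arrive as RFC3339 strings, which may end in "Z" and may carry more fractional digits than Python's datetime can hold. The sketch below is one way to parse them with the standard library. The function name parse_rfc3339 and the sample timestamp are hypothetical, and digits beyond microseconds are truncated rather than rounded.

```python
import re
from datetime import datetime, timezone

_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[Tt](\d{2}:\d{2}:\d{2})"
    r"(?:\.(\d+))?([Zz]|[+-]\d{2}:\d{2})$"
)


def parse_rfc3339(text: str) -> datetime:
    """Parse an RFC3339 timestamp into a timezone-aware datetime."""
    match = _RFC3339.match(text)
    if match is None:
        raise ValueError(f"not an RFC3339 timestamp: {text!r}")
    date, time, fraction, offset = match.groups()
    # datetime keeps microseconds only, so extra fractional digits are dropped.
    micros = (fraction or "")[:6].ljust(6, "0")
    if offset in ("Z", "z"):
        offset = "+00:00"
    return datetime.fromisoformat(f"{date}T{time}.{micros}{offset}")


created = parse_rfc3339("2024-01-02T03:04:05.123456789Z")
created_utc = created.astimezone(timezone.utc)
```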
Monitoring
Metadata of a monitoring system for a Data Proc cluster.
Field | Description
name | string Name of the monitoring system.
description | string Description of the monitoring system.
link | string Link to the monitoring system.
ClusterConfig
Field | Description
versionId | string Image version for cluster provisioning.
hadoop | HadoopConfig Data Proc specific configuration options.
HadoopConfig
Hadoop configuration that describes the services installed in a cluster, their properties, and settings.
Field | Description
services[] | enum (Service) Set of services used in the cluster (if empty, the default set is used).
properties | object (map<string, string>) Properties set for all hosts in *-site.xml configurations. The key should indicate the service and the property. For example, use the key 'hdfs:dfs.replication' to set the dfs.replication property in the file /etc/hadoop/conf/hdfs-site.xml.
sshPublicKeys[] | string List of public SSH keys for accessing cluster hosts.
initializationActions[] | InitializationAction Set of initialization actions.
InitializationAction
Field | Description
uri | string URI of the executable file.
args[] | string Arguments to the initialization action.
timeout | string (int64) Execution timeout.