Data Proc API, gRPC: ClusterService.Create
Creates a cluster in the specified folder.
gRPC request
rpc Create (CreateClusterRequest) returns (operation.Operation)
CreateClusterRequest
{
  "folder_id": "string",
  "name": "string",
  "description": "string",
  "labels": "map<string, string>",
  "config_spec": {
    "version_id": "string",
    "hadoop": {
      "services": [
        "Service"
      ],
      "properties": "map<string, string>",
      "ssh_public_keys": [
        "string"
      ],
      "initialization_actions": [
        {
          "uri": "string",
          "args": [
            "string"
          ],
          "timeout": "int64"
        }
      ]
    },
    "subclusters_spec": [
      {
        "name": "string",
        "role": "Role",
        "resources": {
          "resource_preset_id": "string",
          "disk_type_id": "string",
          "disk_size": "int64"
        },
        "subnet_id": "string",
        "hosts_count": "int64",
        "assign_public_ip": "bool",
        "autoscaling_config": {
          "max_hosts_count": "int64",
          "preemptible": "bool",
          "measurement_duration": "google.protobuf.Duration",
          "warmup_duration": "google.protobuf.Duration",
          "stabilization_duration": "google.protobuf.Duration",
          "cpu_utilization_target": "double",
          "decommission_timeout": "int64"
        }
      }
    ]
  },
  "zone_id": "string",
  "service_account_id": "string",
  "bucket": "string",
  "ui_proxy": "bool",
  "security_group_ids": [
    "string"
  ],
  "host_group_ids": [
    "string"
  ],
  "deletion_protection": "bool",
  "log_group_id": "string",
  "environment": "Environment"
}
Field | Description
folder_id | string. Required field. ID of the folder to create a cluster in. To get a folder ID, make a yandex.cloud.resourcemanager.v1.FolderService.List request.
name | string. Name of the cluster. The name must be unique within the folder.
description | string. Description of the cluster.
labels | object (map<string, string>). Cluster labels as key:value pairs.
config_spec | CreateClusterConfigSpec. Required field. Configuration and resources for hosts that should be created with the cluster.
zone_id | string. Required field. ID of the availability zone where the cluster should be placed. To get the list of available zones, make a yandex.cloud.compute.v1.ZoneService.List request.
service_account_id | string. Required field. ID of the service account to be used by the Data Proc manager agent.
bucket | string. Name of the Object Storage bucket to use for Data Proc jobs.
ui_proxy | bool. Enables the UI Proxy feature.
security_group_ids[] | string. User security groups.
host_group_ids[] | string. Host groups to place the cluster's VMs on.
deletion_protection | bool. Deletion protection inhibits deletion of the cluster.
log_group_id | string. ID of the Cloud Logging log group to write logs to. If not set, logs will not be sent to the logging service.
environment | enum Environment. Environment of the cluster.
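As a rough illustration of the required fields in CreateClusterRequest, the sketch below assembles the request as a plain Python dict and reports which required top-level fields are missing. The helper name `missing_required_fields` and all sample values are hypothetical; a real client would populate a `CreateClusterRequest` protobuf message rather than a dict.

```python
# Required top-level fields of CreateClusterRequest, as marked in this reference.
REQUIRED_FIELDS = ("folder_id", "config_spec", "zone_id", "service_account_id")


def missing_required_fields(request: dict) -> list[str]:
    """Return the required fields that are absent or empty in `request`."""
    return [name for name in REQUIRED_FIELDS if not request.get(name)]


# Placeholder values; every ID below is made up for illustration.
request = {
    "folder_id": "example-folder-id",
    "name": "example-cluster",
    "config_spec": {"version_id": "example-version", "subclusters_spec": []},
    "zone_id": "example-zone",
    "deletion_protection": True,
}

# service_account_id was left out on purpose, so it is reported as missing.
print(missing_required_fields(request))
```

A check like this only catches omissions on the client side; the service remains the authority on what it accepts.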
CreateClusterConfigSpec
Field | Description
version_id | string. Version of the image for cluster provisioning. All available versions are listed in the documentation.
hadoop | HadoopConfig. Data Proc specific options.
subclusters_spec[] | CreateSubclusterConfigSpec. Specification for creating subclusters.
HadoopConfig
Hadoop configuration that describes services installed in a cluster,
their properties and settings.
Field | Description
services[] | enum Service. Set of services used in the cluster (if empty, the default set is used).
properties | object (map<string, string>). Properties set for all hosts in the cluster. The key names the service and the property: for example, use the key 'hdfs:dfs.replication' to set the `dfs.replication` property of HDFS.
ssh_public_keys[] | string. List of public SSH keys for accessing cluster hosts.
initialization_actions[] | InitializationAction. Set of init-actions.
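The `properties` keys combine a service prefix and a property name, as in 'hdfs:dfs.replication'. The sketch below shows one way to split such keys, assuming the first colon separates the two parts. `split_property_key` is a hypothetical helper, not part of the API, and the sample value is made up.

```python
def split_property_key(key: str) -> tuple[str, str]:
    """Split 'service:property' into (service, property) at the first colon."""
    service, sep, prop = key.partition(":")
    if not sep or not service or not prop:
        raise ValueError(f"expected 'service:property', got {key!r}")
    return service, prop


# Sample map in the shape of HadoopConfig.properties; the value is illustrative.
properties = {"hdfs:dfs.replication": "2"}

for key, value in properties.items():
    service, prop = split_property_key(key)
    print(f"{service}: {prop} = {value}")
```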
InitializationAction
Field | Description
uri | string. URI of the executable file.
args[] | string. Arguments to the initialization action.
timeout | int64. Execution timeout.
CreateSubclusterConfigSpec
Field | Description
name | string. Name of the subcluster.
role | enum Role. Required field. Role of the subcluster in the Data Proc cluster.
resources | Resources. Required field. Resource configuration for hosts in the subcluster.
subnet_id | string. Required field. ID of the VPC subnet used for hosts in the subcluster.
hosts_count | int64. Number of hosts in the subcluster.
assign_public_ip | bool. Assign public IP addresses to all hosts in the subcluster.
autoscaling_config | AutoscalingConfig. Configuration for instance group based subclusters.
Resources
Field | Description
resource_preset_id | string. ID of the resource preset for computational resources available to a host (CPU, memory, etc.).
disk_type_id | string. Type of the storage environment for the host.
disk_size | int64. Volume of the storage available to a host, in bytes.
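Because `disk_size` is given in bytes, a size quoted in larger units needs converting first. The sketch below assumes the size is quoted in GiB (1024^3 bytes); `gib_to_bytes` and the preset and disk type IDs are hypothetical placeholders.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def gib_to_bytes(gib: int) -> int:
    """Convert a whole number of GiB to bytes for the disk_size field."""
    if gib <= 0:
        raise ValueError("disk size must be positive")
    return gib * GIB


# Shape of the Resources message; the IDs are made up for illustration.
resources = {
    "resource_preset_id": "example-preset",
    "disk_type_id": "example-disk-type",
    "disk_size": gib_to_bytes(128),
}

print(resources["disk_size"])  # 137438953472
```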
AutoscalingConfig
Field | Description
max_hosts_count | int64. Upper limit on the total number of instances in the subcluster.
preemptible | bool. Preemptible instances are stopped at least once every 24 hours, and can be stopped at any time.
measurement_duration | google.protobuf.Duration. Required field. Time in seconds allotted for averaging metrics.
warmup_duration | google.protobuf.Duration. The warmup time of the instance in seconds. During this time, traffic is sent to the instance, but instance metrics are not collected.
stabilization_duration | google.protobuf.Duration. Minimum amount of time in seconds allotted for monitoring before the number of instances in the group can be reduced.
cpu_utilization_target | double. Defines an autoscaling rule based on the average CPU utilization of the instance group.
decommission_timeout | int64. Timeout, in seconds, for gracefully decommissioning nodes during downscaling. Default value: 120.
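The three `*_duration` fields are `google.protobuf.Duration` values, while `decommission_timeout` is a plain int64 number of seconds. The sketch below shows the difference when the message is written as JSON, where the protobuf JSON mapping renders a Duration as a string of seconds with an `s` suffix. `duration_json` is a hypothetical helper, and every number except the documented `decommission_timeout` default is made up.

```python
def duration_json(seconds: int) -> str:
    """Render whole seconds the way protobuf's JSON mapping writes a Duration."""
    if seconds < 0:
        raise ValueError("autoscaling durations are expected to be non-negative")
    return f"{seconds}s"


# Shape of the AutoscalingConfig message with illustrative values.
autoscaling_config = {
    "max_hosts_count": 10,
    "preemptible": False,
    "measurement_duration": duration_json(60),     # required field
    "warmup_duration": duration_json(120),
    "stabilization_duration": duration_json(300),
    "decommission_timeout": 120,                   # int64 seconds, documented default
}

print(autoscaling_config["measurement_duration"])  # 60s
```

Over gRPC the Duration travels in binary form, so the string rendering matters only for JSON representations such as the schema shown in this reference.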
operation.Operation
{
  "id": "string",
  "description": "string",
  "created_at": "google.protobuf.Timestamp",
  "created_by": "string",
  "modified_at": "google.protobuf.Timestamp",
  "done": "bool",
  "metadata": {
    "cluster_id": "string"
  },
  // Includes only one of the fields `error`, `response`
  "error": "google.rpc.Status",
  "response": {
    "id": "string",
    "folder_id": "string",
    "created_at": "google.protobuf.Timestamp",
    "name": "string",
    "description": "string",
    "labels": "map<string, string>",
    "monitoring": [
      {
        "name": "string",
        "description": "string",
        "link": "string"
      }
    ],
    "config": {
      "version_id": "string",
      "hadoop": {
        "services": [
          "Service"
        ],
        "properties": "map<string, string>",
        "ssh_public_keys": [
          "string"
        ],
        "initialization_actions": [
          {
            "uri": "string",
            "args": [
              "string"
            ],
            "timeout": "int64"
          }
        ]
      }
    },
    "health": "Health",
    "status": "Status",
    "zone_id": "string",
    "service_account_id": "string",
    "bucket": "string",
    "ui_proxy": "bool",
    "security_group_ids": [
      "string"
    ],
    "host_group_ids": [
      "string"
    ],
    "deletion_protection": "bool",
    "log_group_id": "string",
    "environment": "Environment"
  }
  // end of the list of possible fields
}
An Operation resource. For more information, see Operation.
Field | Description
id | string. ID of the operation.
description | string. Description of the operation. 0-256 characters long.
created_at | google.protobuf.Timestamp. Creation timestamp.
created_by | string. ID of the user or service account who initiated the operation.
modified_at | google.protobuf.Timestamp. The time when the Operation resource was last modified.
done | bool. If the value is `false`, the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
metadata | CreateClusterMetadata. Service-specific metadata associated with the operation.
error | google.rpc.Status. The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`, `response` (the operation result).
response | Cluster. The normal response of the operation in case of success. Includes only one of the fields `error`, `response` (the operation result).
CreateClusterMetadata
Field | Description
cluster_id | string. ID of the cluster that is being created.
Cluster
A Data Proc cluster. For details about the concept, see documentation.
Field | Description
id | string. ID of the cluster. Generated at creation time.
folder_id | string. ID of the folder that the cluster belongs to.
created_at | google.protobuf.Timestamp. Creation timestamp.
name | string. Name of the cluster. The name is unique within the folder.
description | string. Description of the cluster.
labels | object (map<string, string>). Cluster labels as key:value pairs.
monitoring[] | Monitoring. Monitoring systems relevant to the cluster.
config | ClusterConfig. Configuration of the cluster.
health | enum Health. Aggregated cluster health.
status | enum Status. Cluster status.
zone_id | string. ID of the availability zone where the cluster resides.
service_account_id | string. ID of the service account for the Data Proc manager agent.
bucket | string. Object Storage bucket to be used for Data Proc jobs that are run in the cluster.
ui_proxy | bool. Whether the UI Proxy feature is enabled.
security_group_ids[] | string. User security groups.
host_group_ids[] | string. Host groups hosting VMs of the cluster.
deletion_protection | bool. Deletion protection inhibits deletion of the cluster.
log_group_id | string. ID of the Cloud Logging log group to write logs to. If not set, the default log group for the folder will be used.
environment | enum Environment. Environment of the cluster.
Monitoring
Metadata of a monitoring system for a Data Proc cluster.
Field | Description
name | string. Name of the monitoring system.
description | string. Description of the monitoring system.
link | string. Link to the monitoring system.
ClusterConfig
Field | Description
version_id | string. Image version for cluster provisioning.
hadoop | HadoopConfig. Data Proc specific configuration options.
HadoopConfig
Hadoop configuration that describes services installed in a cluster,
their properties and settings.
Field | Description
services[] | enum Service. Set of services used in the cluster (if empty, the default set is used).
properties | object (map<string, string>). Properties set for all hosts in the cluster. The key names the service and the property: for example, use the key 'hdfs:dfs.replication' to set the `dfs.replication` property of HDFS.
ssh_public_keys[] | string. List of public SSH keys for accessing cluster hosts.
initialization_actions[] | InitializationAction. Set of init-actions.
InitializationAction
Field | Description
uri | string. URI of the executable file.
args[] | string. Arguments to the initialization action.
timeout | int64. Execution timeout.