Yandex project
© 2025 Yandex.Cloud LLC
Yandex Managed Service for Greenplum®

Sharding in Greenplum®

Written by
Yandex Cloud
Updated at February 18, 2025

Sharding is a horizontal Greenplum® cluster scaling strategy where parts of each database table are placed on different segment hosts. Every write or read request in Greenplum® utilizes all cluster segments.

Distribution key

To optimize JOIN operations on large tables, you can specify a distribution key explicitly. When tables are joined on the fields specified in the key, each segment performs the join locally on its own data, without moving rows between segments, so the query runs faster.

To create a table with a distribution key, provide one or more required fields in the DISTRIBUTED BY clause:

CREATE TABLE tableName
(
    column1 type1,
    column2 type2,
    ...
    columnN typeN
) DISTRIBUTED BY (column1);
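
As an illustration of the template above, here is a minimal sketch with two tables that share a distribution key. The table and column names (customers, orders, customer_id, and so on) are hypothetical and not part of the service. The sketch assumes the join columns have the same data type in both tables, which is needed for matching rows to land on the same segment.

```sql
-- Hypothetical tables for illustration only; names and types are assumptions.
CREATE TABLE customers
(
    customer_id bigint,
    name        text
) DISTRIBUTED BY (customer_id);

CREATE TABLE orders
(
    order_id    bigint,
    customer_id bigint,
    amount      numeric(12, 2)
) DISTRIBUTED BY (customer_id);

-- Both tables are distributed by customer_id, so rows with the same
-- customer_id are stored on the same segment and the join can run locally.
SELECT c.customer_id, c.name, sum(o.amount) AS total_amount
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;
```

To check how a query like this is executed, you can prefix it with EXPLAIN and look at the data motion steps in the plan.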

If you choose a poor distribution key, most of the data might end up in a single segment. This degrades cluster performance and can shut down the segment if its host runs out of storage space. This is why you should not select the following as your distribution key:

  • Date and time fields.
  • Fields that may contain a large number of identical values.
  • Fields with a large number of NULL values.
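
One way to check whether a key distributes data evenly is to count rows per segment using the gp_segment_id system column. The sketch below assumes a hypothetical table named orders with a column order_id. If one segment holds far more rows than the others, the table can be redistributed by a different key.

```sql
-- Count rows per segment for a hypothetical table named orders.
-- A heavily uneven row_count across segments indicates a skewed key.
SELECT gp_segment_id, count(*) AS row_count
FROM orders
GROUP BY gp_segment_id
ORDER BY row_count DESC;

-- If the distribution is skewed, change the key. Greenplum then
-- redistributes the existing rows, which may take a while on large tables.
-- order_id is an assumed column with many distinct, non-NULL values.
ALTER TABLE orders SET DISTRIBUTED BY (order_id);
```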

Note

If you do not specify a distribution key when creating a table, data will be distributed across cluster segments either by the primary key (if specified) or by the first table field.
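
The sketch below shows how the default described in the note may play out for three hypothetical tables. All table and column names are assumptions made for illustration.

```sql
-- No DISTRIBUTED BY clause, primary key present:
-- per the note above, the primary key (event_id) becomes the key.
CREATE TABLE events_with_pk
(
    event_id   bigint PRIMARY KEY,
    created_at timestamp,
    payload    text
);

-- No DISTRIBUTED BY clause and no primary key:
-- per the note above, the first field (created_at) becomes the key,
-- which is a date and time field and therefore a poor choice.
CREATE TABLE events_without_pk
(
    created_at timestamp,
    event_id   bigint,
    payload    text
);

-- Stating the key explicitly avoids depending on the column order.
CREATE TABLE events_explicit
(
    created_at timestamp,
    event_id   bigint,
    payload    text
) DISTRIBUTED BY (event_id);
```

Declaring the key explicitly keeps the choice visible in the table definition, so reordering columns later does not silently change how data is distributed.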

Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.
