Creating an external table from a Yandex Object Storage bucket table using a configuration file
To create an external table from a table in a Yandex Object Storage bucket, you need to provide a static access key for the service account in the query. You can do this using the S3 protocol
To create an external table using a configuration file:
If you no longer need the resources you created, delete them.
Getting started
Prepare the infrastructure:
-
Create a Managed Service for Greenplum® cluster with any suitable configuration.
-
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
Create a virtual machine on Linux in the same cloud network as the Managed Service for Greenplum® cluster.
-
Create a Object Storage bucket with restricted access. Upload the
example.csv
file with a test table to it:10.2010
-
Create a static access key for the service account.
-
If you do not have Terraform yet, install it.
-
Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
-
Configure and initialize a provider. There is no need to create a provider configuration file manually, you can download it
. -
Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
-
In the working directory, create a file called
example.csv
with a test table:10.2010
-
Download the infrastructure configuration file
to the same working directory.This file describes:
- Network.
- Subnet
- Service account with the static access key.
- Managed Service for Greenplum® cluster.
- Bucket to place the
example.csv
file into. - Virtual machine with Ubuntu 20.04.
-
In the
greenplum-s3-vm.tf
file, specify:- Password for
user
to utilize for Managed Service for Greenplum® cluster access. - Virtual machine image ID.
- Username and path to the SSH key for virtual machine access.
- Folder ID for the same service account as specified in the provider settings.
- Bucket name that must be unique throughout Object Storage.
- Password for
-
In the terminal window, go to the directory containing the infrastructure plan.
-
To verify that the config files are correct, run the command below:
terraform validate
If there are any errors in the configuration files, Terraform will point them out.
-
Create the infrastructure required to follow the steps provided in this tutorial:
-
Run the command to view planned changes:
terraform plan
If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
-
If you are happy with the planned changes, apply them:
-
Run the command:
terraform apply
-
Confirm the update of resources.
-
Wait for the operation to complete.
-
All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console
. -
-
To retrieve the static key parameters, run the command below in the working directory:
terraform output -raw access_key > static-key.txt && \ echo \n >> static-key.txt && \ terraform output -raw secret_key >> static-key.txt
The command saves the static key ID and the static key you will need later to a file called
static-key.txt
. -
Go to the management console
and set up an NAT gateway for the subnet hosting your cluster.
Set up your web server with a configuration file
Set up a web server on your virtual machine and create an s3
configuration file on your web server:
-
Connect to the virtual machine over SSH.
-
Create a folder called
/opt/gp_http_server
and place yours3.config
file with the static key parameters and the other configuration settings for thes3
protocol in this folder:sudo mkdir /opt/gp_http_server
Contents of the
s3.config
file:[default] accessid = "<static_key_ID>" secret = "<static_key>" threadnum = 4 chunksize = 67108864 low_speed_limit = 10240 low_speed_time = 60 encryption = true version = 2 proxy = "" autocompress = true verifycert = true server_side_encryption = "" # gpcheckcloud config gpcheckcloud_newline = "\n"
-
In the
/etc/systemd/system/
folder, put a web server config file namedgp_s3_config_http.service
:[Unit] Description=HTTP server config to connect Greenplum with Object Storage After=network.target [Service] KillMode=process Restart=on-failure Type=simple User=root ExecStart=python3 -m http.server 8553 --directory /opt/gp_http_server [Install] WantedBy=multi-user.target
-
Start your web server:
sudo systemctl daemon-reload && \ sudo systemctl enable gp_s3_config_http && \ sudo systemctl start gp_s3_config_http
-
To check the state of the web server, run the command below:
sudo systemctl status gp_s3_config_http
Create an external table
-
Run a query to create an external table referencing the
example.csv
table in your bucket:CREATE READABLE EXTERNAL TABLE s3_ext_table(id int, year int) LOCATION('s3://storage.yandexcloud.net/<bucket_name>/example.csv config_server=http://<VM_private_IP>:8553/s3.config region=ru-central1-a') FORMAT 'csv';
A successful query will return the following response:
CREATE EXTERNAL TABLE
-
Query the external table for data:
SELECT * FROM s3_ext_table;
Expected result:
id | year ----+------ 10 | 2010 (1 row)
Delete the resources you created
Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need:
- Delete the VM.
- If you reserved a public static IP address for the VM, delete it.
- Delete the bucket in Object Storage.
- Delete the Managed Service for Greenplum® cluster.
- Delete the service account.
- Delete the subnet.
- Delete the route table.
- Delete the NAT gateway.
- Delete the cloud network.
To delete the infrastructure created with Terraform:
-
In the terminal window, go to the directory containing the infrastructure plan.
-
Delete
greenplum-s3-vm.tf
. -
Run this command:
terraform validate
If there are any errors in the configuration files, Terraform will point them out.
-
Confirm updating the resources.
-
Run the command to view planned changes:
terraform plan
If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
-
If you are happy with the planned changes, apply them:
-
Run the command:
terraform apply
-
Confirm the update of resources.
-
Wait for the operation to complete.
-
This will delete all the resources described in
greenplum-s3-vm.tf
. -
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.