Analyzing Object Storage logs in DataLens
For a Yandex Object Storage bucket, you can enable action logging. The logs record operations on the bucket and the objects in it. Analyzing them can help you understand, for example, what caused a steep load increase, or give you an overall picture of traffic distribution.
You can visualize the results of this analysis using Yandex DataLens. To do so, first transfer the saved logs to a ClickHouse® database, which DataLens will use as its data source.
To analyze logs and present the results in interactive charts:
- Get your cloud ready.
- Create a bucket for storing logs.
- Enable log export.
- Get the data source ready.
- Create a connection in DataLens.
- Create a dataset in DataLens.
- Create charts in DataLens.
- Create a dashboard in DataLens.
If you no longer need the resources you created, delete them.
Get your cloud ready
Sign up for Yandex Cloud and create a billing account:
- Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one.
If you have an active billing account, you can go to the cloud page
Learn more about clouds and folders.
Required paid resources
The cost includes:
- Fee for data storage in Object Storage, data operations, and outbound traffic (see Object Storage pricing).
- Fee for a continuously running Managed Service for ClickHouse® cluster (see Managed Service for ClickHouse® pricing).
Create a bucket for storing logs
- In the management console, select the folder where you want to create a bucket.
- From the list of services, select Object Storage.
- Click Create bucket.
- In the Name field, enter a name for the bucket.
- In the Object read access and Object listing access fields, select Restricted.
- Click Create bucket.
- If you do not have the AWS CLI yet, install and configure it.
- Create a bucket:
  aws --endpoint-url https://storage.yandexcloud.net \
    s3 mb s3://<bucket_name>
  Result:
  make_bucket: <bucket_name>
Note
Terraform uses a service account to interact with Object Storage. Assign the required role, e.g., storage.admin, to the service account for the folder where you are going to create resources.
With Terraform
Terraform is distributed under the Business Source License.
For more information about the provider resources, see the Terraform documentation.
If you do not have Terraform, install it and configure the Yandex Cloud provider.
- Describe the properties for creating a service account and access key in the configuration file:
  ...
  // Creating a service account
  resource "yandex_iam_service_account" "sa" {
    name = "<service_account_name>"
  }

  // Assigning a role to a service account
  resource "yandex_resourcemanager_folder_iam_member" "sa-admin" {
    folder_id = "<folder_ID>"
    role      = "storage.admin"
    member    = "serviceAccount:${yandex_iam_service_account.sa.id}"
  }

  // Creating a static access key
  resource "yandex_iam_service_account_static_access_key" "sa-static-key" {
    service_account_id = yandex_iam_service_account.sa.id
    description        = "static access key for object storage"
  }
- Add the bucket properties to the configuration file:
  resource "yandex_storage_bucket" "bucket-logs" {
    access_key = yandex_iam_service_account_static_access_key.sa-static-key.access_key
    secret_key = yandex_iam_service_account_static_access_key.sa-static-key.secret_key
    bucket     = "<bucket_name>"
  }
  For more information about the yandex_storage_bucket resource, see this Terraform overview article.
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Create a bucket.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
Use the create REST API method.
Enable log export
- In the management console, select the bucket you want to enable logging for.
- In the left-hand panel, select Settings.
- Open the Logging tab.
- Enable Write logs.
- Select Bucket for log storage.
- In the Prefix field, specify the s3-logs/ prefix.
- Click Save.
- Create a file named log-config.json with the following contents:
  {
    "LoggingEnabled": {
      "TargetBucket": "<bucket_name>",
      "TargetPrefix": "s3-logs/"
    }
  }
- Run this command:
  aws s3api put-bucket-logging \
    --endpoint-url https://storage.yandexcloud.net \
    --bucket <bucket_name> \
    --bucket-logging-status file://log-config.json
  Where --bucket is the name of the bucket to enable action logging for.
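If you prefer to generate log-config.json from a script instead of writing it by hand, the structure can be built with a few lines of Python. This is only a sketch: the function name and the bucket name my-log-bucket are hypothetical, and only the LoggingEnabled, TargetBucket, and TargetPrefix keys come from the file shown in this tutorial.

```python
import json


def build_logging_config(target_bucket: str, target_prefix: str = "s3-logs/") -> dict:
    """Return the structure stored in log-config.json (hypothetical helper)."""
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,
            "TargetPrefix": target_prefix,
        }
    }


# "my-log-bucket" is a placeholder; substitute the bucket that stores your logs.
config = build_logging_config("my-log-bucket")
config_text = json.dumps(config, indent=2)
print(config_text)
```

Saving config_text to log-config.json gives a file you can pass to the --bucket-logging-status parameter.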
To enable logging for a bucket you want to track:
- Open the Terraform configuration file and add the logging section to the bucket description fragment.
  resource "yandex_storage_bucket" "bucket-logs" {
    access_key = "<static_key_ID>"
    secret_key = "<secret_key>"
    bucket     = "<name_of_bucket_to_store_logs>"
  }

  resource "yandex_storage_bucket" "bucket" {
    access_key = "<static_key_ID>"
    secret_key = "<secret_key>"
    bucket     = "<source_bucket_name>"
    acl        = "private"

    logging {
      target_bucket = yandex_storage_bucket.bucket-logs.id
      target_prefix = "s3-logs/"
    }
  }
  Where:
  - access_key: Static access key ID.
  - secret_key: Secret access key value.
  - target_bucket: Reference to the log storage bucket.
  - target_prefix: Key prefix for objects with logs.
  For more information about yandex_storage_bucket properties in Terraform, see this Terraform article.
- In the terminal, change to the folder where you edited the configuration file.
- Make sure the configuration file is correct using the command:
  terraform validate
  If the configuration is correct, the following message is returned:
  Success! The configuration is valid.
- Run the command:
  terraform plan
  The terminal will display a list of resources with parameters. No changes are made at this step. If the configuration contains errors, Terraform will point them out.
- Apply the configuration changes:
  terraform apply
- Confirm the changes: type yes in the terminal and press Enter.
  This will create all the resources you need in the specified folder. You can check the new resources and their settings using the management console.
Use the REST API putBucketLogging method.
Get the data source ready
Create a ClickHouse® cluster
To create a Managed Service for ClickHouse® cluster, you will need the vpc.user and managed-clickhouse.editor roles or higher. To learn more about assigning roles, see this Identity and Access Management article.
- In the management console, select the folder where you want to create a cluster.
- From the list of services, select Managed Service for ClickHouse.
- In the window that opens, click Create cluster.
- Specify the ClickHouse® cluster settings:
  - Under Basic parameters, specify s3-logs in the Cluster name field.
  - Under Resources, select burstable in the Type field.
  - Under Hosts, open the host settings and enable the Public access option. Click Save.
  - Under DBMS settings:
    - In the User management via SQL field, select Disabled.
    - In the Username field, specify user.
    - In the Password field, set a password.
    - In the DB name field, specify s3_data.
    Make a note of the database name.
  - Under Service settings, enable these options:
    - DataLens access.
    - Access from the management console.
- Click Create cluster.
If you do not have the Yandex Cloud CLI yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder through the --folder-name or --folder-id parameter.
To create a cluster:
- Check whether the folder has any subnets for the cluster hosts:
  yc vpc subnet list
  If there are no subnets in the folder, create the required subnets in VPC.
- Specify the cluster properties in the creation command:
  yc managed-clickhouse cluster create \
    --name s3-logs \
    --environment production \
    --network-name <network_name> \
    --host type=clickhouse,zone-id=<availability_zone>,subnet-id=<subnet_ID> \
    --clickhouse-resource-preset b2.medium \
    --clickhouse-disk-type network-ssd \
    --clickhouse-disk-size 10 \
    --user name=user,password=<user_password> \
    --database name=s3_data \
    --datalens-access=true \
    --websql-access=true
- Add the cluster description and cluster hosts to the configuration file:
  resource "yandex_mdb_clickhouse_cluster" "s3-logs" {
    name        = "s3-logs"
    environment = "PRODUCTION"
    network_id  = yandex_vpc_network.<network_name_in_Terraform>.id

    clickhouse {
      resources {
        resource_preset_id = "b2.medium"
        disk_type_id       = "network-ssd"
        disk_size          = 10
      }
    }

    database {
      name = "s3_data"
    }

    user {
      name     = "user"
      password = "<password>"
      permission {
        database_name = "s3_data"
      }
    }

    host {
      type      = "CLICKHOUSE"
      zone      = "<availability_zone>"
      subnet_id = yandex_vpc_subnet.<subnet_name_in_Terraform>.id
    }

    access {
      datalens = true
      web_sql  = true
    }
  }
  To learn more about the resources you can create with Terraform, see the Terraform documentation.
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Create a cluster:
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
Use the create REST API method.
After creating the cluster, you will be automatically redirected to the Clusters page.
Wait until the cluster status switches to Alive.
Change the user settings
- Select the s3-logs cluster.
- Navigate to the Users tab.
- In the row with the user, open the actions menu and select Configure.
- Click Additional settings → Settings.
- In the Date time input format field, select best_effort.
- Click Save.
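The table definition later in this tutorial describes log timestamps as YYYY-MM-DDTHH:MM:SSZ. ClickHouse's basic date-time parser expects the YYYY-MM-DD HH:MM:SS form, which is why best_effort (which also accepts ISO 8601 strings) is selected here. As a rough illustration of the format only, not of ClickHouse internals, the Python sketch below parses such a string. The sample value is invented.

```python
from datetime import datetime, timezone

# Format string matching YYYY-MM-DDTHH:MM:SSZ as described in the table comments.
LOG_TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_log_timestamp(value: str) -> datetime:
    """Parse a log timestamp; the trailing Z is treated as UTC (ISO 8601)."""
    parsed = datetime.strptime(value, LOG_TIMESTAMP_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


# Hypothetical timestamp, not taken from a real log.
ts = parse_log_timestamp("2024-01-15T10:30:00Z")
print(ts.isoformat())  # 2024-01-15T10:30:00+00:00
```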
Create a static key
To create a table with access to Object Storage, you will need a static key. Create one and save its ID and secret part.
Create a table in the database
- Select the s3-logs cluster.
- Navigate to the SQL tab.
- In the Password field, enter the password.
- Click Connect.
- In the window on the right, write this SQL query:
  CREATE TABLE s3_data.s3logs
  (
      bucket         String,    -- Bucket name.
      bytes_received Int64,     -- Request size in bytes.
      bytes_send     Int64,     -- Response size in bytes.
      handler        String,    -- Request method in this format: REST.<HTTP method>.<subject>.
      http_referer   String,    -- Request source URL.
      ip             String,    -- User IP address.
      method         String,    -- HTTP request method.
      object_key     String,    -- Object key in URL-encoded format.
      protocol       String,    -- Data transfer protocol version.
      range          String,    -- HTTP header defining the byte range to load from the object.
      requester      String,    -- User ID.
      request_args   String,    -- Arguments of the URL request.
      request_id     String,    -- Request ID.
      request_path   String,    -- Full request path.
      request_time   Int64,     -- Request processing time in milliseconds.
      scheme         String,    -- Data transfer protocol type.
                                -- The possible values are as follows:
                                -- * http: Application layer protocol.
                                -- * https: Application layer protocol with encryption support.
      ssl_protocol   String,    -- Security protocol.
      status         Int64,     -- HTTP response code.
      storage_class  String,    -- Object storage class.
      timestamp      DateTime,  -- Date and time of the bucket operation in the YYYY-MM-DDTHH:MM:SSZ format.
      user_agent     String,    -- Client application (user agent) that executed the request.
      version_id     String,    -- Object version.
      vhost          String     -- Virtual host of the request.
                                -- The possible values are as follows:
                                -- * storage.yandexcloud.net.
                                -- * <bucket_name>.storage.yandexcloud.net.
                                -- * website.yandexcloud.net.
                                -- * <bucket_name>.website.yandexcloud.net.
  )
  ENGINE = S3(
      'https://storage.yandexcloud.net/<bucket_name>/s3-logs/*',
      '<key_ID>',
      '<secret_key>',
      'JSONEachRow'
  )
  SETTINGS date_time_input_format='best_effort';
- Click Execute.
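The S3 engine above reads the log objects in the JSONEachRow format, meaning each line of a log object is one standalone JSON object. The Python sketch below shows that layout under stated assumptions: the field names are taken from the table definition, while the two sample records and their values are invented for illustration and are not real log output.

```python
import json

# Hypothetical log content: one JSON object per line (JSONEachRow layout).
SAMPLE_LOG = "\n".join([
    '{"bucket": "example-bucket", "method": "GET", "status": 200, "bytes_send": 1024}',
    '{"bucket": "example-bucket", "method": "PUT", "status": 200, "bytes_send": 0}',
])


def parse_json_each_row(text: str) -> list:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


rows = parse_json_each_row(SAMPLE_LOG)
print(len(rows), rows[0]["method"])
```

A check like this can be handy for inspecting a downloaded log object locally before querying it through ClickHouse.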
Create a connection in DataLens
- Select the s3-logs cluster.
- Navigate to the DataLens tab.
- In the window that opens, click Create connection.
- Fill in the connection settings:
  - Add a connection name: s3-logs-con.
  - In the Cluster field, select s3-logs.
  - In the Host name field, select the ClickHouse® host from the drop-down list.
  - Enter the DB username and password.
- Click Confirm connection.
- After checking the connection, click Create connection.
- In the window that opens, enter a name for the connection and click Create.
Create a dataset in DataLens
- Click Create dataset.
- In the new dataset, move the s3_data.s3logs table to the workspace.
- Navigate to the Fields tab.
- Click Add field.
- Create a calculated field with the file type:
  - Field name: object_type
  - Formula: SPLIT([object_key], '.', -1)
- Click Create.
- In the top-right corner, click Save.
- Enter the s3-dataset name for the dataset and click Create.
- Once the dataset is saved, click Create chart in the top-right corner.
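The object_type formula splits the object key on dots and takes the last part, which for typical keys is the file extension. A rough Python equivalent is sketched below. The sample keys are hypothetical, and how DataLens handles edge cases (for example, keys without a dot) may differ from Python's behavior.

```python
def object_type(object_key: str) -> str:
    """Approximate SPLIT([object_key], '.', -1): the part after the last dot."""
    return object_key.split(".")[-1]


# Hypothetical object keys, for illustration only.
print(object_type("images/photo.jpg"))  # jpg
print(object_type("backup.tar.gz"))     # gz
```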
Create charts in DataLens
Create the first chart
To visualize the number of requests to a bucket via different methods, create a pie chart:
- Select Pie chart as the visualization type.
- Drag the method field from the Dimensions section to the Colors section.
- Drag the request_id field from the Dimensions section to the Measures section.
- In the top-right corner, click Save.
- In the window that opens, enter the S3 - Method pie name for the new chart and click Save.
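Conceptually, this pie chart groups log records by method and counts the requests in each group. The Python sketch below reproduces that aggregation on a few invented records. It assumes that request_id is counted as distinct values per method, which is an assumption about the aggregation DataLens applies rather than something stated in this tutorial.

```python
from collections import defaultdict

# Hypothetical records; only the field names come from the s3logs table.
rows = [
    {"request_id": "r1", "method": "GET"},
    {"request_id": "r2", "method": "GET"},
    {"request_id": "r3", "method": "PUT"},
]


def requests_by_method(records: list) -> dict:
    """Count distinct request IDs for each HTTP method."""
    ids_per_method = defaultdict(set)
    for record in records:
        ids_per_method[record["method"]].add(record["request_id"])
    return {method: len(ids) for method, ids in ids_per_method.items()}


counts = requests_by_method(rows)
print(counts)  # {'GET': 2, 'PUT': 1}
```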
Create the second chart
To visualize the ratio of requests by object type, create a bar chart:
- Copy the chart from the previous step:
  - In the top-right corner, click the check mark next to the Save button.
  - Click Save as.
  - In the window that opens, enter the S3 - Object type bars name for the new chart and click Save.
- Select Bar chart as the visualization type. The method and request_id fields will automatically appear in the X and Y sections, respectively.
- Delete the method field from the X section and drag the object_type field there.
- In the top-right corner, click Save.
Create the third chart
To visualize the distribution of outbound traffic by day, create a bar chart:
- Copy the chart from the previous step:
  - In the top-right corner, click the check mark next to the Save button.
  - Click Save as.
  - In the window that opens, enter the S3 - Traffic generated by days name for the new chart and click Save.
- Drag the object_type field from the X section to the Filters section.
- In the window that opens, select the types of objects to display in the chart and click Apply filters.
- Drag the timestamp field from the Dimensions section to the X section.
- Delete the request_id field from the Y section and drag the bytes_send field there.
- In the top-right corner, click Save.
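The idea behind this chart is to sum bytes_send per calendar day. The Python sketch below performs that grouping on invented records. It assumes timestamps in the YYYY-MM-DDTHH:MM:SSZ format and leaves out the object_type filter. How DataLens buckets the timestamp field on the X axis depends on the chart settings, so treat this as an approximation.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records; only the field names come from the s3logs table.
rows = [
    {"timestamp": "2024-01-15T10:30:00Z", "bytes_send": 1024},
    {"timestamp": "2024-01-15T18:05:10Z", "bytes_send": 2048},
    {"timestamp": "2024-01-16T09:00:00Z", "bytes_send": 512},
]


def traffic_by_day(records: list) -> dict:
    """Sum response sizes (bytes_send) for each calendar day."""
    totals = defaultdict(int)
    for record in records:
        day = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%SZ").date()
        totals[day.isoformat()] += record["bytes_send"]
    return dict(totals)


daily = traffic_by_day(rows)
print(daily)  # {'2024-01-15': 3072, '2024-01-16': 512}
```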
Create a dashboard in DataLens and add charts to it
- Go to the DataLens home page.
- Click Create dashboard.
- Enter the S3 Logs Analysis name for the dashboard and click Create.
- In the top-right corner, click Add and select Chart.
- In the Chart field, click Select and select the S3 - Method pie chart from the list.
- Click Add. You will now see the chart on the dashboard.
- Repeat the previous steps for the S3 - Object type bars and S3 - Traffic generated by days charts.
How to delete the resources you created
Delete the resources you no longer need to avoid paying for them:
ClickHouse® is a registered trademark of ClickHouse, Inc.