Analyzing Object Storage logs in DataLens
For the Yandex Object Storage bucket you can enable action logging. The logs store information about operations with a bucket and the objects in it. Analysis of bucket logs can be useful, for example, if you want to understand what caused a sharp increase in load or get the overall picture of traffic distribution.
You can create visualizations for your analysis using Yandex DataLens. To do this, transfer the previously saved logs to a ClickHouse® database, which will be used as the source for DataLens.
To analyze the logs and present the results in interactive charts:
- Prepare your cloud.
- Create a bucket for storing logs.
- Enable log export.
- Prepare the data source.
- Create a connection in DataLens.
- Create a dataset in DataLens.
- Create charts in DataLens.
- Create a dashboard in DataLens.
If you no longer need the resources you created, delete them.
Prepare your cloud
Sign up for Yandex Cloud and create a billing account:
- Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and that it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one.

If you have an active billing account, you can go to the cloud page.
Learn more about clouds and folders.
Required paid resources
The cost includes:
- Fee for data storage in Object Storage, operations with data, and outgoing traffic (see Object Storage pricing).
- Fee for a continuously running Managed Service for ClickHouse® cluster (see Managed Service for ClickHouse® pricing).
Create a bucket for storing logs
- In the management console, select the folder you want to create a bucket in.
- In the list of services, select Object Storage.
- Click Create bucket.
- In the **Name** field, enter a name for the bucket.
- In the Object read access and Object listing access fields, select Restricted.
- Click Create bucket.
- If you do not have the AWS CLI yet, install and configure it.
- Create a bucket:

  ```bash
  aws --endpoint-url https://storage.yandexcloud.net \
    s3 mb s3://<bucket_name>
  ```

  Result:

  ```text
  make_bucket: <bucket_name>
  ```
Note

Terraform uses a service account to interact with Object Storage. Assign the service account the required role, e.g., `storage.admin`, for the folder where you are going to create resources.
With Terraform

Terraform is distributed under the Business Source License.

For more information about the provider resources, see the Terraform provider documentation.

If you don't have Terraform yet, install it and configure the Yandex Cloud provider.
- Describe the parameters for creating a service account and an access key in the configuration file:

  ```hcl
  ...

  // Creating a service account
  resource "yandex_iam_service_account" "sa" {
    name = "<service_account_name>"
  }

  // Assigning a role to the service account
  resource "yandex_resourcemanager_folder_iam_member" "sa-admin" {
    folder_id = "<folder_ID>"
    role      = "storage.admin"
    member    = "serviceAccount:${yandex_iam_service_account.sa.id}"
  }

  // Creating a static access key
  resource "yandex_iam_service_account_static_access_key" "sa-static-key" {
    service_account_id = yandex_iam_service_account.sa.id
    description        = "static access key for object storage"
  }
  ```
- Add bucket parameters to the configuration file:

  ```hcl
  resource "yandex_storage_bucket" "bucket-logs" {
    access_key = yandex_iam_service_account_static_access_key.sa-static-key.access_key
    secret_key = yandex_iam_service_account_static_access_key.sa-static-key.secret_key
    bucket     = "<bucket_name>"
  }
  ```

  For more information about the `yandex_storage_bucket` resource, see the Terraform provider documentation.
- Make sure the settings are correct:

  1. Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  2. Run the command:

     ```bash
     terraform validate
     ```

     If there are errors in the configuration files, Terraform will point them out.
- Create a bucket:

  1. Run the command to view the planned changes:

     ```bash
     terraform plan
     ```

     If the resource configuration descriptions are correct, the terminal will display a list of the resources to be modified and their parameters. This is a test step; no resources are updated.

  2. If you are happy with the planned changes, apply them:

     1. Run the command:

        ```bash
        terraform apply
        ```

     2. Confirm the update of resources.
     3. Wait for the operation to complete.
Use the create REST API method.
Enable log export
- In the management console, select the bucket you want to enable logging for.
- In the left-hand panel, select Settings.
- Open the Logging tab.
- Enable Write logs.
- Select the bucket for log storage.
- In the Prefix field, specify the `s3-logs/` prefix.
- Click Save.
- Create a file named `log-config.json` with the following contents:

  ```json
  {
    "LoggingEnabled": {
      "TargetBucket": "<bucket_name>",
      "TargetPrefix": "s3-logs/"
    }
  }
  ```

- Run this command:

  ```bash
  aws s3api put-bucket-logging \
    --endpoint-url https://storage.yandexcloud.net \
    --bucket <bucket_name> \
    --bucket-logging-status file://log-config.json
  ```

  Where `--bucket` is the name of the bucket to enable action logging for.
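If you prefer to generate the `log-config.json` file programmatically before passing it to the AWS CLI, a minimal Python sketch might look like this (the bucket name is a hypothetical placeholder, not a value from this tutorial):

```python
import json

# Hypothetical bucket name; replace with your own log storage bucket.
bucket_name = "my-log-bucket"

# Same structure as the log-config.json file shown above.
logging_config = {
    "LoggingEnabled": {
        "TargetBucket": bucket_name,
        "TargetPrefix": "s3-logs/",
    }
}

# Write the file that `aws s3api put-bucket-logging` will read.
with open("log-config.json", "w") as f:
    json.dump(logging_config, f, indent=4)
```

The resulting file can then be passed to the CLI via `--bucket-logging-status file://log-config.json` exactly as in the command above.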
To enable logging for the bucket you want to track:

- Open the Terraform configuration file and add the `logging` section to the bucket description:

  ```hcl
  resource "yandex_storage_bucket" "bucket-logs" {
    access_key = "<static_key_ID>"
    secret_key = "<secret_key>"
    bucket     = "<name_of_bucket_to_store_logs>"
  }

  resource "yandex_storage_bucket" "bucket" {
    access_key = "<static_key_ID>"
    secret_key = "<secret_key>"
    bucket     = "<source_bucket_name>"
    acl        = "private"

    logging {
      target_bucket = yandex_storage_bucket.bucket-logs.id
      target_prefix = "s3-logs/"
    }
  }
  ```

  Where:

  - `access_key`: Static access key ID.
  - `secret_key`: Secret access key value.
  - `target_bucket`: Reference to the log storage bucket.
  - `target_prefix`: Key prefix for objects with logs.

  For more information about the `yandex_storage_bucket` resource parameters in Terraform, see the provider documentation.
- In the terminal, change to the folder where you edited the configuration file.
- Make sure the configuration file is correct using the command:

  ```bash
  terraform validate
  ```

  If the configuration is correct, the following message is returned:

  ```text
  Success! The configuration is valid.
  ```

- Run the command:

  ```bash
  terraform plan
  ```

  The terminal will display a list of resources with their parameters. No changes are made at this step. If the configuration contains errors, Terraform will point them out.

- Apply the configuration changes:

  ```bash
  terraform apply
  ```

- Confirm the changes: type `yes` in the terminal and press Enter.

All the resources you need will then be created in the specified folder. You can check the new resources and their configuration using the management console.
Use the REST API putBucketLogging method.
Prepare the data source
Create a ClickHouse® cluster
To create a Managed Service for ClickHouse® cluster, you need the vpc.user role and the managed-clickhouse.editor role or higher. For more information on assigning roles, see the Identity and Access Management documentation.
- In the management console, select the folder you want to create a cluster in.
- In the list of services, select Managed Service for ClickHouse.
- In the window that opens, click Create cluster.
- Specify the settings for the ClickHouse® cluster:

  - Under Basic parameters, specify `s3-logs` in the Cluster name field.
  - Under Resources, select `burstable` in the Type field.
  - Under Hosts, open the host settings, enable the Public access option, and click Save.
  - Under DBMS settings:

    - In the User management via SQL field, select `Disabled`.
    - In the Username field, specify `user`.
    - In the Password field, set a password.
    - In the DB name field, specify `s3_data`.

    Remember the database name.

  - Under Service settings, enable the following options:

    - DataLens access.
    - Access from the management console.

- Click Create cluster.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.

The folder specified in the CLI profile is used by default. You can specify a different folder using the `--folder-name` or `--folder-id` parameter.
To create a cluster:
- Check whether the folder has any subnets for the cluster hosts:

  ```bash
  yc vpc subnet list
  ```

  If there are no subnets in the folder, create the required subnets in VPC.

- Specify the cluster parameters in the create command:

  ```bash
  yc managed-clickhouse cluster create \
    --name s3-logs \
    --environment production \
    --network-name <network_name> \
    --host type=clickhouse,zone-id=<availability_zone>,subnet-id=<subnet_ID> \
    --clickhouse-resource-preset b2.medium \
    --clickhouse-disk-type network-ssd \
    --clickhouse-disk-size 10 \
    --user name=user,password=<user_password> \
    --database name=s3_data \
    --datalens-access=true \
    --websql-access=true
  ```
- Add a description of the cluster and cluster hosts to the configuration file:

  ```hcl
  resource "yandex_mdb_clickhouse_cluster" "s3-logs" {
    name        = "s3-logs"
    environment = "PRODUCTION"
    network_id  = yandex_vpc_network.<network_name_in_Terraform>.id

    clickhouse {
      resources {
        resource_preset_id = "b2.medium"
        disk_type_id       = "network-ssd"
        disk_size          = 10
      }
    }

    database {
      name = "s3_data"
    }

    user {
      name     = "user"
      password = "<password>"
      permission {
        database_name = "s3_data"
      }
    }

    host {
      type      = "CLICKHOUSE"
      zone      = "<availability_zone>"
      subnet_id = yandex_vpc_subnet.<subnet_name_in_Terraform>.id
    }

    access {
      datalens = true
      web_sql  = true
    }
  }
  ```

  For more information about the resources you can create with Terraform, see the provider documentation.
- Make sure the settings are correct:

  1. Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  2. Run the command:

     ```bash
     terraform validate
     ```

     If there are errors in the configuration files, Terraform will point them out.
- Create a cluster:

  1. Run the command to view the planned changes:

     ```bash
     terraform plan
     ```

     If the resource configuration descriptions are correct, the terminal will display a list of the resources to be modified and their parameters. This is a test step; no resources are updated.

  2. If you are happy with the planned changes, apply them:

     1. Run the command:

        ```bash
        terraform apply
        ```

     2. Confirm the update of resources.
     3. Wait for the operation to complete.
Use the create REST API method.
After creating the cluster, you will be automatically redirected to the Clusters page. Wait for the cluster status to change to Alive.
Change user settings
- Select the `s3-logs` cluster.
- Go to the Users tab.
- Click the actions icon next to the user and select Configure.
- Click Additional settings → Settings.
- In the Date time input format field, select `best_effort`.
- Click Save.
Create a static key
To create a table with access to Object Storage, you need a static key. Create it and save the ID and secret part of the key.
Create a table in the database
- Select the `s3-logs` cluster.
- Go to the SQL tab.
- In the Password field, enter the password.
- Click Connect.
- In the window on the right, write an SQL query:

  ```sql
  CREATE TABLE s3_data.s3logs
  (
      bucket String,          -- Bucket name.
      bytes_received Int64,   -- Size of the request, in bytes.
      bytes_send Int64,       -- Size of the response, in bytes.
      handler String,         -- Request method in the format REST.<HTTP_method>.<subject>.
      http_referer String,    -- URL of the request source.
      ip String,              -- User IP address.
      method String,          -- HTTP request method.
      object_key String,      -- Object key in URL-encoded format.
      protocol String,        -- Data transfer protocol version.
      range String,           -- HTTP header defining the range of bytes to load from the object.
      requester String,       -- User ID.
      request_args String,    -- URL request arguments.
      request_id String,      -- Request ID.
      request_path String,    -- Full path of the request.
      request_time Int64,     -- Request processing time, in milliseconds.
      scheme String,          -- Data transfer protocol type. The possible values are:
                              -- * http: Application layer protocol.
                              -- * https: Application layer protocol with encryption support.
      ssl_protocol String,    -- Security protocol.
      status Int64,           -- HTTP response code.
      storage_class String,   -- Storage class of the object.
      timestamp DateTime,     -- Date and time of the operation with the bucket, in YYYY-MM-DDTHH:MM:SSZ format.
      user_agent String,      -- Client application (user agent) that made the request.
      version_id String,      -- Object version.
      vhost String            -- Virtual host of the request. The possible values are:
                              -- * storage.yandexcloud.net.
                              -- * <bucket_name>.storage.yandexcloud.net.
                              -- * website.yandexcloud.net.
                              -- * <bucket_name>.website.yandexcloud.net.
  )
  ENGINE = S3(
      'https://storage.yandexcloud.net/<bucket_name>/s3-logs/*',
      '<key_ID>',
      '<secret_key>',
      'JSONEachRow'
  )
  SETTINGS date_time_input_format = 'best_effort';
  ```
- Click Execute.
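The `JSONEachRow` format means each log object in the bucket is a sequence of lines, each holding one JSON object whose keys match the table columns above. A rough Python sketch of what ClickHouse does with one such line (the record values are made up for illustration):

```python
import json
from datetime import datetime

# A hypothetical log record in JSONEachRow format; the field names
# match a subset of the s3_data.s3logs columns defined above.
log_line = json.dumps({
    "bucket": "my-source-bucket",
    "method": "GET",
    "object_key": "images/cat.jpg",
    "bytes_send": 14230,
    "status": 200,
    "timestamp": "2024-05-01T12:34:56Z",
})

record = json.loads(log_line)

# date_time_input_format='best_effort' lets ClickHouse accept ISO 8601
# timestamps like this one; a Python equivalent of that parse:
ts = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
print(ts.date(), record["method"], record["bytes_send"])
```

Once the table is created, any new log objects matching the `s3-logs/*` prefix are read on the fly when you query `s3_data.s3logs`.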
Create a connection in DataLens
- Select the `s3-logs` cluster.
- Go to the DataLens tab.
- In the window that opens, click Create connection.
- Fill in the connection settings:

  - Add a connection name: `s3-logs-con`.
  - In the Cluster field, select `s3-logs`.
  - In the Host name field, select the ClickHouse® host from the drop-down list.
  - Enter the DB user name and password.

- Click Confirm connection.
- After checking the connection, click Create connection.
- In the window that opens, enter a name for the connection and click Create.
Create a dataset in DataLens
- Click Create dataset.
- In the new dataset, move the `s3_data.s3logs` table to the workspace.
- Go to the Fields tab.
- Click Add field.
- Create a calculated field with the file type:

  - Field name: `object_type`
  - Formula: `SPLIT([object_key], '.', -1)`

- Click Create.
- In the top-right corner, click Save.
- Enter `s3-dataset` as the dataset name and click Create.
- When the dataset is saved, click Create chart in the top-right corner.
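The `object_type` calculated field takes the part of `object_key` after the last dot, i.e., the file extension. A Python sketch of what the DataLens `SPLIT([object_key], '.', -1)` formula computes (the sample keys are made up):

```python
def object_type(object_key: str) -> str:
    """Mimic DataLens SPLIT([object_key], '.', -1):
    split the key on '.' and take the last part."""
    return object_key.split(".")[-1]

# Hypothetical object keys for illustration.
print(object_type("images/cat.jpg"))     # -> jpg
print(object_type("backups/db.tar.gz"))  # -> gz (only the last segment)
print(object_type("README"))             # -> README (no dot: whole key)
```

Note the edge cases: a multi-part extension like `.tar.gz` yields only `gz`, and a key without a dot is returned unchanged, so such keys show up in charts as their own "type".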
Create charts in DataLens
Create the first chart
To visualize the number of requests to a bucket using different methods, create a pie chart:
- For the visualization type, select Pie chart.
- Drag the `method` field from the Dimensions section to the Colors section.
- Drag the `request_id` field from the Dimensions section to the Measures section.
- In the top-right corner, click Save.
- In the window that opens, enter `S3 - Method pie` as the chart name and click Save.
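Putting `request_id` in the Measures section makes DataLens count requests per `method` value. The same aggregation, sketched in Python over a few made-up log records:

```python
from collections import Counter

# Hypothetical log records; only the fields the chart uses are shown.
records = [
    {"method": "GET", "request_id": "a1"},
    {"method": "GET", "request_id": "a2"},
    {"method": "PUT", "request_id": "b1"},
    {"method": "GET", "request_id": "a3"},
    {"method": "DELETE", "request_id": "c1"},
]

# Count requests per HTTP method, as the pie chart does.
requests_by_method = Counter(r["method"] for r in records)
print(requests_by_method)  # Counter({'GET': 3, 'PUT': 1, 'DELETE': 1})
```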
Create the second chart
To visualize the ratio of requests by object type, create a bar chart:

- Copy the chart from the previous step:

  - In the top-right corner, click the down arrow next to the Save button.
  - Click Save as.
  - In the window that opens, enter `S3 - Object type bars` as the name for the new chart and click Save.

- Select the Bar chart visualization type. The `method` and `request_id` fields will automatically appear in the X and Y sections, respectively.
- Delete the `method` field from the X section and drag the `object_type` field there.
- In the top-right corner, click Save.
Create the third chart
To visualize the distribution of outgoing traffic by day, create a bar chart:
- Copy the chart from the previous step:

  - In the top-right corner, click the down arrow next to the Save button.
  - Click Save as.
  - In the window that opens, enter `S3 - Traffic generated by days` as the name for the new chart and click Save.

- Drag the `object_type` field from the X section to the Filters section.
- In the window that opens, select the types of objects to display in the chart and click Apply filters.
- Drag the `timestamp` field from the Dimensions section to the X section.
- Delete the `request_id` field from the Y section and drag the `bytes_send` field there.
- In the top-right corner, click Save.
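This chart sums `bytes_send` per day of `timestamp`. The equivalent grouping, sketched in Python over a few hypothetical records:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records with only the fields the chart uses.
records = [
    {"timestamp": "2024-05-01T10:00:00Z", "bytes_send": 1000},
    {"timestamp": "2024-05-01T18:30:00Z", "bytes_send": 2500},
    {"timestamp": "2024-05-02T09:15:00Z", "bytes_send": 700},
]

# Sum response bytes per calendar day, as the bar chart does.
traffic_by_day = defaultdict(int)
for r in records:
    day = datetime.strptime(r["timestamp"], "%Y-%m-%dT%H:%M:%SZ").date()
    traffic_by_day[day] += r["bytes_send"]

for day in sorted(traffic_by_day):
    print(day, traffic_by_day[day])
# 2024-05-01 3500
# 2024-05-02 700
```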
Create a dashboard in DataLens and add charts there
- Go to the DataLens home page.
- Click Create dashboard.
- Enter `S3 Logs Analysis` as the dashboard name and click Create.
- In the top-right corner, click Add and select Chart.
- In the Chart field, click Select and choose the `S3 - Method pie` chart from the list.
- Click Add. The chart will be displayed on the dashboard.
- Repeat the previous steps for the `S3 - Object type bars` and `S3 - Traffic generated by days` charts.
How to delete the resources you created
Delete the resources you no longer need to avoid paying for them.

ClickHouse® is a registered trademark of ClickHouse, Inc.