At the beginning of the project, we couldn’t accurately estimate the total data volume, which is why we decided to launch a pilot in the cloud.
Time to market as the reason Leroy Merlin East started using the Yandex Cloud platform

About the company
Leroy Merlin
Company goal
A major goal for Leroy Merlin is to build a platform for managing data across today’s decentralized data flows. These flows span over 100 different databases, including web analytics, product data, and consumer shopping cart data.
Two main requirements for the platform:
- Scalability at all levels.
- Capacity to evolve into a hybrid solution.
As soon as we build the platform, we’ll launch predictive analytics to bring together data from completely different sources, both internal and external.
Solution
For the project, we used Yandex Compute Cloud to create a fleet of virtual machines and Yandex Object Storage to provide scalable storage.
To implement the project, we had to integrate multiple data sources, bring the data together in a scalable database, and start running analytics. The implemented solution is as follows:
- NiFi — Greenplum — Kafka.
- Write-ahead logging in Kafka.
- Data flow to Source 1.
To deploy a massively parallel database, we went with Greenplum, an open-source MPP DBMS. For data transport, we chose Kafka and NiFi. We made this choice because, before the project started, our contractors had tested Yandex Cloud and confirmed that we could build a cluster meeting our requirements without substantial performance degradation.

At the beginning of 2019, the system’s core was a Greenplum cluster of seven nodes: two hosts with 12 vCPUs and 72 GB of RAM, and five hosts with 32 vCPUs, 256 GB of RAM, and 5 TB of SSD storage.
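To put the node list in perspective, the aggregate compute capacity can be tallied with a few lines of Python. This is only an illustrative sketch: it assumes the vCPU and RAM figures above are per host, and the group labels `small` and `large` are hypothetical names introduced here, not terms from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostGroup:
    name: str             # hypothetical label, not taken from the project
    hosts: int            # number of hosts in the group
    vcpu_per_host: int    # assumed to be a per-host figure
    ram_gb_per_host: int  # assumed to be a per-host figure


# Node counts and sizes as quoted in the text.
GROUPS = [
    HostGroup("small", hosts=2, vcpu_per_host=12, ram_gb_per_host=72),
    HostGroup("large", hosts=5, vcpu_per_host=32, ram_gb_per_host=256),
]


def cluster_totals(groups):
    """Sum host count, vCPUs, and RAM across all host groups."""
    return {
        "hosts": sum(g.hosts for g in groups),
        "vcpu": sum(g.hosts * g.vcpu_per_host for g in groups),
        "ram_gb": sum(g.hosts * g.ram_gb_per_host for g in groups),
    }


CLUSTER_TOTALS = cluster_totals(GROUPS)
print(CLUSTER_TOTALS)
```

Under that per-host assumption, the seven nodes add up to 184 vCPUs and 1,424 GB of RAM.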
Results
The first stage produced two main results: a deployed cluster that can accept 12.5 TB of uncompressed data, and the launch of the Hadoop and S3 workbenches along with Spark processing.
In practice, this minimized the time spent on routine operations:
- Adding Greenplum nodes in a single click.
- Creating a Greenplum/Spark/Hadoop sandbox in 10 minutes.
In the near future, we plan to expand the data volume to 70 TB.
During project implementation, we formulated a set of indispensable rules for a business team in the digital age:
- Each business unit owns its data.
- The data owner is in charge of keeping the data available to the business in real time.
- The data owner is responsible for managing and describing data.
- Data operations cost money.
End-user billing might be introduced at the first stage, but it shouldn’t restrict user access to the digital platform. At this point, what matters is understanding how costs are estimated and, later on, how usage costs will be allocated across business uses.
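One simple way to reason about usage cost allocation is to split a shared bill in proportion to each consumer's measured usage. The sketch below is a hypothetical illustration, not the billing logic used in the project: the function name `allocate_cost`, the unit names, and all figures are made up for the example.

```python
def allocate_cost(total_cost, usage_by_unit):
    """Split total_cost across units in proportion to their usage.

    usage_by_unit maps a unit name to any non-negative usage metric
    (for example, TB stored or query hours); the metric is an assumption.
    """
    total_usage = sum(usage_by_unit.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {
        unit: total_cost * usage / total_usage
        for unit, usage in usage_by_unit.items()
    }


# Hypothetical usage figures for three data consumers.
USAGE = {"web_analytics": 50.0, "product_data": 30.0, "shopping_cart": 20.0}
SHARES = allocate_cost(1000.0, USAGE)
print(SHARES)
```

Because the split is proportional, the shares always sum to the total bill, which keeps the cost logic transparent to data owners without gating their access to the platform.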

