Background
Celsus is a company specializing in the development of artificial intelligence systems in radiology. Its AI system has already been implemented at the local level in 19 Russian regions. The results of processed scans are available to all medical organizations.
With the help of the provider’s technical support team, the company migrated its platform to Yandex Cloud in just a few days. As a result, development has accelerated: a system or internal service can now be deployed in one day.
The new platform provides Celsus with a full range of services and technologies that it needs, along with a satisfactory speed for processing scans.
In addition, flexible resource management and fast scaling in response to changing loads significantly improved service quality. The new solution also fully complies with strict information security (IS) and Russian legislative requirements, including those on the secure storage of personal data.
Company goal
Celsus was founded in 2018 and specializes in the development of AI systems in radiology.
In 2020, the Celsus solution became the first system in Russia registered by Roszdravnadzor as a third risk class medical device. Production is certified to ISO 13485.
The company’s developments are also certified in the European Union (CE Mark), confirming compliance with European safety standards.
19 Russian regions work with Celsus. Pilot projects have also been launched in India, Pakistan, and Belarus.
The company has processed more than 4 million medical images.
The following AI-based systems for automatic analysis and interpretation of medical images are in commercial operation:
- Mammography — detection of malignant tumors and other significant changes
- Fluorography and chest X-ray — 13 signs of pathology
- CT chest scans — a comprehensive service covering 10 pathologies
- CT brain scans — detection of hemorrhages and ischemic strokes.
The company initially planned to deploy in the cloud. In the early stages of the platform’s operation, the company turned to various cloud providers, both Russian and foreign. At that time, the cloud was used to perform separate tasks: to create instances for training, test the first versions of models, and check the inference of a working neural network.
The use of foreign cloud services was abandoned due to changes in data storage requirements, licensing risks, and the inability to predict future changes, so the company had to switch to Russian cloud infrastructure. As the company expands its operations, including in the Russian regions, it needs to deploy the solution rapidly for new customers. Operational scaling is also required, because the number of products and the volume of studies are growing. Moreover, the infrastructure must scale automatically to handle loads during training and during periods of more active use of the service.
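The source does not say which scaling mechanism Celsus uses. As one possible illustration of scaling with load, the sketch below applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of observed load to target load, rounded up. The function name, the load units, and the replica limits are assumptions made for this example.

```python
import math


def desired_replicas(current_replicas, current_load, target_load,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Proportional scaling rule: ceil(current_replicas * current_load / target_load).

    All parameter names and default limits here are hypothetical.
    """
    ratio = current_load / target_load
    # Skip scaling when the load is already close to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Two inference workers at 90% average load with a 50% target.
scaled_up = desired_replicas(2, 90, 50)
# Four workers at 10% load during a quiet period.
scaled_down = desired_replicas(4, 10, 50)
```

With these inputs, a burst of scans from a large customer grows the pool, and the pool shrinks again once the load drops.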
When choosing a cloud provider, the company was guided by the following criteria:
- Data must be stored and processed on Russian territory.
- The cloud platform should provide the full range of technologies and services necessary for Celsus’ work.
- The cloud platform should use its own services to avoid licensing risks.
- The cloud should provide the speed needed to process scans and the ability to scale during times of increased workloads.
- The platform must be certified in the field of information security: its resources must be certified in accordance with Russian Federal Law 152-FZ and FSTEC requirements.
- The solution must comply with the data processing rules of the EU General Data Protection Regulation (GDPR).
After considering several competitive solutions, Celsus chose Yandex Cloud since it met all of the above requirements.
Solution
Celsus is a medical decision support system based on deep learning neural networks. It is designed to analyze digital medical images, including radiography, computed tomography, magnetic resonance imaging, and others.
All services are based on the company’s own neural networks. They build on architectures commonly used for 2D and 3D detection and segmentation (Faster R-CNN, DETR, U-Net, Anisotropic Hybrid Networks, etc.), but with significant modifications.
The solution’s ML stack includes the following technologies:
- PyTorch: an open-source Python machine learning framework based on Torch
- ClearML: a framework for tracking ML experiments
- Redis: a non-relational, in-memory DBMS
- Apache Airflow: a system for creating, executing, monitoring, and orchestrating data flows
- Docker and Kubernetes: for creating and managing container clusters.
The backend was built with .NET and is deployed in containers running on Kubernetes. Data is stored in PostgreSQL, and Apache Kafka was chosen as the message broker. The system supports the DICOM standard for processing, storing, transmitting, printing, and visualizing medical images.
The Celsus solution works as follows. The system’s backend integrates with a medical facility or an entire region using the DICOM protocol. When an image processing task is received, the scan data is uploaded to the backend and checked for adequacy. After that, a request is made to the ML core via the API. The neural network processes the scan and returns a JSON response to the backend containing the findings, their coordinates or masks, dimensions, the text of the conclusion, and so on. The system generates the final response in DICOM SC (image) and DICOM SR (text) formats and returns it to the client using the DICOM protocol.
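The flow described above can be sketched in a few lines of Python. This is a simplified illustration, not the actual Celsus code: the backend is written in .NET, and every name below (`validate_scan`, `run_ml_core`, `build_report`, the JSON field names, the list of accepted modalities) is an assumption made for the example. The ML core is replaced by a stub that returns a fixed JSON string.

```python
import json
from dataclasses import dataclass

# Hypothetical checks; the source does not list the real adequacy criteria.
REQUIRED_TAGS = ("StudyInstanceUID", "Modality", "PixelData")
SUPPORTED_MODALITIES = {"MG", "CR", "DX", "CT"}  # standard DICOM modality codes


@dataclass
class Finding:
    label: str
    bbox: tuple        # (x, y, width, height) in pixels
    confidence: float


def validate_scan(scan: dict) -> list:
    """Return a list of problems; an empty list means the scan is acceptable."""
    problems = [f"missing tag: {tag}" for tag in REQUIRED_TAGS if tag not in scan]
    modality = scan.get("Modality")
    if modality is not None and modality not in SUPPORTED_MODALITIES:
        problems.append(f"unsupported modality: {modality}")
    return problems


def run_ml_core(scan: dict) -> str:
    """Stub standing in for the API request to the ML core; returns JSON text."""
    return json.dumps({
        "study_uid": scan["StudyInstanceUID"],
        "findings": [{"label": "mass", "bbox": [120, 80, 40, 32], "confidence": 0.91}],
        "conclusion": "Suspicious finding; review recommended.",
    })


def build_report(response_json: str) -> dict:
    """Parse the ML response and split it into image and text parts."""
    payload = json.loads(response_json)
    findings = [Finding(f["label"], tuple(f["bbox"]), f["confidence"])
                for f in payload["findings"]]
    return {
        "study_uid": payload["study_uid"],
        "overlay": [f.bbox for f in findings],  # would be rendered into DICOM SC
        "text": payload["conclusion"],          # would be encoded as DICOM SR
    }


def process(scan: dict) -> dict:
    """Validate the scan, call the ML core, and assemble the response."""
    problems = validate_scan(scan)
    if problems:
        return {"status": "rejected", "problems": problems}
    return {"status": "ok", **build_report(run_ml_core(scan))}


result = process({"StudyInstanceUID": "1.2.3", "Modality": "MG", "PixelData": b"\x00"})
rejected = process({"Modality": "US"})
```

Validating before calling the ML core keeps inadequate scans away from the GPU-backed inference step, which is the most expensive part of the pipeline.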
When migrating to Yandex Cloud, Celsus was deployed using data platform services, as well as Yandex Compute Cloud virtual machines.
The compute resources provided by Yandex Compute Cloud are used for inference, model training, and backend hosting. In addition, Yandex Compute Cloud hosts the company’s internal services: the Mattermost instant messenger and data exchange application, the Supervisely data markup tool, the ClearML experiment tracking platform, and monitoring services.
Yandex Object Storage provides scalable cloud storage for training and testing data. In total, about 20 data buckets are used, taking up 210 terabytes.
Managed services were used to store operational data:
- Yandex Managed Service for PostgreSQL
- Yandex Managed Service for MongoDB
- Yandex Managed Service for Apache Kafka
Yandex Managed Service for Kubernetes is used for backend deployment.
The entire move to the cloud, which the project team carried out independently with the involvement of the Yandex Cloud support team, took just a few days.
The company has kept a local office data warehouse on its own server, which is also duplicated in the cloud. In addition, four local servers with GPUs (4 + 1 + 1 + 1) are used for training. Some of the regional solutions are also deployed locally in accordance with customer requirements.
Results
Celsus is now operating commercially. Thanks to its migration to Yandex Cloud, systems and internal services are deployed faster: deployment now takes just one day. In addition, the services of the Yandex Cloud ecosystem made it possible to improve the quality of the services provided and to scale the infrastructure automatically as needed.
The company’s specialists have mastered new tools and technologies, such as Yandex Managed Service for Kubernetes for deploying and running the system in a production environment, and Yandex Managed Service for Prometheus for uploading metrics to Grafana for data monitoring and visualization.
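The source does not show how metrics are exported. As a rough illustration of what a Prometheus-compatible service collects, the sketch below renders a counter in the Prometheus text exposition format using only the standard library. The metric name `scans_processed_total`, the `modality` label, and the values are hypothetical.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Sort labels so the output is stable between runs.
        label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric name, label, and values, for illustration only.
text = render_counter(
    "scans_processed_total",
    "Number of scans processed by the ML core.",
    [({"modality": "MG"}, 1520), ({"modality": "CT"}, 310)],
)
```

This sketch does not escape special characters in label values. In practice a client library such as `prometheus_client` would normally produce this output, and Grafana would query the stored series rather than read the text directly.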
Cloud services are used by all major divisions of the company, including the ML team, backend developers, and DevOps engineers.
The growth of the company, revenue, and the number of processed scans was accompanied by a significant increase in the company’s presence in the cloud, both at the development stage and, most of all, at the inference stage.
With rapid expansion into other Russian regions, the company was able to obtain reliable computing resources with a satisfactory response time. The Yandex Cloud platform uses secure communication channels, and all cloud services are certified in accordance with FSTEC requirements. Using Yandex Cloud made it possible to comply with Russian legislative requirements regarding the storage of personal medical data.
In the new environment, the company has become better at controlling budgets as it is now easier to estimate costs, unit economics, and resource consumption.
Opinion
Yandex Cloud allows us to respond almost instantly to any changes in our business: expansion to a new region, development and testing of a new product, or a significant increase in scans from a large customer. At the start of each project, we make every effort to persuade the customer to deploy to the cloud — this is much more convenient and allows us to maintain maximum product quality and offer the fastest updates.