How Yandex Cloud technologies help find new treatment options

Here is how the Sechenov University team developed a medical data platform using Yandex Cloud services. Now doctors and interns can search for information and collect stats faster, exporting data in a matter of seconds.

Over the last decade, keeping medical records in digital form has become a healthcare standard in Russia. Medical information systems store details about diseases, tests, and lab results, making it easier for doctors to diagnose patients.

However, medical information systems currently do not allow exporting data on patients with similar symptoms, so collecting statistics is a challenge. Yet, such data is a valuable resource for scientific research. For example, this information can help identify the most effective treatments for new flu strains during seasonal outbreaks. Apart from that, having access to statistics would accelerate research: postgraduate students and interns could quickly get information about similar symptoms rather than spend hours searching through archives.

How to set up search across 18,000,000 documents

Sechenov University has multiple affiliated clinical hospitals treating more than 20,000 patients annually. Since 2012, their medical information systems have stored over 18,000,000 records.

The patient data is kept in an extensive documented database powered by 1C: Medicine. Still, the main purpose of the medical information system is diagnosing and treating patients. This led to problems with structuring the data for search and exporting standardized datasets. To speed up searching and filtering data on patients, it was essential to convert the records into a new format and provide update support. To run the relevant project, Sechenov University specialists teamed up with Yandex Cloud and Beltel Datanomics, a company that specializes in AI and big data analytics.

Cloud platform for storing clinical research data

The medical information storage system is a large documented database built on 1С: Medicine. It holds about 300 GB of data, including references, discharge summaries, and surgical and childbirth records. The new platform uses 1C MIS as the primary data source for uploading medical documents to the cloud.

X-ray scans take up the most space. The system stores them as links, keeping the original images on the radiology information system server. With this approach, large files do not affect the platform performance.

“The server updates every night. Reference data gets updated daily, and patient documents enter the system after discharge as anonymized data. This way, users can only view clinical case information, with no way to identify patients. The 1C server anonymizes the data before sending it to the cloud.

The system employs UserGate to secure data during storage and transfer. It meets the Russian Federation’s information security standards and regulations. Yandex Cloud data storage complies with the security requirements of the Russian Federal Law on Personal Data. Users sign in either through the Sechenov ID server or with their login credentials”.

The web interface supports filtering by 120 parameters to make searches easier. For example, a doctor or intern can use it to locate patients diagnosed with Covid, aged 50 to 55, who also have diabetes, and then review their lab results or X-ray images.

Example of a saved dataset

The platform shows search results as a table. Users can save this table as a template to re-use it later and change the parameters as needed. They can also study each clinical case in detail and review related X-ray scans, ultrasound images, medical opinions, and more.

How the solution works

The university’s platform uses more than 10 Yandex Cloud services. Currently, it keeps processed and anonymized data in a storage based on Managed Service for PostgreSQL. With increasing load, the university is considering migration to Managed Service for Greenplum®.

The platform uses Managed Service for OpenSearch for efficient data searching. To prevent data loss or accidental deletion, the system creates backups and stores them in Object Storage.

All user requests go through Security Groups, a tool that manages incoming traffic and prevents unauthorized access. The platform employs Grafana to monitor application performance metrics and relies on Managed Service for GitLab to store source code and handle automated app deployment.

“Clinical trials” for cloud platform

Handling a large amount of unstructured data became the main challenge in developing the data platform. To solve this, Yandex Cloud and Beltel Datanomics experts created a data specification and implemented validation and typing in the cloud.

They also introduced iterative optimization for the cloud storage, where developers would incrementally improve indexing and queries to enhance service operation. They converted unstructured data, such as hospitalization records, into simple related tables.

When creating datasets, doctors can enter any query they want. To improve query processing and help the system understand exactly what the user needs, the platform developers integrated GraphQL and RESTful. This enabled precise data retrieval from storage.

With over 120 search parameters and 88 lab tests including 1,700 lab parameters, the team focused more on detailed UI/UX design. Moving forward, they are going to group these parameters so as to make filtering even faster.

10 seconds rather than months of searching through archives

While testing the platform, the Sechenov University staff intentionally stressed the system by creating a dataset with many parameters (a 156 MB CSV file). The result was impressive: the platform returned data in 10 to 12 seconds. One would need months to collect that much data manually.

Of course, this does not mean that every scientific task will be solved in 12 seconds. Researchers need to thoroughly develop the study topic and define the parameters. After exporting the dataset, the team filters, refines, and labels the data.

Medical data platform: Development plans

Currently, the system works in test mode. It is used by 30 staff members, including department heads and associate professors engaged in research and treatment. Previously, they built databases almost manually. The staff have already completed training and learned how to create and use datasets. Their primary task now is to run pilot testing, find issues, and suggest improvements.

The team also plans to increase the number of users to 1,000 by involving professors, researchers, postgrads, and interns. To simplify handling large tables, developers will introduce mathematical data processing. This will help calculate median values and visualize data with graphs in DataLens.

Moving forward, Sechenov University wants to invite other universities to join the system and merge their databases. Such cooperation is going to be a win-win opportunity. Since Sechenov University’s hospital has the highest patient count among educational institutions, smaller universities will gain access to a large database to improve diagnosis and treatment, while connecting to more databases will help Sechenov University expand its own with certain uncommon cases.

The road-map for 2024 also includes building a portal that hosts collected datasets, each with a brief clinical description, annotation, patient counts, and metrics. These datasets will be available to scientific teams and AI developers, such as healthtech companies creating new ML-based diagnostic and treatment devices.

author
Anna Plemyashova
Head of Beltel Datanomics

Contact us

Get started with Yandex Cloud

Pricing

Check out our service plans and calculate the cost
How Yandex Cloud technologies help find new treatment options
Sign in to save this post