Neural network and X-ray spectra: How to speed up catalyst analyses

A joint project of the Yandex Cloud Center for Technologies and Society and the Southern Federal University helps scientists to study the atomic structure of catalysts.

What catalysts are and what they do

New diagnostic methods for catalysts are being developed at Southern Federal University’s International Research Institute of Intelligent Materials.

Catalysts are substances that accelerate chemical reactions while not being consumed themselves. Almost every industrial process has its own catalyst, e.g., the Phillips catalyst for the synthesis of polyethylene, the Ziegler–Natta catalyst for the production of polypropylene, rhodium catalysts for the production of alcohols in hydroformylation, and platinum catalysts for hydrogen energy.

Despite the enormous role catalysts play in our modern technological civilization, for a long time very little was known about what happens to them at the atomic level. To understand how the active centers of catalysts are arranged, why they degrade, and how to improve their activity, scientists use X-ray absorption spectroscopy, one of the most accurate non-destructive methods for probing atomic structure.

Experiments are carried out at synchrotrons, megascience facilities that literally “peek” inside a substance while it is working, e.g., at high temperatures and under gas pressure.


One of the experimental stations of the Kurchatov synchrotron radiation source. Here, catalysts are studied under conditions that come as close as possible to real ones.


Here’s the scheme of such an experiment: active rhodium centers form in a capillary under pressure and heating. Their spectra are recorded using X-rays and then processed mathematically to decipher the structure.
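The mathematical processing of a raw absorption spectrum usually begins with normalization, so that spectra from different samples and measurements become comparable. The sketch below shows one common approach, edge-step normalization with straight-line fits to the pre-edge and post-edge regions. It illustrates standard X-ray absorption preprocessing and is not necessarily the procedure the SFU team uses; the function names and the choice of fitting windows are assumptions.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares straight line y = a*x + b through the given points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def normalize_edge_step(
    energy: Sequence[float],
    mu: Sequence[float],
    e0: float,
    pre: Tuple[float, float],
    post: Tuple[float, float],
) -> List[float]:
    """Subtract the pre-edge line and divide by the edge step at e0.

    `pre` and `post` are energy windows relative to e0 (in eV) used to fit
    the pre-edge and post-edge lines; their values are user-chosen assumptions.
    """
    pre_pts = [(e, m) for e, m in zip(energy, mu) if pre[0] <= e - e0 <= pre[1]]
    post_pts = [(e, m) for e, m in zip(energy, mu) if post[0] <= e - e0 <= post[1]]
    a_pre, b_pre = fit_line(*zip(*pre_pts))
    a_post, b_post = fit_line(*zip(*post_pts))
    # Edge step: vertical gap between the post-edge and pre-edge lines at e0.
    step = (a_post * e0 + b_post) - (a_pre * e0 + b_pre)
    # After this, the pre-edge region sits near 0 and the post-edge near 1.
    return [(m - (a_pre * e + b_pre)) / step for e, m in zip(energy, mu)]
```

Real processing tools typically also flatten the post-edge region and determine e0 from the data; the sketch keeps only the core idea of a unit edge step.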

The project team includes Bogdan Protsenko, Mikhail Lifar, Georgy Asaturov, Daniil Kupriyanenko, Nazar Chubkov, Kirill Kulaev, Georgy Kochiev, Alexander Guda, and Sergey Guda. The team works under the guidance of Professor Alexander Soldatov, scientific director of the department at Southern Federal University.

Llama was created by Meta. Meta is designated as an extremist organization and its activities are prohibited in Russia.

Interpreting the resulting spectra is a difficult task. Even with instructions and the right approach, a single analysis can take several hours. It may take a year from measurement to publication of the results, even with the participation of highly qualified specialists in quantum chemistry, spectroscopy, and computer modeling.

Which solution was implemented

To simplify and speed up spectrum analysis, the interdisciplinary team of the SFU International Research Institute of Intelligent Materials has developed a smart research agent. It is based on a large language model and trained on specially assembled databases using the team’s own PyFitIt framework.

The architecture of the solution does not depend on a specific LLM, so the agent can work with different models. Models with a large number of parameters, such as Llama 3.2 (40B) and DeepSeek R1, give the best results because they reason more accurately and produce well-founded, meaningful conclusions.
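One way to keep an agent independent of a specific LLM is to hide every model behind a small shared interface, so that the analysis logic never calls a particular model directly. The sketch below does this with a Python `Protocol`; the `LLMBackend` interface, its `complete` method, the `EchoBackend` stand-in, and `plan_analysis` are hypothetical illustrations, not part of PyFitIt or the team’s code.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    name: str

    def complete(self, prompt: str) -> str:
        ...


class EchoBackend:
    """Hypothetical stand-in model for testing; returns a fixed-format plan."""

    name = "echo"

    def complete(self, prompt: str) -> str:
        return f"PLAN for: {prompt}"


def plan_analysis(backend: LLMBackend, task: str) -> str:
    # The agent logic depends only on the interface, so replacing one model
    # deployment with another does not require changing this function.
    prompt = f"Write analysis code for this X-ray absorption task: {task}"
    return backend.complete(prompt)


print(plan_analysis(EchoBackend(), "Rh K-edge, in situ heating"))
```

Swapping models then means writing another class with a `complete` method, for example a wrapper around whichever hosted model is in use.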


Communication with the agent takes place via a Telegram bot. The user uploads a document with a spectrum, i.e., a plain-text data file. The system then starts the analysis: it generates the code, spins up a Docker® container, and runs the calculations in the spectroscopy framework. In response, the user receives physically and chemically grounded results: the charge of the atoms, the distances between them, and the coordination numbers.
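The first step of that pipeline, reading the uploaded spectrum, can be sketched as a small parser. The bot’s actual file format is not described, so this sketch assumes a simple two-column layout (energy in eV, absorption) with optional `#` comment lines; `parse_spectrum` is a hypothetical helper.

```python
from typing import List, Tuple


def parse_spectrum(text: str) -> Tuple[List[float], List[float]]:
    """Parse 'energy absorption' pairs from a plain-text spectrum file.

    Assumed format: blank lines and lines starting with '#' are skipped;
    columns may be separated by spaces, tabs, or commas; energies increase.
    """
    energy: List[float] = []
    mu: List[float] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.replace(",", " ").split()
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected two columns, got {raw!r}")
        e, m = float(fields[0]), float(fields[1])
        # Reject unsorted or duplicated energies before any analysis runs.
        if energy and e <= energy[-1]:
            raise ValueError(f"line {lineno}: energy grid must be increasing")
        energy.append(e)
        mu.append(m)
    if len(energy) < 2:
        raise ValueError("spectrum must contain at least two points")
    return energy, mu
```

Validating the file before any code is generated or a container is started keeps malformed uploads from wasting compute.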

The Telegram bot supports two interaction scenarios:

  • Pipeline mode is a linear operation with hints, suitable for novice users.
  • Free mode, in which the user writes a prompt in free form and the agent decides on its own how to approach the analysis.
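The two scenarios can be viewed as two entry points into the same analysis engine: a fixed, guided sequence versus a plan chosen by the agent. In the sketch below, the step names, hints, and the `planner` callable are hypothetical placeholders rather than the bot’s real configuration.

```python
from typing import Callable, List

# Hypothetical fixed sequence for pipeline mode: (step name, hint for the user).
PIPELINE_STEPS = [
    ("upload", "Send a plain-text file with the spectrum."),
    ("process", "The spectrum will be preprocessed for analysis."),
    ("fit", "Structural parameters will be estimated from the spectrum."),
    ("report", "You will receive charges, distances and coordination numbers."),
]


def run_pipeline_mode() -> List[str]:
    """Linear mode: every user follows the same steps, each with a hint."""
    return [f"[{name}] {hint}" for name, hint in PIPELINE_STEPS]


def run_free_mode(prompt: str, planner: Callable[[str], List[str]]) -> List[str]:
    """Free mode: the agent (any planner callable here) chooses the steps."""
    return planner(prompt)


def handle_request(mode: str, prompt: str,
                   planner: Callable[[str], List[str]]) -> List[str]:
    if mode == "pipeline":
        return run_pipeline_mode()
    if mode == "free":
        return run_free_mode(prompt, planner)
    raise ValueError(f"unknown mode: {mode!r}")
```

Keeping the guided mode as plain data makes it easy for novices to follow, while free mode delegates the choice of steps entirely to the model.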

How users interact with the smart agent in free mode

Operando: a term describing the analysis of materials or devices while they are operating, i.e., under real-world operating conditions rather than artificial laboratory ones.

Everything, from the request to the response, takes less than a minute. The accuracy of the analysis is not just comparable to that of manual processing; in many cases it is even higher. This is especially noticeable in scenarios that do not use pre-prepared templates or examples, and the error probability remains below 10%.

Why they chose the cloud

Southern Federal University has its own CPU-based computing cluster. Expanding that infrastructure to run large language models on GPUs would require significant investment, so the team moved LLM training and deployment to Yandex Cloud, using Yandex DataSphere. The service provides all the necessary tools and dynamically scalable cloud resources for the full machine learning development cycle.

Results and future plans

The agent made it possible to automate spectrum analysis and cut data processing time from several months to one minute. Prediction errors for the key parameters do not exceed:

  • 0.2 units for atomic charge (the oxidation state, reflecting how many electrons a metal atom in the active core has given up)
  • 0.3 units for coordination numbers (the number of nearest neighbor atoms)
  • 0.01 Å for interatomic distances (the average distance from the central atom to the surrounding ones).
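These error bounds can be expressed as a simple acceptance check, for example when comparing the agent’s predictions with a reference structure. The sketch encodes the three tolerances from the list above; the parameter keys and the `within_tolerance` helper are illustrative assumptions.

```python
from typing import Dict

# Maximum prediction errors reported for the agent.
TOLERANCES: Dict[str, float] = {
    "oxidation_state": 0.2,      # oxidation state units
    "coordination_number": 0.3,  # number of neighbor atoms
    "distance_angstrom": 0.01,   # interatomic distance, Å
}


def within_tolerance(predicted: Dict[str, float],
                     reference: Dict[str, float]) -> Dict[str, bool]:
    """For each parameter, report whether |predicted - reference| <= tolerance."""
    return {
        key: abs(predicted[key] - reference[key]) <= tol
        for key, tol in TOLERANCES.items()
    }


pred = {"oxidation_state": 1.1, "coordination_number": 3.8, "distance_angstrom": 2.035}
ref = {"oxidation_state": 1.0, "coordination_number": 4.0, "distance_angstrom": 2.030}
print(within_tolerance(pred, ref))
```

Checking each parameter separately shows at a glance which quantity falls outside its bound.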

Taken together, these parameters allow chemists to understand the structure and behavior of the catalyst’s active core, as well as the signs of its degradation.

The team plans to distribute and adapt the agent for use at large Russian research facilities, including the Kurchatov synchrotron, as part of a project to develop a smart operando diagnostics station. The goal is to create a convenient and reliable tool that will become the gold standard of data processing for users of such platforms. Spectral analysis tools are being developed with the support of Southern Federal University’s Priority 2030 program.

The Yandex Cloud Center for Technologies and Society implements socially significant projects in the fields of education, science, healthcare, environment, and culture. If you have a similar project, please fill out an application.
