Background
The Higher School of Economics (HSE University) is one of Russia’s leading universities, offering over 750 degree and short-term programs on its website.
Its staff have developed a pipeline that helps the website’s users select online and short-term education programs based on personal preferences and other data. Using YandexGPT API and Yandex SpeechKit, the project team created an AI bot and integrated it into the HSE website.
Creating a solution to select programs for prospective students
The Higher School of Economics (HSE University) is one of Russia’s largest universities and holds top positions in education rankings. It is a comprehensive university that provides education in economics, engineering, humanities, social sciences, and other domains.
To make it easier for prospective students to select one of its hundreds of education programs, the university decided to equip its website with a solution that would help applicants to find a program based on their personal preferences.
The staff at HSE have developed a unified, extensive catalog of all education programs and courses, complete with filters, a site-wide menu, and a number of calculators. Beyond that, however, the university needed a solution that would let website users ask highly diverse questions. The project team opted to create an assistant that would recommend a specific education program to an applicant based on what they have said (through text or voice messages) about their preferences and interests. After considering a number of options, the HSE team eventually started developing an LLM chatbot to incorporate into the website.
In their search for a suitable LLM, the developers at HSE had various criteria in mind. They factored in what position GPT models had in rankings and sought an open-source solution, either based in Russia or free and based in another country. The project was geared not only towards developing a chatbot but also towards a more global ambition: to promote generative AI technologies at HSE University.
Eventually, the team picked out three LLMs and load-tested them. The tests revealed that YandexGPT was the fastest and most accurate, making it the choice for creating the HSE chatbot.
Education program finder for prospective students based on YandexGPT API and Yandex SpeechKit
To implement this solution, HSE extracted and analyzed content from education program websites using the BeautifulSoup open-source library. This process included removing HTML/XML tags from the data as well as defining all parent and child elements for each object.
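The tag-removal step can be illustrated with a minimal sketch. It is not the HSE team's code: to stay dependency-free it uses Python's standard `html.parser` module as a stand-in for BeautifulSoup, and the `TextExtractor` class and `extract_text` function are hypothetical names. With BeautifulSoup itself, a comparable result typically comes from `BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)`. The sketch does not cover the parent/child element mapping mentioned above.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the page text with all tags removed (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    page = "<body><h1>Data Science</h1><p>Online, <b>6 months</b></p></body>"
    print(extract_text(page))
```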
The team then split all content into 1024-character chunks with a 200-character overlap, converted the chunks into embeddings using Yandex Foundation Models Embedding API, and stored them in a ClickHouse® vector database hosted on HSE’s own server. The database is around 500 MB in size. Whenever teachers update the existing education programs or add new ones, the ClickHouse® database is updated as well.
The staff developed the program finder in Python, embedded it into the HSE University website, and created a pipeline in which text queries are processed by YandexGPT and voice messages are processed by Yandex SpeechKit.
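The chunking step can be sketched as follows. Only the 1024-character size and the 200-character overlap come from the description above; the function name `split_into_chunks` and the plain character-based slicing are assumptions, since the case study does not say whether the team split on sentence or word boundaries.

```python
def split_into_chunks(text: str, chunk_size: int = 1024, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # each new chunk starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2000))
    for chunk in split_into_chunks(sample):
        print(len(chunk))
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one chunk, provided it is shorter than 200 characters.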
The chatbot is highly user-friendly. A website visitor asks a question as a text or voice message in which they specify their interests, achievements, age, intended study location, final exam score, and any other information they see fit. If the user asks the question by voice, Yandex SpeechKit converts it into text. The text query then goes to YandexGPT API, which highlights the key data needed for a precise answer. After that, Yandex Foundation Models Embedding API vectorizes the query. Prompts are around 40,000 characters long for both input and output. The team then employs retrieval-augmented generation (RAG): the vectorized query is used to search the vector database for the chunks that most closely match it, that is, those describing the education programs best suited to the user. All found chunks, along with the user query and system prompt, go to YandexGPT, which, in turn, generates an answer to the query based on the information from the chunks.
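The retrieval step of this flow can be sketched in a few lines. This is an illustration under stated assumptions, not the team's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for the Embedding API call, the in-memory list replaces the ClickHouse® vector database, and the key-data extraction and the final YandexGPT call are left out. The names `cosine`, `retrieve`, and `build_prompt` are likewise hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding API call: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(system_prompt: str, query: str, found_chunks: list[str]) -> str:
    """Combine system prompt, retrieved chunks, and user query into one LLM prompt."""
    context = "\n".join(f"- {c}" for c in found_chunks)
    return f"{system_prompt}\n\nProgram details:\n{context}\n\nUser question: {query}"


if __name__ == "__main__":
    knowledge_base = [
        "Online course in data analysis with Python, 3 months",
        "Short-term program in medieval history",
        "Evening program in corporate finance",
    ]
    question = "I want to learn data analysis with Python"
    found = retrieve(question, knowledge_base, top_k=1)
    print(build_prompt("Recommend a program using only the details below.", question, found))
```

In a production pipeline the similarity search would normally run inside the vector database rather than in application code, so only the closest chunks leave the database.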
At first, the developers decided to provide users with information about the suitable programs along with an additional description of those programs generated by YandexGPT. However, some issues arose at this stage. The team has not yet managed to get the network to correctly highlight key points in the answers, with minor features persistently coming to the fore. Therefore, the staff settled on only providing program details from the knowledge base.
The HSE team spent around two months working on the project, and much of that time went into creating design layouts and fine-tuning the user interface aspects. Engineers at the university did all work on their own, with support from Yandex Cloud experts.
Most of the project’s difficulties stemmed from a lack of experience with generative AI. After a while, the developers realized it was not enough to simply send questions to YandexGPT API, so they pivoted to a pipeline in which answers are based on information from a vector database. After that, development went much faster.
70% positive reviews from chatbot users
The HSE team developed and launched a chatbot helping prospective students to find the education program most suitable for them.
Using RAG, they created a pipeline helping users to select an education product. Currently, the new tool only enables users to find online and short-term programs. However, the project team intends to expand the range of recommendations to include bachelor’s and master’s programs, which are, as of now, missing from the database.
At present, around 5% of the HSE website’s visitors who land on pages where the chatbot widget is available use it. The team tracked user actions and numbers over the entire summer of 2024. Over that period, each of the 300 users performed three to four actions, which included asking the chatbot questions or leaving their contact details. 10% of these users completed the desired actions after navigating to education program pages.
Over 70% of the chatbot’s users left positive reviews about it.
Going forward, the HSE staff intend to add further elements to the pipeline to make the experience more convenient, in particular so that it factors in all parts and features of education programs. The experts also intend to develop a hybrid bot in which the current chatbot would be one component. Another component would answer boilerplate questions that require maximum accuracy, drawing on a tree of ready-made Q&As.
HSE University also plans to add tier 0 support using a YandexGPT API-based chatbot. However, to implement that, the team needs to collect a database first.
Opinion
We took a rather systematic approach to selecting a generative AI model for our chatbot. YandexGPT proved highly efficient during tests and, given the other Yandex Cloud services available, it outperformed its counterparts in many respects.
* Photo courtesy of the HSE Public Relations Office