Finding ancestors will get easier: Alice AI VLM upgrades "Archive Search"

The new model recognizes people's roles, events, and family ties in old documents

Yandex has updated its "Archive Search" service: it now not only recognizes text in historical documents but also understands the structure of each record. The new model can identify the participants in an event and determine their roles and the relationships between them.

Previously, users had to pick out the right person from every mention in an archive file, mixed in with official notes, dates, and other names. After the update, search is more precise: you can specify not only a full name but also the context of the event. For example, when searching for a birth record, you can specify the role "born," "father," or "mother," and when working with a marriage certificate, "groom," "bride," or "witness."
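To illustrate the idea of searching by name plus event context, here is a minimal sketch in Python. It is not the service's actual implementation or API: the `Mention` record, the `search` function, and the sample people are all hypothetical, and the real system's data model is not described in the announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One person mentioned in one archival record (hypothetical schema)."""
    full_name: str
    event: str    # e.g. "birth", "marriage"
    role: str     # e.g. "born", "father", "groom", "witness"
    page_id: str  # made-up page identifier


def search(mentions, full_name, event=None, role=None):
    """Return mentions matching the name and, if given, the event and role."""
    name = full_name.casefold()
    return [
        m for m in mentions
        if m.full_name.casefold() == name
        and (event is None or m.event == event)
        and (role is None or m.role == role)
    ]


# Invented sample data: the same name appears in three different roles.
MENTIONS = [
    Mention("Ivan Petrov", "birth", "father", "page-001"),
    Mention("Ivan Petrov", "marriage", "witness", "page-002"),
    Mention("Ivan Petrov", "marriage", "groom", "page-003"),
    Mention("Anna Petrova", "birth", "mother", "page-001"),
]

by_name_only = search(MENTIONS, "Ivan Petrov")
as_father = search(MENTIONS, "Ivan Petrov", event="birth", role="father")
```

A name-only query returns every mention of that name, while adding the event and role narrows the result to the single record where the person appears as a father.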

The update is based on Yandex's multimodal model Alice AI VLM. It works with text and images and has a good understanding of the Russian language, which is especially important for archival documents. As a result, the developers did not have to collect huge training sets: the model learned to extract data after a small number of iterations on specially annotated data.

The quality of the system was evaluated by the share of people who could be found by full name in the archive search. The average accuracy was 90.5%. For birth records the figure reached 92.7%, for marriage documents 89.7%, and for death records 87.2%.
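A metric of this kind can be sketched as follows. This is only an illustration under assumptions: the announcement does not say how the evaluation set was built or how the average was weighted, and the function names and toy data below are invented and unrelated to the reported figures.

```python
from collections import defaultdict


def found_share_by_type(samples):
    """Map each document type to the share of people found by full name.

    `samples` is an iterable of (doc_type, was_found) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for doc_type, was_found in samples:
        totals[doc_type] += 1
        hits[doc_type] += int(was_found)
    return {doc_type: hits[doc_type] / totals[doc_type] for doc_type in totals}


def overall_share(samples):
    """Share of all people found, pooled across document types."""
    samples = list(samples)
    return sum(1 for _, was_found in samples if was_found) / len(samples)


# Toy evaluation set, invented for illustration only.
TOY_SAMPLES = [
    ("birth", True), ("birth", True), ("birth", False), ("birth", True),
    ("marriage", True), ("marriage", False),
    ("death", True), ("death", True),
]

shares = found_share_by_type(TOY_SAMPLES)
overall = overall_share(TOY_SAMPLES)
```

Note that a pooled average depends on how many records of each type are in the evaluation set, so it need not equal the simple mean of the per-type figures.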

"Archive Search" helps find mentions of people, settlements, and events in handwritten documents from the 18th–20th centuries, which are transcribed by a neural network. The service's database already contains over 20 million pages of historical documents from the archives of Moscow, the Moscow region, Orenburg, Vologda, Irkutsk, Astrakhan, and other regions. In addition, the service searches more than 200 pre-revolutionary and Soviet newspapers, as well as directories.
