Researchers from the University of Amsterdam have reported a significant acceleration of a key stage in training recommendation systems: almost 60 times faster. A central role in the experiment was played by Yambda, an open dataset published by Yandex in 2025. It contains about 5 billion anonymized user events from the Yandex Music service and is considered one of the largest public datasets for recommendation tasks.
The work was carried out on the Seater model, which organizes the content catalog into a hierarchical, tree-like structure. This approach improves recommendation accuracy, but the catalog-preparation stage previously took up to 20% of total training time.
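In a tree-structured catalog of this kind, each item's identifier is its path from the root to a leaf, so similar items share an identifier prefix. The sketch below is purely illustrative, not the paper's algorithm: it builds such path identifiers by recursively splitting hypothetical item embeddings at the median coordinate (real pipelines typically use balanced clustering for this step):

```python
def build_codes(items, embeddings, depth=0, prefix=()):
    # Recursively bisect the item set by the median of one embedding
    # coordinate; each item's identifier is its root-to-leaf path,
    # encoded as a tuple of branch indices (0 = left, 1 = right).
    if len(items) == 1:
        return {items[0]: prefix}
    dim = depth % len(embeddings[items[0]])
    ordered = sorted(items, key=lambda i: embeddings[i][dim])
    mid = len(ordered) // 2
    codes = {}
    codes.update(build_codes(ordered[:mid], embeddings, depth + 1, prefix + (0,)))
    codes.update(build_codes(ordered[mid:], embeddings, depth + 1, prefix + (1,)))
    return codes

# Toy catalog: two pairs of similar items (hypothetical 2-D embeddings).
embeddings = {
    "song_a": (0.0, 0.0), "song_b": (0.1, 0.2),
    "song_c": (5.0, 5.0), "song_d": (5.1, 4.8),
}
codes = build_codes(list(embeddings), embeddings)
# Similar items end up under the same subtree, e.g. song_a and song_b
# share the first branch of their identifiers.
```

Because every identifier is a distinct root-to-leaf path, the tree doubles as both a catalog index and a generation target: predicting an item means predicting its branch sequence.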
The researchers proposed two optimization methods: the first prioritizes cutting catalog-preparation time as much as possible, while the second combines the accelerated preparation with an additional refinement of the tree structure. On the Yambda data, the basic method reduced preprocessing time from 82 minutes to 83 seconds with no loss of quality; the combined version delivered a 15-fold speedup and also improved accuracy.
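The headline figure follows directly from the reported preprocessing times; a quick check:

```python
before_s = 82 * 60  # preprocessing before: 82 minutes, in seconds
after_s = 83        # preprocessing after: 83 seconds
speedup = before_s / after_s
print(round(speedup, 1))  # prints 59.3, i.e. "almost 60 times faster"
```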
According to the test results, Seater outperformed the SASRec, BERT4Rec, and GRU4Rec models by 13–17%. The developers note that Yambda's scale made it possible to confirm that generative recommendation systems remain applicable on large catalogs. The source code of the updated version of Seater has been released as open source.