Scientists at Smart Engines have developed a neural network that can recognize the handwritten word "chinchilla" without relying on linguistic context. This addresses a key problem of AI "hallucinations": cases where the system replaces rare or complex words with more common ones. The technology is already used in Russian passport recognition systems deployed by major banks and telecom operators.
As the company explained, the word "chinchilla" ("шиншилла" in Russian) has become a kind of Turing test for Cyrillic OCR systems: in sloppy handwriting, the letters "ш", "и" and "л" visually merge.
The "Da Vinci" neural network was trained on 1.2 million lines of synthesized handwritten text with a uniform distribution of letters, which keeps the model from depending on language patterns.
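To make the idea of a uniform letter distribution concrete, here is a minimal sketch of how the text side of such a training set could be sampled. It is an illustration under stated assumptions, not Smart Engines' actual pipeline: the 33-letter alphabet, the word and line lengths, and the names `random_line` and `letter_frequencies` are all hypothetical, and rendering the strings as handwriting is out of scope.

```python
import random
from collections import Counter

# Assumption: the 33 lowercase letters of the Russian alphabet.
# The article does not say which character set was actually used.
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def random_line(rng, n_words=5, min_len=3, max_len=10):
    """Build one line of pseudo-words with every letter drawn uniformly."""
    words = []
    for _ in range(n_words):
        length = rng.randint(min_len, max_len)
        words.append("".join(rng.choice(ALPHABET) for _ in range(length)))
    return " ".join(words)


def letter_frequencies(lines):
    """Return the relative frequency of each alphabet letter in the lines."""
    counts = Counter(ch for line in lines for ch in line if ch != " ")
    total = sum(counts.values())
    return {ch: counts[ch] / total for ch in ALPHABET}


rng = random.Random(0)  # fixed seed so the sample is reproducible
lines = [random_line(rng) for _ in range(2000)]
freqs = letter_frequencies(lines)
```

Because each letter is sampled independently, every frequency in `freqs` stays close to 1/33, and no letter combination is more likely than another. A model trained on such lines has no word statistics to lean on and must decide from the shape of the strokes alone.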
According to Vladimir Arlazarov, CEO of Smart Engines, when processing documents it is critical to read exactly what is written rather than "correct" errors based on context.
Smart Engines' technology differs from traditional OCR solutions, such as ABBYY FineReader or Tesseract, which often rely on language models. For example, foreign counterparts may replace a rare word with a similar, more common one suggested by the context, which is unacceptable in legal documents. The Russian scientists' development is especially relevant for processing proper names, numbers and official seals, where accuracy is paramount.
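The risk described above can be shown with a toy example. The sketch below does not reproduce the behavior of any named product; it only assumes a generic post-correction step that snaps a recognized word to the closest entry in a lexicon, using Python's standard `difflib`. The lexicon, the cutoff value, and the function names are hypothetical.

```python
import difflib

# Hypothetical lexicon of common surnames, for illustration only.
LEXICON = ["иванов", "петров", "сидоров"]


def read_literally(recognized: str) -> str:
    """Context-free reading: return exactly what was recognized."""
    return recognized


def correct_with_lexicon(recognized: str, cutoff: float = 0.8) -> str:
    """Replace the word with its closest lexicon entry, if one is similar enough."""
    matches = difflib.get_close_matches(recognized, LEXICON, n=1, cutoff=cutoff)
    return matches[0] if matches else recognized


written = "ивонов"  # a rare surname that really is spelled with "о"
literal = read_literally(written)
corrected = correct_with_lexicon(written)
```

Here `literal` keeps the surname as written, while `corrected` becomes "иванов": the lexicon step silently turns a rare name into a common one. In a passport or contract, that substitution produces a different person's name.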