Researchers from "T-Tekhnologii" Found a Way to Make AI Less Prone to Agreeing with Users

Researchers from the "T-Tekhnologii" Research and Development Center have created a two-stage test to assess how readily large language models agree with the user. As the company's press service told TASS, the first stage measures how much the model's evaluation of a ready-made solution changes when the context shifts from neutral to negative. The second checks whether the neural network can find logical contradictions in the problem statement itself and refuse to solve the problem rather than adjust its answer to fit.
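The report does not say how the two stages are scored, so the following is only an illustrative sketch of how such a test could be tallied. It assumes each stage reduces to a yes/no outcome per problem; the class names, function names, and toy data are all hypothetical and do not come from the researchers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StageOneItem:
    """Hypothetical record: the model's verdict on one ready-made solution."""
    neutral_verdict: bool   # True = judged correct under a neutral prompt
    negative_verdict: bool  # verdict after the user frames the solution negatively


@dataclass
class StageTwoItem:
    """Hypothetical record: one problem whose statement contradicts itself."""
    refused: bool  # True = the model flagged the contradiction and declined


def verdict_flip_rate(items: Sequence[StageOneItem]) -> float:
    """Share of solutions whose verdict changed between the two framings."""
    if not items:
        return 0.0
    flips = sum(1 for it in items if it.neutral_verdict != it.negative_verdict)
    return flips / len(items)


def refusal_rate(items: Sequence[StageTwoItem]) -> float:
    """Share of contradictory problems the model declined to solve."""
    if not items:
        return 0.0
    return sum(1 for it in items if it.refused) / len(items)


# Toy data, invented purely for illustration.
stage_one = [
    StageOneItem(True, True),
    StageOneItem(True, False),   # verdict flipped under negative framing
    StageOneItem(False, False),
    StageOneItem(True, True),
]
stage_two = [StageTwoItem(True), StageTwoItem(True),
             StageTwoItem(False), StageTwoItem(True)]

print(verdict_flip_rate(stage_one))  # 0.25
print(refusal_rate(stage_two))       # 0.75
```

In this simplified reading, a higher flip rate in the first stage and a lower refusal rate in the second would both point to a stronger tendency to agree.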

Experiments on models from the GPT, DeepSeek, Qwen, Claude Sonnet, and Gemini families showed that artificial intelligence systems give in to agreement in 23–50% of cases when solving logical problems. In many cases, additional training on user preferences made the problem worse rather than better: the model more often accepted an incorrect assessment or a flawed problem statement. Stanislav Moiseev, head of the Center, noted that in tasks requiring strict reasoning it is not enough for AI to give a convincing answer; at some point it has to disagree with the user.

The researchers proposed correcting this effect by modifying the models' structure. This opens up the possibility of making neural networks more reliable in critical scenarios, from software code verification to mathematical analysis. Reducing "yes-saying" makes AI not just a polite interlocutor but a system capable of defending logic in the face of an erroneous request.
