Abstract
We studied the image analysis capabilities of two widely used neural network services: ChatGPT-5 mini and DeepSeek-3.1 Thinking. Using a new methodology and an original experimental framework that presents all four training examples for each of the two classes, we measured the quality of feature generation and analogy matching. In experiments with 93 proposed and automatically generated Modified Bongard Tests, ChatGPT-5 mini completed 15 tests (16.1%) and DeepSeek-3.1 Thinking completed 17 (18.3%). These results show that, despite clear progress in few-shot learning, current multimodal transformer models still face fundamental limitations with in-context learning.
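The reported percentages follow directly from the counts over the 93 tests. The sketch below reproduces that arithmetic as a quick check. The `ModelResult` class is a hypothetical helper introduced only for illustration; it is not part of the authors' experimental framework.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Outcome of running one model over the whole test set (hypothetical helper)."""
    name: str
    solved: int  # number of tests the model completed
    total: int   # number of tests presented

    def completion_rate(self) -> float:
        """Share of completed tests, as a percentage."""
        return 100.0 * self.solved / self.total


# Counts reported in the abstract: 93 Modified Bongard Tests in total.
results = [
    ModelResult("ChatGPT-5 mini", solved=15, total=93),
    ModelResult("DeepSeek-3.1 Thinking", solved=17, total=93),
]

for r in results:
    # Rounded to one decimal place, matching the figures in the abstract.
    print(f"{r.name}: {r.solved}/{r.total} = {r.completion_rate():.1f}%")
```

Running it prints 16.1% for ChatGPT-5 mini and 18.3% for DeepSeek-3.1 Thinking, so the two models differ by only two solved tests out of 93.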
