Abstract
We reviewed the architectural principles and algorithms of modern generative neural networks and large language models. We examined the core mechanisms of text processing, including tokenization, embeddings, deep contextualization, autoregressive generation, and context management. We analyzed the transformer architecture in detail and emphasized self-attention as the key innovation that enables parallel sequence processing and captures long-range dependencies efficiently. We described common architecture variants (encoder-only, decoder-only, and encoder–decoder), positional encoding methods, and the computational complexity of major components.
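As a compact illustration of the mechanism emphasized above, self-attention can be summarized by the standard scaled dot-product formulation, where the query, key, and value matrices Q, K, and V are linear projections of the input sequence and d_k denotes the key dimension (the notation follows the common transformer convention rather than anything defined in this abstract):
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V .
\]
The softmax weights determine how strongly each position attends to every other position, which is what allows all positions of a sequence to be processed in parallel while still capturing long-range dependencies.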
We studied pretraining on large text corpora and subsequent adaptation through fine-tuning and instruction tuning. We examined alignment with human preferences using reinforcement learning from human feedback (RLHF) and reviewed efficient adaptation methods, including low-rank adaptation (LoRA) and quantization. We introduced the concept of an object information context, which we define as the overall amount and distribution of information about a given object in digital data, including all references to that object in the training corpus. We showed that the structure of this context in the training data influences the formation of object representations in the model’s embedding space. We also examined the quality of the data environment and showed that low-quality sources degrade model accuracy, and we discussed approaches to mitigate these effects.
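As a brief sketch of the low-rank adaptation mentioned above, LoRA freezes a pretrained weight matrix \(W_0 \in \mathbb{R}^{d \times k}\) and trains only a low-rank correction (the symbols \(d\), \(k\), and \(r\) follow the standard LoRA formulation and are not defined elsewhere in this abstract):
\[
W = W_0 + \Delta W = W_0 + B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k),
\]
which reduces the number of trainable parameters for that matrix from \(dk\) to \(r(d + k)\) while leaving the original weights untouched.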