Ensemble Deep Reinforcement Learning Approach for Portfolio Management in the Russian Equity Market

A. A. Kobzev; O. N. Krakhmalev

Vol 7 No 2 (2026), Papers

Published June 30, 2026

A. A. Kobzev

Financial University under the Government of the Russian Federation, Moscow, Russian Federation

https://orcid.org/0009-0009-2369-1741

O. N. Krakhmalev

Financial University under the Government of the Russian Federation, Moscow, Russian Federation

https://orcid.org/0000-0002-9388-4137

PDF (Russian)

Keywords

deep reinforcement learning
portfolio management
ensemble trading strategies
continuous adaptation
risk-aware optimization
Russian stock market

How to Cite

Kobzev A.A., Krakhmalev O.N. Ensemble Deep Reinforcement Learning Approach for Portfolio Management in the Russian Equity Market // Russian Journal of Cybernetics. 2026. Vol. 7, № 2. P. 119-125.

Abstract

We studied an ensemble-based deep reinforcement learning approach to portfolio management in the Russian stock market. The study aimed to reproduce a baseline ensemble architecture on Russian equity market data, evaluate its transferability, and identify modifications that improve trading performance over a long investment horizon. We implemented an initial model consisting of three decision-making agents and then extended the analysis with a broader feature set, macroeconomic indicators, risk-adjusted reward functions, and a mechanism for continuous adaptation of one agent during live trading. We trained and evaluated the models on data from 2015 to 2025 for liquid Russian equities and conducted the final performance comparison on the out-of-sample period from 2023 to 2025. The results show that the main driver of performance improvement is continuous off-policy training, in which one agent is updated using trading data generated by whichever agent in the ensemble is active. The best-performing configuration achieves a cumulative return of 61.7 percent and outperforms passive benchmark strategies in absolute return. However, the results also reveal a structural limitation: a long-only ensemble without an explicit mechanism for allocating capital to low-risk assets does not provide sufficient capital protection during a prolonged bear market combined with high interest rates.
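The continuous off-policy training described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Transition`, `SharedReplayBuffer`, `run_live_trading`, and the agent labels `agent_a`, `agent_b`, `agent_c` are hypothetical, and the market step is replaced by random placeholder values. The sketch only shows the data flow, in which every trade made by the currently active agent is logged to one shared buffer and a single off-policy learner samples from that buffer regardless of which agent produced the data.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Tuple


class Transition(NamedTuple):
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    next_state: Tuple[float, ...]
    source_agent: str  # which ensemble member was trading at this step


class SharedReplayBuffer:
    """Fixed-size FIFO store of transitions from every ensemble member."""

    def __init__(self, capacity: int) -> None:
        self._items: Deque[Transition] = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        k = min(batch_size, len(self._items))
        return rng.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)


def run_live_trading(schedule: List[str], buffer: SharedReplayBuffer,
                     batch_size: int, rng: random.Random) -> int:
    """Log one transition per trading step; return the number of learner updates."""
    updates = 0
    state: Tuple[float, ...] = (0.0,)
    for step, active_agent in enumerate(schedule):
        # Placeholder market step (assumption): real code would query the
        # active agent's policy and the trading environment here.
        action = (rng.uniform(-1.0, 1.0),)
        reward = rng.gauss(0.0, 0.01)
        next_state = (float(step + 1),)
        buffer.add(Transition(state, action, reward, next_state, active_agent))
        state = next_state
        if len(buffer) >= batch_size:
            batch = buffer.sample(batch_size, rng)
            # An off-policy learner would take one gradient step on `batch`
            # here; the batch may mix data from all ensemble members.
            assert len(batch) == batch_size
            updates += 1
    return updates


rng = random.Random(0)
buffer = SharedReplayBuffer(capacity=1000)
# Hypothetical schedule: each agent is active for five trading steps.
schedule = ["agent_a"] * 5 + ["agent_b"] * 5 + ["agent_c"] * 5
n_updates = run_live_trading(schedule, buffer, batch_size=4, rng=rng)
sources = {t.source_agent for t in buffer.sample(len(buffer), rng)}
```

Off-policy algorithms are the natural fit for this scheme because they can learn from transitions generated by a different policy, whereas an on-policy learner would need data from its own current policy.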
