Abstract
We studied an ensemble-based deep reinforcement learning approach to portfolio management in the Russian stock market. The study aimed to reproduce a baseline ensemble architecture using Russian equity market data, evaluate its transferability, and identify modifications that improve trading performance over a long investment horizon. We implemented an initial model consisting of three decision-making agents and then extended the analysis by incorporating a broader set of features, macroeconomic indicators, risk-adjusted reward functions, and a mechanism for continuous adaptation of one agent during live trading. We trained and evaluated the models on data from 2015 to 2025 for liquid Russian equities, and we conducted the final performance comparison on the out-of-sample period from 2023 to 2025. The results show that the main driver of performance improvement is continuous off-policy training, in which one agent is updated using trading data generated by whichever agent in the ensemble is active. The best-performing configuration achieves a cumulative return of 61.7 percent and outperforms passive benchmark strategies in absolute return. However, the results also reveal a structural limitation: a long-only ensemble without an explicit mechanism for allocating to low-risk assets does not provide sufficient capital protection during prolonged bear markets combined with high interest rates.

