Abstract:
To address the common mismatch between feature and reward design in deep reinforcement learning (DRL) for multi-agent environments, which often limits effectiveness and applicability, this paper proposes an Architecture–Feature–Reward Design (AFRD) framework that systematically guides the extension of single-agent methods to multi-agent scenarios. The framework builds on the centralized training with decentralized execution (CTDE) paradigm: key local and global information is incorporated at the feature level, and individual objectives are aligned with system-wide goals at the reward level, yielding a transferable design methodology. Taking task offloading in edge computing as an application case, we implement AFRD-PPO by applying the AFRD framework to the Proximal Policy Optimization (PPO) algorithm, and conduct experiments under three typical offloading modes to compare the convergence performance of different feature–reward combinations and to analyze their impact on convergence stability. Experimental results demonstrate that the AFRD framework effectively enhances the convergence stability and applicability of DRL in multi-agent environments. This study offers useful insights and references for future research and applications in related domains.