Architecture Design of Deep Reinforcement Learning with Feature–Reward Alignment

Abstract: To address the common mismatch between feature and reward mechanisms in deep reinforcement learning (DRL) in multi-agent environments, which often limits the effectiveness and applicability of such algorithms, this paper proposes an Architecture–Feature–Reward Design (AFRD) framework that systematically guides the extension of single-agent methods to multi-agent scenarios. The framework is built on the centralized training with decentralized execution (CTDE) paradigm: key local and global information is incorporated at the feature level, and individual objectives are aligned with system-wide goals at the reward level, yielding a transferable design approach. Using task offloading in edge computing as an application case, we implement AFRD-PPO by applying the AFRD framework to the PPO algorithm and conduct experiments under three typical offloading modes, comparing the convergence performance of different feature–reward combinations and further analyzing their impact on convergence stability. Experimental results demonstrate that the AFRD framework effectively enhances the convergence stability and applicability of DRL in multi-agent environments. This study provides useful insights and references for research and applications in related domains.
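
The abstract names AFRD's two design levers (feature-level information and reward-level alignment) without giving their concrete forms. Below is a minimal Python sketch of one plausible realization under CTDE; `build_features`, `aligned_reward`, and the weight `alpha` are hypothetical names introduced here for illustration, not the paper's method.

```python
import numpy as np

# Illustrative sketch of the two AFRD alignment levers under CTDE.
# All names and the weighting scheme are assumptions for illustration;
# the abstract does not specify concrete forms.

def build_features(local_obs: np.ndarray, global_state: np.ndarray) -> np.ndarray:
    """Feature level: combine an agent's local observation with key
    global information (here, a plain concatenation) so that the
    centralized training stage sees both."""
    return np.concatenate([local_obs, global_state])

def aligned_reward(individual_reward: float, system_reward: float,
                   alpha: float = 0.5) -> float:
    """Reward level: align the individual objective with the system-wide
    goal via a convex combination; alpha weights the agent's own term."""
    return alpha * individual_reward + (1.0 - alpha) * system_reward

# Toy edge-offloading example: one agent deciding where to offload a task.
local_obs = np.array([0.7, 0.2])    # e.g., own queue length, channel quality
global_state = np.array([0.4])      # e.g., average edge-server load
features = build_features(local_obs, global_state)
reward = aligned_reward(individual_reward=-1.2,  # e.g., own task latency (negated)
                        system_reward=-0.8)      # e.g., system-wide latency (negated)
print(features, reward)
```

In this sketch, `alpha = 1` recovers a purely self-interested agent and `alpha = 0` a fully shared team reward; intermediate values trade off the two, which is one simple way to realize the individual/system alignment the abstract describes.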

     
