The StarCraft Multi-Agent Challenge (SMAC)

Tags: Artificial Intelligence, Multi-Agent Reinforcement Learning

Contents

The StarCraft Multi-Agent Challenge

Abstract

1 Introduction

2 Related Work

3 Multi-Agent Reinforcement Learning

Dec-POMDPs

Centralised training with decentralised execution

4 SMAC

Scenarios

State and Observations

Action Space

Rewards

5 PyMARL

6 Results

7 Conclusion and Future Work

Acknowledgements

References

Appendix A SMAC

A.1 Scenarios

A.2 Environment Setting

Appendix B Evaluation Methodology

B.1 Evaluation Metrics

Appendix C Experimental Setup

C.1 Architecture and Training

C.2 Reward and Observation

Appendix D Table of Results

The StarCraft Multi-Agent Challenge


https://arxiv.org/abs/1902.04043

Abstract

        In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems.

        Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge scenarios and recommendations for best practices in benchmarking and evaluations. We also open-source a deep multi-agent RL learning framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0.

1 Introduction

        Deep reinforcement learning (RL) promises a scalable approach to solving arbitrary sequential decision-making problems, demanding only that a user must specify a reward function that expresses the desired behaviour. However, many real-world problems that might be tackled by RL are inherently multi-agent in nature. For example, the coordination of self-driving cars, autonomous drones, and other multi-robot systems are becoming increasingly critical. Network traffic routing, distributed sensing, energy distribution, and other logistical problems are also inherently multi-agent. As such, it is essential to develop multi-agent RL (MARL) solutions that can handle decentralisation constraints and deal with the exponentially growing joint action space of many agents.
        

        Partially observable, cooperative, multi-agent learning problems are of particular interest. Cooperative problems avoid difficulties in evaluation inherent with general-sum games (e.g., which opponents are evaluated against). Cooperative problems also map well to a large class of critical problems where a single user that manages a distributed system can specify the overall goal, e.g., minimising traffic or other inefficiencies. Most real-world problems depend on inputs from noisy or limited sensors, so partial observability must also be dealt with effectively. This often includes limitations on communication that result in a need for decentralised execution of learned policies. However, there commonly is access to additional information during training, which may be carried out in controlled conditions or simulation.
        
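        As a concrete illustration of this training regime, below is a minimal sketch of centralised training with decentralised execution, assuming a VDN-style value decomposition written in PyTorch (in the spirit of Sunehag et al., 2017): per-agent Q-networks condition only on local observations, their Q-values are summed into a joint value trained against the shared team reward, and at execution time each agent simply acts greedily on its own observation. All shapes, names, and hyperparameters are illustrative, and standard components such as target networks and replay buffers are omitted.

    # Sketch of centralised training with decentralised execution (CTDE) via a
    # VDN-style value decomposition. Training fits the sum of per-agent Q-values
    # to the shared team reward; execution uses only each agent's local observation.
    import torch
    import torch.nn as nn

    class AgentQNet(nn.Module):
        """Per-agent Q-network: maps a local observation to one Q-value per action."""
        def __init__(self, obs_dim, n_actions, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

        def forward(self, obs):
            return self.net(obs)

    n_agents, obs_dim, n_actions, batch, gamma = 3, 10, 5, 32, 0.99
    agents = [AgentQNet(obs_dim, n_actions) for _ in range(n_agents)]
    optimiser = torch.optim.Adam([p for a in agents for p in a.parameters()], lr=5e-4)

    # Centralised training step on a toy batch of transitions (random stand-in data).
    obs = torch.randn(batch, n_agents, obs_dim)        # local observations of each agent
    actions = torch.randint(0, n_actions, (batch, n_agents))
    rewards = torch.randn(batch)                       # shared team reward
    next_obs = torch.randn(batch, n_agents, obs_dim)

    q_taken = torch.stack([agents[a](obs[:, a]).gather(1, actions[:, a:a+1]).squeeze(1)
                           for a in range(n_agents)], dim=1)
    q_next = torch.stack([agents[a](next_obs[:, a]).max(dim=1).values
                          for a in range(n_agents)], dim=1)

    q_tot = q_taken.sum(dim=1)                         # joint value = sum of per-agent Q-values
    target = rewards + gamma * q_next.sum(dim=1).detach()
    loss = ((q_tot - target) ** 2).mean()              # one team reward drives all agents

    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

    # Decentralised execution: each agent acts greedily on its own observation only.
    with torch.no_grad():
        local_obs = torch.randn(n_agents, obs_dim)
        joint_action = [int(agents[a](local_obs[a]).argmax()) for a in range(n_agents)]

        QMIX (Rashid et al., 2018) follows the same pattern but replaces the sum with a monotonic mixing network conditioned on the global state that is available during centralised training.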

        A growing number of recent works Foerster et al. (2018a); Rashid et al. (2018); Sunehag et al. (2017); Lowe et al. (2017) have begun to address the problems in this space. However, there is a clear lack of standardised benchmarks for research and evaluation. Instead, researchers often propose one-off environments which can be overly simple or tuned to the proposed algorithms. In single-agent RL, standard environments such as the Arcade Learning Environment Bellemare et al. (2013), or MuJoCo for continuous control Plappert et al. (2018), have enabled great progress. In this paper, we aim to follow this successful model by offering challenging standard benchmarks for deep MARL and to facilitate more rigorous experimental methodology across the field.

        Some testbeds have emerged for other multi-agent regimes, such as Poker Heinrich & Silver (2016), Pong Tampuu et al. (2015), Keepaway Soccer Stone et al. (2005), or simple gridworld-like environments Lowe et al. (2017); Leibo et al. (2017); Yang et al. (2018); Zheng et al. (2017). Nonetheless, we identify a clear gap in challenging and standardised testbeds for the important set of domains described above.

        To fill this gap, we introduce the StarCraft Multi-Agent Challenge (SMAC). SMAC is built on the popular real-time strategy game StarCraft II and makes use of the SC2LE environment Vinyals et al. (2017). Instead of tackling the full game of StarCraft with centralised control, we focus on decentralised micromanagement challenges (Figure 1). In these challenges, each of our units is controlled by an independent, learning agent that has to act based only on local observations, while the opponent’s units are controlled by the hand-coded built-in StarCraft II AI. We offer a diverse set of scenarios that challenge algorithms to handle high-dimensional inputs and partial observability, and to learn coordinated behaviour even when restricted to fully decentralised execution.
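        To make the decentralised control loop concrete, the following is a minimal interaction sketch, assuming the interface of the open-sourced smac Python package (its StarCraft2Env class and per-agent observation/action helpers); the map name, episode count, and the use of uniformly random action selection are purely illustrative, standing in for a learned decentralised policy.

    # Random-agent interaction loop for a SMAC scenario, assuming the interface of
    # the open-sourced `smac` package. Each allied unit chooses its action from its
    # own available-action mask; the built-in StarCraft II AI controls the enemy.
    import numpy as np
    from smac.env import StarCraft2Env

    env = StarCraft2Env(map_name="8m")      # illustrative scenario: 8 Marines vs 8 Marines
    n_agents = env.get_env_info()["n_agents"]

    for episode in range(5):                # illustrative episode count
        env.reset()
        terminated = False
        episode_reward = 0.0
        while not terminated:
            obs = env.get_obs()             # per-agent local observations (a learned
                                            # policy would condition on obs[agent_id])
            actions = []
            for agent_id in range(n_agents):
                avail = env.get_avail_agent_actions(agent_id)  # 0/1 mask of legal actions
                actions.append(np.random.choice(np.nonzero(avail)[0]))
            reward, terminated, info = env.step(actions)       # shared team reward
            episode_reward += reward
        print(f"Episode {episode}: total reward = {episode_reward}")
    env.close()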

        The full games of StarCraft: BroodWar and StarCraft II have already been used as RL environments, due to the many interesting challenges inherent to the games Synnaeve et al. (2016); Vinyals et al. (2017). DeepMind’s AlphaStar DeepMind (2019) has recently shown an impressive level of play on a StarCraft II matchup using a centralised controller. In contrast, SMAC is not intended as an environment to train agents for use in full StarCraft II gameplay. Instead, by introducing strict decentralisation and local partial observability, we use the StarCraft II game engine to build a new set of rich cooperative multi-agent problems that bring unique challenges, such as the nonstationarity of learning Foerster et al. (2017), multi-agent credit assignment Foerster et al. (2018a), and the difficulty of representing the value of joint actions Rashid et al. (2018).

        To further facilitate research in this field, we also open-source PyMARL, a learning framework that can serve as a starting point for other researchers and includes implementations of several key MARL algorithms. PyMARL is modular, extensible, built on PyTorch, and serves as a template for dealing with some of the unique challenges of deep MARL in practice. We include results on our full set of SMAC environments using QMIX Rashid et al. (2018) and several baseline algorithms, and challenge the community to make progress on difficult environments in which good performance has remained out of reach so far. We also offer a set of guidelines for best practices in evaluations using our benchmark, including the reporting of standardised performance metrics, sample efficiency, and computational requirements (see Appendix B).

        We hope SMAC will serve as a valuable standard benchmark, enabling systematic and robust progress in deep MARL for years to come.

2 Related Work

Much work has gone into designing environments to test and develop MARL agents. However, few of these focus on providing a qualitatively challenging environment that combines partial observability, challenging dynamics, and high-dimensional observation spaces.

Stone et al. (2005) presented Keepaway soccer, a domain built on the RoboCup soccer simulator (Kitano et al., 1997), a 2D simulation of a football environment with simplified physics, where the main task consists of keeping a ball within a pre-defined area in which agents in teams can reach, steal, and pass the ball, providing a simplified setup for studying cooperative MARL. This domain was later extended to the Half Field Offense task (Kalyanakrishnan et al., 2006; Hausknecht et al., 2016), which increases the difficulty of the problem by requiring the agents not only to keep the ball within bounds but also to score a goal. Neither task scales well in difficulty with the number of agents, as most agents need to do little coordination. There is also a lack of interesting environment dynamics beyond the simple 2D physics, and of good reward signals, thus reducing the impact of the environment as a testbed.

Multiple gridworld-like environments have also been explored. Lowe et al. (2017) released a set of simple grid-world-like environments for multi-agent RL alongside an implementation of MADDPG, featuring a mix of competitive and cooperative tasks focused on shared communication and low-level continuous control. Leibo et al. (2017) show several mixed-cooperative Markov environments focused on testing social dilemmas; however, they did not release an implementation to further explore the tasks. Yang et al. (2018); Zheng et al. (2017) present a framework for creating gridworlds that focuses on many-agent tasks, where the number of agents ranges from the hundreds to the millions. This

