Yu LIU, Zhi LI‡, Zhizhuo JIANG, You HE
1Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
2Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Recent years have witnessed significant progress in multi-agent systems for solving various decision-making problems in complex environments, achieving performance similar to or even better than that of humans. In this study, we briefly review multi-agent collaboration and gaming technology from three perspectives, i.e., task challenges, technology directions, and application areas. We first highlight the typical research problems and challenges in recent work on multi-agent systems. Then we discuss some of the promising research directions for multi-agent collaboration and gaming tasks. Finally, we provide some focused prospects on the application areas in this field.
Multi-agent systems, which aim to understand complex environments and adaptively make decisions to compete and coordinate (Vinyals et al., 2019) with humans or other agents, have become an increasingly hot topic in both academia and industry. Thanks to tremendous success in machine learning and reinforcement learning, recent years have witnessed remarkable advances in many prominent agent decision-making problems, such as robotic control (Polydoros and Nalpantidis, 2017), Go playing (Silver et al., 2016, 2017), and autonomous driving (Grigorescu et al., 2020).
For a long time, researchers focused mainly on a single agent that learns its environment and takes autonomous actions to change it (Jennings et al., 1998). Early researchers typically formulated this problem as a Markov decision process (Puterman, 1994) and applied planning methods, such as Monte-Carlo tree search (Coulom, 2007), to optimize it. A principled mathematical framework for adaptive agent learning is reinforcement learning (Arulkumaran et al., 2017), which has achieved great success in many tasks. However, traditional reinforcement learning methods still face serious computational complexity challenges (Arulkumaran et al., 2017), which can be overcome by the powerful representation ability of deep neural networks. For example, in the game of Go, it is difficult for traditional artificial intelligence (AI) to evaluate all board positions and moves because of the enormous search space (Silver et al., 2016). With deep learning representations, however, AI systems can achieve superior performance (Silver et al., 2016, 2017). Consequently, deep reinforcement learning has attracted much attention in many agent application areas, such as business management (Betancourt and Chen, 2021) and industrial control (Spielberg et al., 2017).
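As a concrete illustration of the reinforcement learning formulation above, the following toy sketch (illustrative only; the chain environment, function names, and hyperparameters are our own constructions, not from any cited work) applies tabular Q-learning to a minimal Markov decision process:

```python
import random

# Toy deterministic MDP: a chain of states 0..3; action 1 moves right,
# action 0 moves left; reaching state 3 yields reward 1 and ends the episode.
N_STATES, GOAL = 4, 3

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy exploration over the two actions
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Greedy policy: in every non-terminal state the agent should move right.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

Even in this tiny setting, the table has one entry per state-action pair; the computational complexity challenge mentioned above arises because realistic state spaces (e.g., Go board positions) are far too large to enumerate, which is where deep networks replace the table.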
Recently, with the rapid development of AI technologies, more and more researchers have started to pay attention to multi-agent systems. Multi-agent systems, which involve the participation of more than one agent, pose great challenges (Zhang KQ et al., 2021). To begin with, multi-agent environments are much more sophisticated than single-agent ones, because the actions of other agents may make the environment non-stationary. This makes it difficult for researchers to model real-world multi-agent environments and construct reliable simulation environments. Moreover, considering the characteristics of multi-agent systems, communication and cooperation among agents are two major challenges for multi-agent collaboration. On one hand, communication is a big problem for different agents that share environment perceptions and action commands. On the other hand, making all the agents collaborate to achieve the final targets is a non-trivial task. Last but not least, multi-agent systems are often applied in competitive environments, where the return of the agents is usually zero-sum (Zhang KQ et al., 2021). This means that multi-agent systems need to take competitive environments into consideration, which also brings major challenges.
In this study, we provide a focused look at the multi-agent collaboration and gaming task, from research challenges to technical directions and application areas. Although several surveys have reviewed many aspects of multi-agent systems (Busoniu et al., 2008; Hernandez-Leal et al., 2017; Oroojlooy and Hajinezhad, 2019; Nguyen et al., 2020; Zhang KQ et al., 2021), recent surveys focus either on a sub-domain (Hernandez-Leal et al., 2017; Oroojlooy and Hajinezhad, 2019), such as cooperative multi-agent systems (Oroojlooy and Hajinezhad, 2019), or on specific methods, such as deep reinforcement learning (Nguyen et al., 2020). Different from prior work, in this study we spotlight mainly several new research directions that are comparatively under-explored in existing reviews, and we hope to suggest some insightful ideas for future multi-agent collaboration and gaming research. More specifically, we first introduce some open issues and challenges in this area. Then, we provide a related outlook on technical directions that may bring some insightful thinking to these research challenges. Finally, we present some prospects of the application areas for multi-agent collaboration and gaming. We hope this paper can provide a quick overview of multi-agent research with a special focus on agent collaboration and gaming.
The research on multi-agent systems has a long history (Hoen et al., 2005; Busoniu et al., 2008), but there are still many open issues. In this section, we provide a selective overview of three major problems in multi-agent collaboration and gaming.
Generally, multi-agent environment problems have been ignored by prior reviews. Many multi-agent system studies rely mainly on totally virtual or simulated environments. Previous efforts usually construct virtual environments from games and develop multi-agent evaluation platforms. The Arcade Learning Environment (Atari) (Bellemare et al., 2013) was first developed to evaluate general agents in Atari 2600 game environments. ViZDoom (Kempka et al., 2016) first presented a semi-realistic three-dimensional (3D) evaluation platform in a shooter video game environment. There are also many other multi-agent evaluation platforms, such as MuJoCo (Todorov et al., 2012), Minecraft (Johnson et al., 2016; Tessler et al., 2017), DeepMind Lab (Beattie et al., 2016), OpenAI Gym (Brockman et al., 2016), FAIR TorchCraft (Synnaeve et al., 2016), and Botzone (Zhou et al., 2018). In recent years, researchers have shown special interest in real-time strategy games, such as StarCraft (Shao et al., 2019; Vinyals et al., 2019) and Dota (Berner et al., 2019). Although these platforms have greatly advanced the development of multi-agent technologies and evaluations, there is still a huge gap between these virtual environments and real-world applications.
Consequently, it is an urgent challenge to construct a more open and realistic environment and thereby bridge the gap between multi-agent computational environments and real-world scenes. Both academia and industry have paid much attention to this issue and made several attempts. For example, RoboCup (https://www.robocup.org/), founded in 1996, was a successful attempt at constructing a real collaborative and competitive environment for multi-agent systems. DJI hosts an annual intercollegiate robot competition, RoboMaster (https://www.robomaster.com/), which introduced a confrontational 5-on-5 MOBA-style robot combat, another great attempt at a real environment for multi-agent systems. These attempts have made important progress in constructing real-world multi-agent collaborative and competitive environments. However, they are still limited. On one hand, the existing environments focus mainly on a specific task. On the other hand, these environments are still closed settings with limited time and space. In the real world, tasks for agents are more complex and varied, and multi-agent environments usually have open settings, meaning that time and space are unrestricted. In other words, constructing a more realistic multi-agent environment is still an open issue with great challenges.
Different from single-agent systems, multi-agent systems naturally consist of multiple agents with various sensors. Therefore, the collaboration of different agents is a major task that cannot be ignored. Indeed, in a complex environment, a single agent cannot obtain comprehensive environment information. All the agents need to collaborate with each other to achieve a global perception of the environment and make cooperative decisions. In this subsection, we discuss two challenges in multi-agent collaboration: multi-agent communication and collaborative perception.
2.2.1 Multi-agent communication
Multi-agent communication is a classical and important research topic in multi-agent systems (Georgeff, 1988; Wang RD et al., 2020). With the rapid development of multi-agent reinforcement learning, the communication problem can be divided into two major questions: who needs to be communicated with and what needs to be communicated (Wang RD et al., 2020). These two questions are usually addressed as scheduling learning tasks and communication protocols (Wang RD et al., 2020). For scheduling learning tasks, recent researchers have shown special interest in multi-agent reinforcement learning with scheduling methods. These methods can be grouped into two classes by scheduler selection. Several studies proposed gating mechanisms to control agent communication (Jiang and Lu, 2018; Singh et al., 2018; Kim et al., 2019; Mao et al., 2019), whereas other researchers focused on determining the importance of different agent messages and used bi-directional recurrent neural networks (RNNs) (Peng et al., 2017) or the attention mechanism (Das et al., 2019) to learn an adaptively weighted scheduler. For communication protocols, most studies have proposed end-to-end frameworks to coordinate message exchanges among different agents (Foerster et al., 2016; Lazaridou et al., 2017; Mordatch and Abbeel, 2018).
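To make the two questions concrete, the toy sketch below (our own illustrative construction, not a specific published architecture; all function names are hypothetical) combines a gating mechanism, which decides who transmits, with attention-style weights, which decide how much each received message matters:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gated_exchange(observations, gate_fn, msg_fn, score_fn):
    """Agent i broadcasts msg_fn(obs_i) only when gate_fn(obs_i) is True ("who");
    each receiver mixes incoming messages with softmax attention weights ("what")."""
    msgs = [(i, msg_fn(o)) for i, o in enumerate(observations) if gate_fn(o)]
    fused = []
    for i, obs in enumerate(observations):
        incoming = [m for j, m in msgs if j != i]
        if not incoming:
            fused.append(None)  # nothing was received by this agent
            continue
        weights = softmax([score_fn(obs, m) for m in incoming])
        fused.append(sum(w * m for w, m in zip(weights, incoming)))
    return fused

# Scalar toy messages: only agents with a positive observation transmit,
# and receivers attend more to messages close to their own observation.
fused = gated_exchange(
    observations=[1.0, 2.0, 0.0],
    gate_fn=lambda o: o > 0,
    msg_fn=lambda o: o,
    score_fn=lambda o, m: -abs(o - m),
)
```

The gate addresses bandwidth (silent agents consume none), while the attention weights address message importance, mirroring the two scheduler classes discussed above.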
In spite of the great progress of prior multi-agent communication studies, many challenges still need to be deeply explored. For example, content redundancy is an urgent problem for scheduling methods. In real multi-agent systems, the bandwidth is always limited (Wang RD et al., 2020), which makes it difficult to transmit large-scale messages between two agents. How to construct a more effective multi-agent communication mechanism is still an open issue in real-world applications, such as the Internet of Vehicles (Li ZY et al., 2021).
2.2.2 Collaborative perception
In addition to the communication challenges, the collaborative perception of multi-agent systems is another essential problem. Unfortunately, the exploration of this task is limited (Liu et al., 2020a, 2020b). Different from multi-agent communication, perception focuses mainly on processing the communicated messages rather than on communication scheduling or protocols. Although single-agent perception has achieved great success in recent years, extending it to multi-agent perception is a non-trivial problem. First, a single agent can hardly perceive the comprehensive environment, and how to integrate different agents' observations to achieve a precise fused perception is an under-explored issue (Li YM et al., 2021). Second, the transmission bandwidth is a great challenge in both multi-agent communication (Wang RD et al., 2020) and collaborative perception (Liu et al., 2020a; Li YM et al., 2021), because communication is an important pre-task for perception. Third, considering the various task settings, the sensors on different agents may differ, such as the camera and laser radar (LiDAR) in robotics. Therefore, the fusion of such heterogeneous sensor data from multiple agents is another important but unexplored problem.
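The bandwidth issue can be illustrated with a minimal sketch (the compression and fusion rules here are our own simple choices for illustration, not a method from the cited papers): each agent transmits only its k strongest feature entries, and the receiver fuses the sparse packets element-wise:

```python
def compress(features, k):
    """Keep only the k largest-magnitude entries as {index: value} pairs,
    a crude way to respect a transmission-bandwidth budget."""
    ranked = sorted(range(len(features)), key=lambda i: abs(features[i]), reverse=True)
    return {i: features[i] for i in ranked[:k]}

def fuse(packets, dim):
    """Element-wise max over the received sparse packets, a simple fusion
    rule for agents whose fields of view overlap."""
    fused = [0.0] * dim
    for packet in packets:
        for i, v in packet.items():
            fused[i] = max(fused[i], v)
    return fused

# Two agents see complementary parts of the scene; each may send only 2 of 4 values.
agent_a = [0.9, 0.1, 0.0, 0.4]
agent_b = [0.0, 0.8, 0.7, 0.1]
fused = fuse([compress(agent_a, 2), compress(agent_b, 2)], dim=4)
```

Even this toy version exposes the trade-off: halving the bandwidth loses nothing here because the two agents' strong responses are complementary, but heavily overlapping views would force a harder selection, which is exactly the under-explored fusion problem noted above.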
Recent years have witnessed great improvement in AI for real-time strategy games (Vinyals et al., 2019). Multi-agent gaming, one of the major tasks in these areas, has attracted significant research attention. Different from multi-agent collaboration, multi-agent gaming requires an understanding of adversary behaviors in addition to the competitive environment, and the ability to adaptively make decisions to achieve the task targets or obtain higher scores than competitors. Along this line, multi-agent gaming has two main challenges: competitive environment modeling and competitive decision making.
2.3.1 Competitive environment modeling
For multi-agent gaming, one of the major challenges is understanding competitive environments. Multi-agent competitive environments can be grouped mainly into two classes, i.e., perfect information games and imperfect information games. In perfect information games, the agents can observe the whole environment and the states of other players at any time. In contrast, in imperfect information games, another common competitive setting, agents cannot know all the moves already made by their opponents. For example, the Go game (Silver et al., 2016, 2017) is a two-player perfect information game, while StarCraft (Shao et al., 2019; Vinyals et al., 2019) is a multi-player imperfect information game. As mentioned above, the biggest difference between competitive and cooperative multi-agent environments is the introduction of competitors, which brings great challenges. On one hand, the introduction of competitors brings more environmental uncertainty. On the other hand, in multi-agent gaming, environment modeling needs to consider the situation of competitors. Especially in imperfect information games, the limited observability of the environment makes it difficult to predict the behaviors and intent of competitors.
Indeed, most multi-agent gaming methods, especially deep reinforcement learning methods, usually model the competitive environment as a zero-sum game (Barron, 2013; Leonardos et al., 2021). However, in real-world applications, such as power control (Mei et al., 2017), the competitive settings are usually general-sum games. Unfortunately, general-sum multi-agent gaming is still largely under-explored (Lin et al., 2019; Mazumdar et al., 2020; Neumeyer et al., 2021).
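As a minimal illustration of the zero-sum setting, the sketch below runs fictitious play, a classical learning rule, on matching pennies (the game, parameters, and names are our own toy choices): each player repeatedly best-responds to the opponent's empirical action frequencies, and those frequencies converge toward the mixed equilibrium.

```python
def fictitious_play(payoff, rounds=20000):
    """Two-player zero-sum matrix game: the row player maximizes payoff[i][j],
    the column player minimizes it. Each round, both players best-respond
    to the opponent's empirical action counts so far."""
    n, m = len(payoff), len(payoff[0])
    row_counts, col_counts = [0] * n, [0] * m
    row_counts[0] = col_counts[0] = 1  # arbitrary initial actions
    for _ in range(rounds):
        # Expected payoff of each action against the opponent's empirical mixture
        row_vals = [sum(payoff[i][j] * col_counts[j] for j in range(m)) for i in range(n)]
        col_vals = [sum(payoff[i][j] * row_counts[i] for i in range(n)) for j in range(m)]
        row_counts[row_vals.index(max(row_vals))] += 1
        col_counts[col_vals.index(min(col_vals))] += 1
    total = rounds + 1
    return ([c / total for c in row_counts], [c / total for c in col_counts])

# Matching pennies: the unique equilibrium mixes both actions 50/50.
pennies = [[1, -1], [-1, 1]]
row_mix, col_mix = fictitious_play(pennies)
```

Fictitious play is guaranteed to converge in empirical frequencies for two-player zero-sum games; general-sum games lack such clean guarantees, which is one reason they remain under-explored.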
2.3.2 Competitive decision making
In multi-agent competitive settings, agent behaviors are much more complex than those in cooperative settings. An agent often needs to predict the behavior of competitors and incorporate an understanding of the current environment before forming its own strategy, which makes it difficult to find an equilibrium selection strategy. Prior research has focused mainly on designing convergent algorithms to model the complex behavior of competitive game-theoretic settings (Rakhlin and Sridharan, 2013; Balduzzi et al., 2018; Bailey and Piliouras, 2019). However, the competitive decision-making task is still under-explored and presents significant challenges. First, in complex competitive environments, different agents have diverse task settings, such as offense, defense, and protection in confrontation games, and these task settings should adapt to the dynamic competitive environment. Understanding tasks and adaptively selecting agent roles in the environment is one of the key challenges for agent decision making. Second, different from collaborative environments, the introduction of competitors makes understanding and forecasting the purpose of competitors another problem. Third, to achieve the final win, balancing short- and long-term returns is an important issue that must be considered. In conclusion, these challenges make autonomous multi-agent decision making a non-trivial and still under-explored problem.
Based on the research challenges in multi-agent collaboration and gaming, in this section we discuss some technical areas that may promote future research in multi-agent systems.
As mentioned earlier, most previous works were based on virtual environments, such as games, to develop multi-agent collaboration and gaming platforms. This creates a huge gap between virtual environments and real-world applications. To bridge this gap, a more realistic digital environment built from real-world scenarios needs to be constructed. Fortunately, there has been tremendous success in computer vision and computer graphics. Computational simulation technologies, such as Digital Twin (Tao et al., 2019) and Metaverse (Dionisio et al., 2013), have been developed to construct 3D environments that can reflect real objects and physical user interactions. These technologies can also add new perspectives to multi-agent environment construction and further promote the application of multi-agent collaboration and gaming in real-world scenarios.
In real multi-agent systems, perception of the environment is based mainly on the sensors carried by the agents. For a comprehensive understanding of the environment, a single agent is usually equipped with multiple different sensors to collect multi-modal data, such as optical, electromagnetic, and radar signals. Each type of sensor has a specific ability to perceive the environment and the states of other agents. For example, the radar sensor can work continuously in complex monitoring conditions, such as foggy or strong-light environments. Vision sensors provide intuitive views that help humans quickly understand an agent's perceptions, which is important for human–machine collaboration. Multi-modal learning and fusion technologies (Baltrušaitis et al., 2019) are therefore important for building an accurate perception in multi-agent systems. Moreover, considering the complexity of agent tasks, different agents in one multi-agent system may have various perception targets, which means that the collaboration of multiple perceptions is another research direction for multi-agent systems.
In a confrontational and gaming environment, it is important for multi-agent systems to understand the gaming tasks and competitive environments. Agent tasks always change with the dynamic environment, especially with the unknown competitor states in imperfect information gaming. Along this line, transfer learning (Zhuang et al., 2021) and multi-task learning (Zhang Y and Yang, 2018) are promising methods for improving generalization performance and adaptively transferring knowledge from one task to another in multi-agent gaming. Real responses and data samples are difficult to obtain in many real scenarios, such as military applications, so transfer learning could be an efficient way to train a model in a simulated environment and apply it in a realistic one. Considering the large-scale state-action space, hierarchical reinforcement learning (Nachum et al., 2018) is another potential research direction for disentangling the complex multi-agent gaming problem. Inverse reinforcement learning (Arora and Doshi, 2021) is also a potential way to learn from expert trajectories and address the reward design dilemma in complex multi-agent gaming tasks.
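As a minimal stand-in for the transfer idea (a toy sketch with hypothetical tasks, names, and parameters, not a method from the cited literature), value estimates trained on a source task can warm-start learning on a related target task:

```python
import random

def train_bandit(means, steps, q_init=None, seed=0):
    """Epsilon-greedy value estimation on a toy multi-armed bandit.
    Passing q_init warm-starts the estimates, mimicking transfer of a
    model trained on a related source task (e.g., a simulator)."""
    rng = random.Random(seed)
    q = list(q_init) if q_init is not None else [0.0] * len(means)
    counts = [0] * len(means)
    for _ in range(steps):
        # Explore with probability 0.1, otherwise pick the current best arm
        a = rng.randrange(len(means)) if rng.random() < 0.1 else q.index(max(q))
        reward = means[a] + rng.gauss(0, 0.1)  # noisy reward from the chosen arm
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]  # incremental sample average
    return q

# Source task: arm 1 is clearly better. Target task: similar but slightly shifted.
q_source = train_bandit(means=[0.2, 0.8], steps=500)
q_target = train_bandit(means=[0.25, 0.85], steps=50, q_init=q_source)
```

Because the warm-started estimates already rank the arms correctly, the target task needs far fewer samples, which is the appeal of transfer when real interactions are scarce.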
Since the success of multi-agent systems and reinforcement learning, many researchers have developed multi-agent collaboration and gaming methods in various application areas. In this section, we look at the application areas of multi-agent collaboration and gaming.
Indeed, a prominent application area is the control of swarm robotic systems (Hüttenrauch et al., 2019) or unmanned aerial vehicles (Tso et al., 1999; Wang YN et al., 2022). These control problems are usually based on multi-agent collaboration and gaming methods and require an understanding of the dynamic environment and automatic decisions to achieve complex tasks in competitive or non-competitive settings. Another application area of wide attention is game AI (Silver et al., 2017; Vinyals et al., 2019). Games have attracted tremendous research attention in recent years, and great success has been achieved with related AI methods. Games are good platforms for evaluating multi-agent collaboration and gaming methods because of their well-defined environments and rewards. However, as mentioned above, how to bridge the gap between virtual game environments and realistic applications is still an open issue. There are also many industrial applications, such as energy management (Lagorse et al., 2010), urban traffic control (Wang Y et al., 2020), and sports AI (Cañizares et al., 2017).
In this study, we presented prospects for multi-agent system research with a special focus on agent collaboration and gaming tasks. We briefly introduced some open issues and task challenges from three major perspectives: the multi-agent environment, collaboration, and gaming. We then provided a related outlook on the technology directions that may offer insights into these research challenges. Finally, we discussed the outlook for multi-agent collaboration and gaming application areas.
Contributors
Yu LIU designed the research. Zhi LI drafted the paper. Yu LIU helped organize the paper. Zhizhuo JIANG and You HE revised and finalized the paper.
Compliance with ethics guidelines
Yu LIU, Zhi LI, Zhizhuo JIANG, and You HE declare that they have no conflict of interest.
Frontiers of Information Technology & Electronic Engineering, 2022, Issue 7