Cooperation Dynamics in N-person Evolutionary Snowdrift Games

2016-10-13 15:59 ZHENG Yue-long, ZHANG Wei-guo

Journal of Industrial Engineering and Engineering Management, 2016, No. 4

ZHENG Yue-long¹, ZHANG Wei-guo¹,²

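(1. School of Economics and Business Administration, Chongqing University, Chongqing 400044, China; 2. College of Economics and Management, Southwest University, Chongqing 400715, China)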

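Evolutionary game theory is central to the study of cooperation at every scale. Working with a well-mixed population and starting from a defect of existing snowdrift models, namely that they suppress cooperation, we introduce time cost as a decision parameter and construct an N-person evolutionary snowdrift game with time cost. We then compare the existing model and the new one through numerical simulation. The results show that the new model overcomes, to some extent, the endogenous shortcoming of the existing model. Time cost and the benefit-to-cost ratio are both important variables in an agent's choice of strategy; they act in the same direction and can substitute for one another. Group size, by contrast, clearly inhibits cooperative behavior. Weighing these positive and negative forces can bring cooperation about.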

Keywords: snowdrift model; N-person evolutionary game; cooperation dynamics

0 Introduction

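Evolutionary game theory plays a central role in studying how cooperation emerges and evolves at every scale, and it has drawn growing attention from scholars [1-3]. Traditionally, interactions between individuals have been modeled as one-shot, symmetric two-person cooperation dilemmas such as the prisoner's dilemma, the snowdrift game, and the stag-hunt game [4]. In the real world, however, collective decisions more often involve more than two people. Such cooperative behavior is best studied in the framework of N-person games [5-7], of which the public goods game (PGG) is the typical example. Motivated by current work on the snowdrift model and the PGG, we first analyze the existing snowdrift game (SG) and the N-person snowdrift game (NSG) [8-10] and point out the shortcomings of the existing models. We then introduce a time-cost factor to construct an N-person snowdrift game with time cost (DCNSG, Delay Cost NSG) and solve it. Finally, we compare the two N-person evolutionary game models through numerical simulation.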

1 The snowdrift game model and its shortcomings

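In the standard SG, two drivers on a road are blocked by the same snowdrift at the same time, and they can continue to their destinations only once the snowdrift is shoveled away. Three outcomes are possible. If neither driver shovels, neither reaches the destination. If the two drivers shovel together, both reach their destinations and share the cost of shoveling. If only one driver shovels, both reach their destinations but the shoveler alone bears the full cost. Let b denote the benefit of reaching the destination and c the total cost of shoveling. If both cooperate, each receives b-c/2; if neither cooperates, each receives 0; if only one cooperates, the cooperator (C) receives b-c and the defector (D) receives b. The benefit is usually assumed to exceed the cost, which gives a payoff ordering of the chicken, hawk-dove, or snowdrift-dilemma type [11]. Only when b>c>0 is cooperation profitable for a player. Because each player's payoff depends on the strategies chosen by the other players, players in the snowdrift game are called agents.

Going further, suppose the snowdrift blocks N drivers at a crossroads at the same time. All of them want to reach their destinations and obtain the same benefit, yet not all are willing to put in the labor of shoveling. If all N cooperate, each receives b-c/N. If k (k≥1) agents cooperate (C), each cooperator receives b-c/k, while those who refuse to shovel (D) reach their destinations at no cost and receive b. This generalizes the standard snowdrift model to a snowdrift game among N players (NSG), whose payoffs are: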

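$$P_C(k) = b - \frac{c}{k}, \quad k \in [1, N] \tag{1}$$

$$P_D(k) = \begin{cases} 0, & k = 0 \\ b, & k \in [1, N-1] \end{cases} \tag{2}$$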
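The payoff rules described above (b - c/k for each of k ≥ 1 cooperators, b for a defector when at least one player shovels, and 0 for everyone when nobody shovels) can be written out as a minimal Python sketch. The function names and the numerical values of b, c and N are illustrative assumptions, not taken from the paper.

```python
def payoff_cooperator(k, b, c):
    """Payoff to each cooperator when k >= 1 players shovel: b - c/k."""
    if k < 1:
        raise ValueError("a cooperator implies at least one shoveler (k >= 1)")
    return b - c / k


def payoff_defector(k, b):
    """Payoff to each defector: b if anyone shovels, 0 if nobody does."""
    return b if k >= 1 else 0.0


if __name__ == "__main__":
    b, c, N = 5.0, 1.0, 5  # illustrative values only (b > c > 0)
    print("k=0: every player is a defector and receives", payoff_defector(0, b))
    for k in range(1, N + 1):
        line = f"k={k}: cooperator receives {payoff_cooperator(k, b, c):.3f}"
        if k < N:  # defectors exist only while k < N
            line += f", defector receives {payoff_defector(k, b):.3f}"
        print(line)
```

Within any group that contains both types, a defector earns strictly more than a cooperator, yet a lone shoveler (b - c) still does better than everyone defecting (0). That combination is what makes this a snowdrift dilemma rather than a prisoner's dilemma.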

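By (1) and (2), not cooperating is the optimal strategy for a player in the N-person snowdrift game, because defecting guarantees a payoff at least as high as that of every other player. This clearly suppresses cooperation and is an endogenous shortcoming of the model. In addition, when all players defect, the existing model gives every agent a payoff of zero. This implicitly assumes that neither the delay nor the business awaiting the agent at the destination matters, which contradicts both reality and common sense: human activity is largely purposeful and deliberate, so if nobody shovels, agents suffer material and psychological losses from the delay, such as compensation payments, penalties from superiors, and anxiety. Their payoff should therefore be negative rather than zero. Furthermore, even if some or all agents shovel, the work cannot be finished in an instant. Until the snowdrift is completely cleared, cooperators and defectors alike bear losses tied to the business awaiting them at the destination and to their state of mind. A non-shoveling agent, for example, must wait until the snowdrift is removed, and the more important the business at the destination, the larger the waiting cost. Such an agent is then more likely to cooperate, because the more people shovel, the sooner the snowdrift is cleared, and the agent gains, or loses less, from the time saved. Existing models ignore this as well. The existing NSG model can be extended and improved to address these shortcomings.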

2 A snowdrift game model with time cost

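A more realistic model requires additional parameters [8]. Introducing a time parameter into the existing NSG model improves it and brings it closer to reality. Expressions (1) and (2) can accordingly be extended to: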

(4)

3 Solving the model

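Because agents are boundedly rational, they are unlikely to cooperate in shoveling from the outset; cooperation instead emerges through a dynamic process of continual learning and interaction. Evolutionary behavior in the N-person evolutionary snowdrift game in a well-mixed population can be described by replicator dynamics [12]. Let x(t) denote the frequency of cooperators, determined by the number of cooperating agents in the population at time t [13-14]; the frequency of defectors is then 1-x(t). The time evolution of x(t) is given by the following differential equation [12]: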

(6)

(8)

(10)
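As a rough illustration of replicator dynamics in a well-mixed population, the sketch below integrates the standard two-strategy form dx/dt = x(1-x)(f_C - f_D) for the baseline NSG payoffs of Section 1 (cooperator b - c/k, defector b or 0), not for the time-cost payoffs. It assumes that each agent's N-1 co-players are drawn independently, so the number of cooperating co-players is binomial. The function names, the forward-Euler scheme, and the parameter values are illustrative assumptions.

```python
from math import comb


def fitness_gap(x, N, b, c):
    """f_C(x) - f_D(x) for the baseline NSG in a well-mixed population.

    j counts cooperators among the focal agent's N-1 co-players and is
    binomially distributed with success probability x (an assumption).
    """
    gap = 0.0
    for j in range(N):
        prob = comb(N - 1, j) * x**j * (1 - x) ** (N - 1 - j)
        p_c = b - c / (j + 1)          # focal agent shovels with j others
        p_d = b if j >= 1 else 0.0     # focal agent defects
        gap += prob * (p_c - p_d)
    return gap


def integrate_replicator(x0, N, b, c, dt=0.01, steps=20000):
    """Forward-Euler integration of dx/dt = x (1 - x) (f_C - f_D)."""
    x = x0
    for _ in range(steps):
        x += dt * x * (1 - x) * fitness_gap(x, N, b, c)
        x = min(max(x, 0.0), 1.0)      # keep x a valid frequency
    return x


if __name__ == "__main__":
    # Illustrative values; for N = 2 the fixed point is 2(b-c)/(2b-c) = 8/9.
    print(integrate_replicator(0.1, N=2, b=5.0, c=1.0))
    print(integrate_replicator(0.1, N=10, b=5.0, c=1.0))
```

Replacing the two payoff lines inside `fitness_gap` with a model's own payoff expressions is enough to reuse the same integrator for a variant such as one with time cost.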

Substituting (3) and (4) into (6) and (7) and combining the result with (10) gives:

Using the identity:

(12)

we obtain:

and therefore

(14)

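Applying equation (14) to (11) gives:

The above is the N-th order analytical equation in x that the N-person snowdrift game with time cost (DCNSG) satisfies in the steady state.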
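For orientation, the same kind of calculation can be carried through for the baseline NSG without time cost (w=0), using only the payoffs of Section 1. The following is a sketch under the assumption that a focal agent's N-1 co-players are sampled binomially with cooperator frequency x; it is not a reproduction of the paper's own numbered equations. The key step uses the binomial identity $\binom{N-1}{j}\frac{1}{j+1}=\frac{1}{N}\binom{N}{j+1}$.

```latex
\begin{align*}
f_C(x) &= \sum_{j=0}^{N-1}\binom{N-1}{j}x^{j}(1-x)^{N-1-j}\Bigl(b-\frac{c}{j+1}\Bigr),\\
f_D(x) &= \sum_{j=1}^{N-1}\binom{N-1}{j}x^{j}(1-x)^{N-1-j}\,b
        = b\bigl[1-(1-x)^{N-1}\bigr],\\
f_C(x)-f_D(x) &= b(1-x)^{N-1}
        -c\sum_{j=0}^{N-1}\binom{N-1}{j}\frac{x^{j}(1-x)^{N-1-j}}{j+1}
        = b(1-x)^{N-1}-\frac{c\bigl[1-(1-x)^{N}\bigr]}{Nx},\\
\dot{x}=x(1-x)\bigl[f_C(x)-f_D(x)\bigr]=0
  &\;\Longrightarrow\;
  \frac{b}{c}\,N x^{*}(1-x^{*})^{N-1}=1-(1-x^{*})^{N}.
\end{align*}
```

For N=2 the last condition reduces to x* = 2(b-c)/(2b-c), the familiar mixed equilibrium of the two-person snowdrift game.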

4 Results and analysis

Figure 1 Numerical simulation results for the steady state x when w=0
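The w=0 case can be explored numerically with a few lines of Python. The sketch below is a minimal illustration, not the authors' MATLAB code: it assumes the baseline payoffs of Section 1 and binomial sampling of co-players, under which the fitness difference between cooperators and defectors is b(1-x)^(N-1) - c[1-(1-x)^N]/(Nx), and it locates the interior root by bisection. Function names and parameter values are hypothetical.

```python
def fitness_gap_closed_form(x, N, b, c):
    """f_C - f_D for the baseline NSG (w = 0), valid for 0 < x <= 1."""
    y = 1.0 - x
    return b * y ** (N - 1) - c * (1.0 - y**N) / (N * x)


def stable_fraction(N, b, c, tol=1e-12):
    """Bisection for the stable cooperator fraction x* (assumes b > c > 0).

    The gap is positive near x = 0 and, for N >= 2, negative at x = 1,
    with a single sign change in between. For N = 1 the gap stays
    positive, so the search converges to x = 1 (a lone driver shovels).
    """
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fitness_gap_closed_form(mid, N, b, c) > 0:
            lo = mid   # cooperators still do better: x* lies above mid
        else:
            hi = mid   # defectors do better: x* lies below mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    b, c = 5.0, 1.0  # illustrative benefit-to-cost ratio b/c = 5
    for N in (2, 5, 10, 30):
        print(f"N={N:2d}  x* = {stable_fraction(N, b, c):.4f}")
```

With these illustrative values the computed x* falls as N grows, which is in line with the inhibiting effect of group size discussed in this section.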

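Further, Figure 2 (right) fixes the group size at N=30. Overall, the stable equilibrium fraction of cooperating agents x rises gradually as the time cost (w) and the benefit-to-cost ratio (b/c) increase. When the time cost is small (w=2), the steady state x increases with b/c, and the stable equilibrium level rises as w increases. Once the time cost reaches a high level (w=1010), agents tend to cooperate at any level of b/c. Similarly, as Figure 3 (left) shows, once the benefit-to-cost ratio reaches a high level (b/c=1015), agents tend to cooperate at any time cost. These results essentially reflect the two sides, positive and negative, of what drives agents to cooperate. The two factors act in the same direction: a sufficiently large time cost or a sufficiently large benefit-to-cost ratio leads agents to cooperate because the loss from not cooperating is too great. Time cost and the benefit-to-cost ratio are therefore substitutes for one another. Because time cost stems from the importance of the business awaiting the agent at the destination, these results show that building time cost into the model makes cooperative behavior possible. They also support the hypothesis that the larger the benefit from the business at the destination (and hence b/c), that is, the more important that business is, the larger the agent's time cost.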

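To further illustrate the effect of the time cost w on the steady state x, we fix the benefit-to-cost ratio at b/c=5. As Figure 3 (right) shows, for N=2, 5, and 10 the steady state x rises fairly quickly as w increases. For N greater than 20, the growth of x with w slows markedly as N increases, and the stable equilibrium level falls as well. This again shows that N dampens the effect of w and thereby inhibits cooperation, consistent with the simulation results in Figures 1 and 2. A possible reason is that coordination is easier in small groups (a lone individual, for instance, has no choice but to shovel), whereas coordination becomes increasingly difficult as the group grows, for example because free riders and shirkers multiply. Hence, getting agents to actually cooperate in shoveling requires coordination, incentives, and similar measures in addition to attention to time cost and the benefit-to-cost ratio.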

5 Conclusions and implications

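The existing snowdrift game models lead to non-cooperation. We incorporate a time cost w into the snowdrift model to construct an N-person snowdrift game with time cost (DCNSG). Numerical simulation of the model with time cost shows the following. The new game model is somewhat effective in overcoming the existing model's suppression of cooperation, and including time cost makes cooperative behavior in the snowdrift game possible. Time cost and the benefit-to-cost ratio act in the same direction and can substitute for one another; they are two important decision variables in an agent's choice of behavior. Group size strongly inhibits cooperation. Weighing these positive and negative forces can lead agents to cooperate. These conclusions offer some guidance for promoting cooperation in public goods games (PGG), such as public works construction and the upkeep of public environmental sanitation: use the benefit-to-cost ratio to assess how important the business at the destination is to an agent, use that to gauge the agent's time cost and hence willingness to cooperate, and raise agents' willingness to cooperate and the efficiency of cooperation through active communication, coordination, and trust building.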

[1] Macy M, Flache A. Learning dynamics in social dilemmas[J]. Proc Natl Acad Sci U S A, 2002, 99: 7229-7236.

[2] Nowak MA. Five rules for the evolution of cooperation[J]. Science, 2006, 314(5805): 1560-1563.

[3] Sigmund K. The Calculus of Selfishness[M]. Princeton: Princeton University Press, 2009: 49-80.

[4] Santos MD, Pinheiro FL, Santos FC, et al. Dynamics of N-person snowdrift games in structured populations[J]. Journal of Theoretical Biology, 2012, 315: 81-86.

[5] Gokhale CS, Traulsen A. Evolutionary games in the multiverse[J]. Proc Natl Acad Sci U S A, 2010, 107(12): 5500-5504.

[6] Santos FC, Pacheco JM. Risk of collective failure provides an escape from the tragedy of the commons[J]. Proc Natl Acad Sci U S A, 2011, 108(26): 10421-10425.

[7] Van Segbroeck S, Pacheco JM, Lenaerts T, et al. Emergence of fairness in repeated group interactions[J]. Physical Review Letters, 2012, 108(15): 1-5.

[8] Zheng DF, Yin HP, Chan CH, et al. Cooperative behavior in a model of evolutionary snowdrift games with N-person interactions[J]. Europhysics Letters, 2008, 80: 1-4.

[9] Galbiati R, Vertova P. Obligations and cooperative behaviour in public good games[J]. Games and Economic Behavior, 2008, 64(1): 146-170.

[10] Souza MO, Pacheco JM, Santos FC. Evolution of cooperation under N-person snowdrift games[J]. Journal of Theoretical Biology, 2009, 260(4): 581-588.

[11] Maynard Smith J. Evolution and the theory of games[M]. Cambridge: Cambridge University Press, 1982: 10-27.

[12] Hofbauer J, Sigmund K. Evolutionary games and population dynamics[M]. Cambridge: Cambridge University Press, 1998: 57-79.

[13] Hauert C, Doebeli M. Spatial structure often inhibits the evolution of cooperation in the snowdrift game[J]. Nature, 2004, 428(6983): 643-646.

[14] Zhong LX, Zheng DF, Zheng B, et al. Networking effects on cooperation in evolutionary snowdrift game[J]. Europhysics Letters, 2006, 76(4): 724-730.

[15] Hauert C, Michor F, Nowak MA, et al. Synergy and discounting of cooperation in social dilemmas[J]. Journal of Theoretical Biology, 2006, 239(2): 195-202.

Cooperation Dynamics under N-person Snowdrift Games

ZHENG Yue-long1, ZHANG Wei-guo1,2

(1. School of Economics and Business Administration, Chongqing University, Chongqing 400044, China; 2. College of Economics and Management, Southwest University, Chongqing 400715, China)

Evolutionary game theory plays a central role in the study of the emergence and evolution of cooperation at all scales. Traditionally, scholars have modeled interactions as one-shot, symmetric two-person cooperation dilemmas, such as the Prisoner's Dilemma, the Snowdrift Game, and the Stag-Hunt Game. In the real world, however, situations involving more than two people are common. For instance, accomplishing a task often requires several group members to cooperate. They bear all the related costs, while those who do not cooperate simply share the benefits once the task is done. Collective decisions in groups of more than two people are therefore involved. Cooperation of this kind is best studied in the framework of N-person games, such as the typical public goods game (PGG).

Motivated by recent work on the PGG and N-person Snowdrift Games (NSG), we identify the following drawbacks in the existing snowdrift game model. First, the non-cooperation strategy ensures that a player's payoff is no less than that of the other participants, so non-cooperation is the optimal strategy in the NSG; this is an endogenous drawback of the model. Second, the existing model gives agents a payoff of zero if no participant cooperates, which implies that neither the delay nor the business to be done at the destination matters to agents. This is plainly at odds with reality and common sense, because human beings mainly engage in purposeful, conscious activity. If all agents refuse to shovel snow, they suffer psychological and material losses from the delay, such as anxiety and penalties, so their payoff should be negative rather than zero. Finally, even if some or all agents shovel, the snowdrift is unlikely to be cleared in one step. Until the snow is completely cleared, both cooperators and those who refuse must bear losses that depend on the importance of their business at the destination and on their psychological state. This surely affects agents' strategic choices, but the existing model ignores it.

To further study the dynamics of agents' cooperation in a well-mixed population, and starting from the defects of the existing snowdrift game, we build a Delay Cost N-person Snowdrift Game (DCNSG) by incorporating delay cost into the existing model, and we compare the two models using numerical methods in MATLAB. The study shows that the DCNSG is somewhat effective in overcoming the shortcomings of the existing model. Including delay cost makes cooperation in the system possible, and delay cost acts in the same direction as the benefit-cost ratio. The two can substitute for each other and are important decision variables for agents. Group size, however, clearly restrains the emergence of cooperation. Cooperative behavior can emerge when the positive and negative forces are weighed against each other.

Our findings have implications for solving PGG cooperation problems such as public engineering construction and public environmental maintenance. Specifically, agents should be assessed in terms of the benefit-cost ratio and the importance of their business at the destination. The agents' time cost should also be measured in order to reveal their intention to cooperate. At the same time, their willingness to cooperate and the efficiency of cooperation should be improved through active communication, coordination, and trust building.

Keywords: snowdrift game; N-person evolutionary game; cooperation dynamics

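Chinese editor: Du Jian; English editor: Charlie C. Chen

CLC number: O152.1

Article ID: 1004-6062(2016)04-0112-05

DOI: 10.13587/j.cnki.jieem.2016.04.014

Received: 2013-10-08

Revised: 2014-06-11

Funding: Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20130191120058); Major Program of the National Social Science Foundation of China (12&ZD100)

About the author: ZHENG Yue-long (born 1981), male, from Taibus Banner, Inner Mongolia; doctoral candidate at the School of Economics and Business Administration, Chongqing University; research interests: game theory, corporate strategy and innovation management.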