返回题库

最优放弹珠 I

Optimal Marbles I

专题
Strategy / 策略
难度
L6

题目详情

两名玩家 A、B 各有 100 颗弹珠。两人同时决定往盒子里放入 1 到 100 颗弹珠(对方看不到)。随后进行两次“有放回抽取”:每次随机从盒中抽 1 颗弹珠。

  • 若抽到 A 的弹珠,A 从第三方获得 100a100-a 的收益(其中 aa 为 A 放入的弹珠数)。
  • 若抽到 B 的弹珠,B 从第三方获得 100b100-b 的收益。

两人都最优博弈。求 A 的两次抽取总期望收益。

Two players, say AA and BB, play the following game: Both players have 100100 marbles and may put anywhere between 11 and 100100 marbles in the box each. This decision is not revealed to the other player. Then, they draw 22 marbles with replacement between trials. If the marble belongs to AA, then assuming that AA put aa marbles in the box, AA is paid 100a100-a monetary units from a third party. Similarly if the marble belongs to BB, then assuming BB but bb marbles in the box, BB is paid 100b100-b monetary units from a third party. Assume both players play optimally. Find the expected total payout of player AA.

解析

单次抽取下 A 的期望收益为

aa+b(100a).\frac{a}{a+b}(100-a).

由于两次抽取独立且策略不会因次数改变,最优策略与单次相同,最后把单次期望乘以 2。

连续最优反应可得到对称均衡在 a=b100/3a=b\approx 100/3。在整数限制下取 a=b=33a=b=33 可满足最优性近似。

此时单次期望收益为

3366(10033)=1267=33.5,\frac{33}{66}\cdot (100-33)=\frac{1}{2}\cdot 67=33.5,

两次总期望约为 2×33.5=67.02\times 33.5=67.0


Original Explanation

We can first make some simplifications to the game. Firstly, if the strategy is optimal, then if this game were to be repeated many times, they would not change their strategy. Therefore, the optimal strategy for the game where there are 22 consecutive marble draws is the same as the optimal strategy for the game with one draw. Then, we just multiply the expected profit by 22 to represent the two draws. Furthermore, as this game is symmetric for the two players, their optimal strategy will be the same. This point will be important later.

Let A(a,b)A(a,b) be the expected profit that AA obtains with player AA putting in aa balls and BB putting in bb balls. Namely, for the one draw game,

A(a,b)=aa+b(100a)A(a,b) = \frac{a}{a+b} \cdot (100 - a)

As player aa draws his ball with probability aa+b\frac{a}{a+b} and 100a100-a is the payout. Let's fix bb and find the aa that is the best response to this bb. In other words, given bb, what aa optimizes A(a,b)A(a,b)? To do this, we take the partial derivative of A(a,b)A(a,b) in aa and treat aa as continuous for now. We will then account for discreteness at the end.

This yields that

aA(a,b)=a2+2ba100b(a+b)2=0    a2+2ba100b=0\frac{\partial}{\partial a}A(a,b) = -\frac{a^2 + 2ba - 100b}{(a+b)^2} = 0 \iff a^2 + 2ba - 100b = 0

Solving the above with the quadratic equation yields that a=2b±4b2+400b2a^* = \frac{-2b \pm \sqrt{4b^2 + 400b}}{2}. However, the - root results in a negative value, so a=b2+100bba^* = \sqrt{b^2 + 100b} - b is the best response for player AA if player BB puts bb marbles in. Similarly, as this game is symmetric, the optimal response for player BB if player AA puts aa marbles in is b=a2100aab^* = \sqrt{a^2 - 100a} - a.

To find the optimal strategy for each player, this means that we need to find the combo (a,b)(a^*,b^*) such that neither of the players can do better by adjusting their strategy. We already are aware from before that a=ba^* = b^* by the symmetry of the game. Therefore, to solve for this, we just substitute in bb as b=ab^* = a^* in the first equation. This yields we can say that

a=(a)2+100aa    4(a)2=(a)2+100a    a(3a100)=0    a=0,1003a^* = \sqrt{(a^*)^2 + 100a^*} - a^* \iff 4(a^*)^2 = (a^*)^2 + 100a^* \iff a^*(3a^* - 100) = 0 \iff a^* = 0,\frac{100}{3}

As 00 is not possible, we conclude that a=b=1003a^* = b^* = \frac{100}{3} is the optimal strategy. However, this is not actually possible, as our marbles must be an integer value. Therefore, we should test (33,33)(33,33) and (34,34)(34,34) to see if they are Nash equilibria.

For (33,33)(33,33), the expected payout for one draw for each player is 672\frac{67}{2}. One can check that by varying aa and keeping bb fixed at 3333, player AA can't do any better. Therefore, (33,33)(33,33) is a Nash equilibrium. For (34,34)(34,34), the expected payout is 3333. However, one can also verify that the expected payout for (33,34)(33,34) is also 3333. However, this can't be an equilibrium, as bb should change to 3333 marbles to yield higher expected payout. Thus, while (34,34)(34,34) is also a Nash equilibrium, (33,33)(33,33) is preferable because of the higher expected payout. This means that the optimal strategy is for both players to place 3333 marbles and have a total expected payout of 6722=67\frac{67}{2} \cdot 2 = 67.