机器人标枪

Robot Javelin

专题: Probability / 概率
难度: L7
来源: Jane Street

题目详情

马上就要到年底了，这只能意味着一件事：今年的机器人标枪决赛到了！哇等等，你从来没有听说过机器人标枪？那么好吧！请允许我解释一下规则：

这是正面交锋。两个机器人各自进行第一次投掷，其距离是从 [0, 1] 统一抽取的实数。
然后，在不知道竞争对手结果的情况下，每个机器人决定是保持当前距离还是消除它并进行第二次投掷，他们必须保持第二次投掷距离（也是从 [0, 1] 统一得出）。
最终距离较大的机器人获胜。

今年的决赛将由您的机器人 Java-lin 对阵挑战者 Spears Robot。现在，机器人已经光荣地竞争了很多年，并且已经适应了这场比赛的纳什均衡。然而，您刚刚了解到 Spears Robot 已发现并利用了游戏协议中的漏洞。他们可以收到一点信息，告诉他们对手的第一次投掷（距离）是否高于或低于他们选择的某个阈值d，然后再决定是否进行第二次投掷。斯皮尔斯大概选择了d来最大化他们获胜的机会 - 难怪他们能进入决赛！

Spears Robot 并不知道你已经了解了这个事实；他们假设 Java-lin 使用纳什均衡。如果您**调整 Java-lin 的策略以最大化其获胜的几率，那么 Java-lin 更新后的获胜概率是多少？请以精确的形式给出答案，或者以四舍五入到 10 位的小数形式给出答案。

It’s coming to the end of the year, which can only mean one thing: time for this year’s Robot Javelin finals! Whoa wait, you’ve never heard of Robot Javelin? Well then! Allow me to explain the rules:

It’s head-to-head. Each of two robots makes their first throw, whose distance is a real number drawn uniformly from [0, 1].
Then, without knowledge of their competitor’s result, each robot decides whether to keep their current distance or erase it and go for a second throw, whose distance they must keep (it is also drawn uniformly from [0, 1]).
The robot with the larger final distance wins.

This year’s finals pits your robot, Java-lin, against the challenger, Spears Robot. Now, robots have been competing honorably for years and have settled into the Nash equilibrium for this game. However, you have just learned that Spears Robot has found and exploited a leak in the protocol of the game. They can receive a single bit of information telling them whether their opponent’s first throw (distance) was above or below some threshold d of their choosing before deciding whether to go for a second throw. Spears has presumably chosen d to maximize their chance of winning — no wonder they made it to the finals!

Spears Robot isn’t aware that you’ve learned this fact; they are assuming Java-lin is using the Nash equilibrium. If you were to adjust Java-lin’s strategy to maximize its odds of winning given this, what would be Java-lin’s updated probability of winning? Please give the answer in exact terms, or as a decimal rounded to 10 places.

解析

当人们发现 Spears Robot 以秘密优势进入决赛时，机器人标枪锦标赛陷入了丑闻。我们的任务是为 Spears 的对手 Java-lin 确定更新的策略，并最终确定 Java-lin 更新的获胜机会。我们是如何做到的？

首先我们必须找到公平博弈的纳什均衡，即没有信息泄漏的博弈。利用对称性和无差异原理，我们可以证明，如果每个机器人的第一次投掷小于黄金比例 ( $\sqrt{5}$ - 1)/2 = $\varphi$ ~ 0.618034…，则每个机器人都会再次投掷。不错！

接下来，我们利用 Spears Robot 的信息优势来确定其开发策略。事实证明，对于斯皮尔斯来说，了解对手第一次投掷的最好方法就是看这是否导致他们再次投掷。（也就是说，无论他们的第一次投掷高于还是低于 $\varphi$ 。）当投掷低于 $\varphi$ 时，Spears 知道对手第二次投掷的均匀分布，因此如果 Spears 的第一次投掷低于 0.5，则应重新投掷。当对手的第一次投掷高于 $\varphi$ 时，另一个无差异计算表明 Spears 应该在低于 (1 - $\varphi$ /2) ~ 0.690983 的任何分数上重新投掷......

最后，我们必须扭转 Spears Robot 的局面，并利用它与纳什的偏差！当低于时Java-lin选择重新抛出的阈值是 $\frac{7}{12} < \varphi$ 。漏洞利用发生在区间 [7/12, $\varphi$ ] 内。这是 Java-lin通常选择重新投掷的区间，但由于 Spears Robot 会假设这一点，Java-lin 可以在此区间内保持其最佳投掷，并利用 Spears Robot 将弱分数保持在 0.5 和 $\varphi$ 之间。

现在我们知道了反策略，我们可以计算其获胜机会，其获胜机会为**(229 - 60 $\sqrt{5}$ )/192 ~ 0.4939370904…**。所以 Java-lin 几乎可以通过这种对抗策略将机会扳平到 50/50！

恭喜所有计算出获胜概率的人！

Original Explanation

The Robot Javelin championships were ebroiled in scandal when it was discovered that Spears Robot was heading into the finals with a secret advantage. We were tasked with determining an updated strategy for Spears’ opponent, Java-lin, and ultimately Java-lin’s updated winning chances. How did we do it?

First we had to find the Nash equilibrium for the fair game, that is the game with no information leakage. Using symmetry and the indifference principle we can show that each robot throws again if their first throw is less than (√5 - 1)/2 = φ ~ 0.618034…, the golden ratio. Nice!

Next, we determine Spears Robot’s exploitation strategy using its informational advantage. It turns out the best thing for Spears to learn about its opponent’s first throw is whether it caused them to rethrow. (That is, whether their first throw is above or below φ.) When the throw is below, Spears knows it is up against a uniform distribution of the opponent’s second throw and so Spears should rethrow if its first throw is below 0.5. When the opponent’s first throw is above φ, another indifference calculation shows that Spears should re-throw on any score below (1 - φ/2) ~ 0.690983….

Finally, we have to turn the tables on Spears Robot and exploit its deviation from Nash! The threshold that Java-lin chooses to rethrow when below is 7/12 < φ. The exploitation happens in the interval [7/12, φ]. This is the interval where Java-lin would normally choose to rethrow, but because Spears Robot will assume this, Java-lin can keep its best throws in this interval and take advantage of Spears Robot keeping weak scores between 0.5 and φ.

Now that we know the counterstrategy we can compute its winning chances, which come to (229 - 60√5)/192 ~ 0.4939370904…. So Java-lin is almost able to even the chances to 50/50 with this counterstrategy!

Congrats to everyone who computed the winning probability!