放或拿

Place or Take

专题: Strategy / 策略
难度: L6
来源: QuantQuestion

题目详情

你有两个不透明盒子。每一步必须选择：

place：第三方往随机一个盒子里放入 1 元；
take：随机清空一个盒子，并把该盒子里的钱全部拿走。

共 100 步。你在游戏过程中看不到每次 take 具体拿了多少，只在最后结算。

问：最优策略下期望收益是多少？

You are playing a one-player game with two opaque boxes. At each turn, you can choose to either "place" or "take". "Place" places $1 from a third party into one box randomly. "Take" empties out one box randomly and that money is yours. This game consists of 100 turns where you must either place or take. Assuming optimal play, what is the expected payoff of this game? Note that you do not know how much money you have taken until the end of the game.

解析

最优策略是把所有 place 放在前面、take 放在后面（place 放在 take 之后只会降低期望可回收金额）。

若先做 $p$ 次 place，则盒子里总金额为 $p$ ；再做 $m=100-p$ 次 take。

每次 take 的期望能拿走当前总金额的一半，因此 $m$ 次 take 的期望回收比例为

\sum_{i=1}^{m}\frac{1}{2^i}=1-2^{-m}.

所以期望收益为

G(p)=p\left(1-2^{-(100-p)}\right).

比较 $p$ 可得最优为 $p=94$ （即 94 次 place 后接 6 次 take），此时期望为

94\left(1-2^{-6}\right)=94\cdot\frac{63}{64}=\frac{2961}{32}\approx 92.5313.

Original Explanation

There are two key elements to this game that should be considered: the sequence of places and takes, and the total number of places and takes. We'll begin by addressing the first element. Intuitively, no place actions should follow a take. Greedily, it makes sense to place a take after all places to maximize the expected value of that take (since each place action after a take reduces the potential payout by $1); therefore, all takes should be clustered at the end of the 100 turns. This can also be proven using induction, but we'll leave that as an exercise for the reader. Now, let's consider the second element: the number of places and takes. If there are$ p $place actions, there will be$ 100 - p $takes, and the$ p $place actions must happen before the$ 100 - p $takes. To determine the optimal value of$ p $, we can assess the point at which adding one more place action no longer increases the expected payout. Look at the$ p $-th action where$ 1 < p < 100 $and assume this is the last place before we start taking, or the next action ($ p + 1 $) would be the final place before taking begins. What is the expected outcome of the game if the$ p $-th action is the final place? There are$ p $dollars in total, and$ 100 - p $takes remaining. Our expected payoff will be$ p \times \sum_{i=1}^{100 - p} \frac{1}{2^i} $, which can be seen from the linearity of expectation - at every take, we will take half of what is remaining on average. What is the expected value of the game if action$ p + 1 $is the last place? There are$ p + 1 $dollars in aggregate and we have$ 100 - (p + 1) $takes. Thus, our expected payoff is$ (p + 1) \times \sum_{i=1}^{100 - p - 1} \frac{1}{2^i} $. We are looking for the largest value of$ p $such that the payoff where the$ p $-th place is the last place is greater than the payoff where the incremental$ p + 1$-th place is the last place. Thus:

p \sum_{i=1}^{100 - p} \frac{1}{2^i} > (p + 1) \sum_{i=1}^{100 - p - 1} \frac{1}{2^i}

and

\frac{p}{2^{100 - p}} > \sum_{i=1}^{99 - p} \frac{1}{2^i}

approaches 1, and thus $p > 2^{100 - p} \Rightarrow p > 93$ . With 94 places, the expected payoff is

94 \sum_{i=1}^{6} \frac{1}{2^i} \approx 92.5

To check this is the maximum payoff, we can look at the 93 and 95 placing cases. With 93 places, the expected payoff is

93 \sum_{i=1}^{7} \frac{1}{2^i} \approx 92.3

With 95 places, the expected payoff is

95 \sum_{i=1}^{5} \frac{1}{2^i} \approx 92.0

Both surrounding values are less than our expected payoff, and hence the optimal number of times we should place is 94 with an expected payoff of

\frac{2961}{32} \approx 92.5