返回题库

放或拿

Place or Take

专题
Strategy / 策略
难度
L6

题目详情

你有两个不透明盒子。每一步必须选择:

  • place:第三方往随机一个盒子里放入 1 元;
  • take:随机清空一个盒子,并把该盒子里的钱全部拿走。

共 100 步。你在游戏过程中看不到每次 take 具体拿了多少,只在最后结算。

问:最优策略下期望收益是多少?

You are playing a one-player game with two opaque boxes. At each turn, you can choose to either "place" or "take". "Place" places $1 from a third party into one box randomly. "Take" empties out one box randomly and that money is yours. This game consists of 100 turns where you must either place or take. Assuming optimal play, what is the expected payoff of this game? Note that you do not know how much money you have taken until the end of the game.

解析

最优策略是把所有 place 放在前面、take 放在后面(place 放在 take 之后只会降低期望可回收金额)。

若先做 pp 次 place,则盒子里总金额为 pp;再做 m=100pm=100-p 次 take。

每次 take 的期望能拿走当前总金额的一半,因此 mm 次 take 的期望回收比例为

i=1m12i=12m.\sum_{i=1}^{m}\frac{1}{2^i}=1-2^{-m}.

所以期望收益为

G(p)=p(12(100p)).G(p)=p\left(1-2^{-(100-p)}\right).

比较 pp 可得最优为 p=94p=94(即 94 次 place 后接 6 次 take),此时期望为

94(126)=946364=29613292.5313.94\left(1-2^{-6}\right)=94\cdot\frac{63}{64}=\frac{2961}{32}\approx 92.5313.

Original Explanation

There are two key elements to this game that should be considered: the sequence of places and takes, and the total number of places and takes. We'll begin by addressing the first element. Intuitively, no place actions should follow a take. Greedily, it makes sense to place a take after all places to maximize the expected value of that take (since each place action after a take reduces the potential payout by 1);therefore,alltakesshouldbeclusteredattheendofthe100turns.Thiscanalsobeprovenusinginduction,butwellleavethatasanexerciseforthereader.Now,letsconsiderthesecondelement:thenumberofplacesandtakes.Ifthereare1); therefore, all takes should be clustered at the end of the 100 turns. This can also be proven using induction, but we'll leave that as an exercise for the reader. Now, let's consider the second element: the number of places and takes. If there arepplaceactions,therewillbeplace actions, there will be100 - ptakes,andthetakes, and thepplaceactionsmusthappenbeforetheplace actions must happen before the100 - ptakes.Todeterminetheoptimalvalueoftakes. To determine the optimal value ofp,wecanassessthepointatwhichaddingonemoreplaceactionnolongerincreasestheexpectedpayout.Lookatthe, we can assess the point at which adding one more place action no longer increases the expected payout. Look at thepthactionwhere-th action where1 < p < 100andassumethisisthelastplacebeforewestarttaking,orthenextaction(and assume this is the last place before we start taking, or the next action (p + 1)wouldbethefinalplacebeforetakingbegins.Whatistheexpectedoutcomeofthegameifthe) would be the final place before taking begins. What is the expected outcome of the game if thepthactionisthefinalplace?Thereare-th action is the final place? There arepdollarsintotal,anddollars in total, and100 - ptakesremaining.Ourexpectedpayoffwillbetakes remaining. Our expected payoff will bep \times \sum_{i=1}^{100 - p} \frac{1}{2^i},whichcanbeseenfromthelinearityofexpectationateverytake,wewilltakehalfofwhatisremainingonaverage.Whatistheexpectedvalueofthegameifaction, which can be seen from the linearity of expectation - at every take, we will take half of what is remaining on average. What is the expected value of the game if actionp + 1isthelastplace?Thereareis the last place? There arep + 1dollarsinaggregateandwehavedollars in aggregate and we have100 - (p + 1)takes.Thus,ourexpectedpayoffistakes. Thus, our expected payoff is(p + 1) \times \sum_{i=1}^{100 - p - 1} \frac{1}{2^i}.Wearelookingforthelargestvalueof. We are looking for the largest value ofpsuchthatthepayoffwherethesuch that the payoff where thepthplaceisthelastplaceisgreaterthanthepayoffwheretheincremental-th place is the last place is greater than the payoff where the incrementalp + 1$-th place is the last place. Thus:

pi=1100p12i>(p+1)i=1100p112ip \sum_{i=1}^{100 - p} \frac{1}{2^i} > (p + 1) \sum_{i=1}^{100 - p - 1} \frac{1}{2^i}

and

p2100p>i=199p12i\frac{p}{2^{100 - p}} > \sum_{i=1}^{99 - p} \frac{1}{2^i}

approaches 1, and thus p>2100pp>93p > 2^{100 - p} \Rightarrow p > 93. With 94 places, the expected payoff is

94i=1612i92.594 \sum_{i=1}^{6} \frac{1}{2^i} \approx 92.5

To check this is the maximum payoff, we can look at the 93 and 95 placing cases. With 93 places, the expected payoff is

93i=1712i92.393 \sum_{i=1}^{7} \frac{1}{2^i} \approx 92.3

With 95 places, the expected payoff is

95i=1512i92.095 \sum_{i=1}^{5} \frac{1}{2^i} \approx 92.0

Both surrounding values are less than our expected payoff, and hence the optimal number of times we should place is 94 with an expected payoff of

29613292.5\frac{2961}{32} \approx 92.5