返回题库

掷或拿 I

Take and Roll I

专题
Probability / 概率
难度
L4

题目详情

你有一枚公平 20 面骰与 100 次动作。骰子起始朝上为 1。

每次动作你可以选择:

  • 掷(roll):重掷当前朝上点数;
  • 拿(take):把当前朝上点数兑现为收益(但游戏不会结束,后续仍可继续 take 或 roll;例如你可以对初始 1 连续 take 100 次获得 100)。

你的策略是:在游戏开始前选定一个阈值 nn,当且仅当你第一次掷到点数 n\ge n 时开始一直 take(在此之后不再 roll)。

在理性选择 nn 的情况下,期望收益最大是多少?

You are given a fair 20-sided die and 100 actions in a game. The die starts with upface 11. The two options you can perform are to roll and to take. Performing a roll re-rolls the current upface of the die. Performing a take allows you to cash out the current upface of the die. Note that the game does not end when you perform a take and that you do not have to roll between takes. Therefore, for example, you can just perform 100 takes on the initial 11 upface and walk away with 100100 guaranteed. Your strategy is to cash out the upface when you roll at least some threshold nn for the first time. You fix this nn at the beginning of the game. Assuming rational strategy in selecting nn, what is your expected payout on this game?

解析

若阈值为 xx

  • 一旦掷到 x\ge x,其条件期望为 20+x2\frac{20+x}{2}
  • 每次 roll 命中 x\ge x 的概率为 21x20\frac{21-x}{20},因此命中所需 roll 次数期望为 2021x\frac{20}{21-x}
  • 于是可用于 take 的动作数期望约为 1002021x100-\frac{20}{21-x}

期望收益函数为

p(x)=20+x2(1002021x).p(x)=\frac{20+x}{2}\left(100-\frac{20}{21-x}\right).

比较整数 xx 可得最优为 x=18x=18,最大期望收益为

53203.\frac{5320}{3}.

Original Explanation

If you take when you have at least xx for the first time, your expected face showing is 20+x2\dfrac{20+x}{2}. Since there are 21x21-x values on the die at least xx, the probability on any given roll of seeing at least xx is 21x20\dfrac{21-x}{20}. So, the average number of rolls needed to see a value at least xx is 2021x\dfrac{20}{21-x}. Therefore, you are able to claim on 1002021x100 - \dfrac{20}{21-x} turns. Hence, your expected payout would be

p(x)=20+x2(1002021x)p(x) = \dfrac{20+x}{2} \cdot \left(100 - \dfrac{20}{21-x} \right)

To find the maximum, one can treat p(x)p(x) as continuous and differentiate in xx and then consider the two integers xx is between as potential maximizers.

Doing this by product rule and simplifying, p(x)=10(x21)2(5x2210x+2164)p'(x) = \dfrac{10}{(x-21)^2} \cdot (5x^2 - 210x + 2164). The zeros of this polynomial, by quadratic formula, are x=105±2055x^* = \dfrac{105 \pm \sqrt{205}}{5}. The root where we add is larger than 20, so x=105205518.137x^* = \dfrac{105 - \sqrt{205}}{5} \approx 18.137 is our optimizer.

In an interview, one could notice that 142<205<15214^2 < 205 < 15^2, so that's how one could deduce 18<x<1918 < x^* < 19. Lastly, plugging in x=18x = 18 and x=19x = 19 yields that x=18x = 18 yields the optimal value with 53203\dfrac{5320}{3}.