Take and Roll I

Topic: Probability
Difficulty: L4
Source: QuantQuestion

Problem Statement

You are given a fair 20-sided die and 100 actions. The die starts with $1$ facing up.

On each action you may choose one of two options:

Roll: re-roll the die, replacing the current upface.
Take: cash out the current upface as payout. The game does not end when you take, and you do not have to roll between takes. For example, you could take the initial $1$ a total of 100 times and walk away with a guaranteed $100$.

Your strategy is to fix a threshold $n$ before the game begins, roll until the die first shows a value of at least $n$, and then take on every remaining action without rolling again.

Assuming you choose $n$ rationally, what is your expected payout?

Solution

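Suppose the threshold is $x$:

Once a roll shows at least $x$, its conditional expected value is $\frac{20+x}{2}$.
Each roll shows at least $x$ with probability $\frac{21-x}{20}$, so the expected number of rolls needed is $\frac{20}{21-x}$.
The expected number of actions left for taking is therefore approximately $100-\frac{20}{21-x}$; the approximation ignores the negligible chance that no roll reaches $x$ within 100 actions.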

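The expected payout is

$$p(x)=\frac{20+x}{2}\left(100-\frac{20}{21-x}\right).$$

Comparing integer values of $x$ shows that the optimum is $x=18$, with maximum expected payout

$$\frac{5320}{3} \approx 1773.33.$$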
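The integer comparison can be checked directly. The following is a minimal sketch that evaluates $p(x)$ with exact rational arithmetic for every threshold from 1 to 20; the names `expected_payout`, `payouts`, and `best_x` are illustrative and not part of the original solution.

```python
from fractions import Fraction


def expected_payout(x: int) -> Fraction:
    """Approximate expected payout p(x) for threshold x on a fair 20-sided die."""
    mean_face = Fraction(20 + x, 2)        # average of the faces x, x+1, ..., 20
    expected_rolls = Fraction(20, 21 - x)  # mean wait for the first roll >= x
    return mean_face * (100 - expected_rolls)


# Evaluate every admissible integer threshold and pick the largest payout.
payouts = {x: expected_payout(x) for x in range(1, 21)}
best_x = max(payouts, key=payouts.get)

print(best_x, payouts[best_x])  # 18 5320/3
```

Using `Fraction` avoids floating-point rounding, so the result matches the closed form exactly.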

Original Explanation

If you start taking the first time the die shows at least $x$, the expected value of the face you take is $\dfrac{20+x}{2}$. Since there are $21-x$ values on the die that are at least $x$, the probability that any given roll shows at least $x$ is $\dfrac{21-x}{20}$. So the average number of rolls needed to see a value of at least $x$ is $\dfrac{20}{21-x}$. Therefore, on average you are able to take on about $100 - \dfrac{20}{21-x}$ turns. Hence, your expected payout is

$$p(x) = \dfrac{20+x}{2} \cdot \left(100 - \dfrac{20}{21-x} \right)$$
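The term $100 - \dfrac{20}{21-x}$ treats the wait for the first qualifying roll as an unbounded geometric variable, although only 100 actions are available. The sketch below compares this with a version that caps the wait at 100 rolls. It assumes the player follows the fixed-threshold strategy strictly, rolling on every action until the first hit and collecting nothing if no roll ever reaches the threshold; `ACTIONS`, `SIDES`, `exact_payout`, and `approx_payout` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

ACTIONS = 100  # total number of actions in the game
SIDES = 20     # faces on the die


def approx_payout(x: int) -> Fraction:
    """Closed form p(x), which ignores the cap of ACTIONS rolls."""
    return Fraction(SIDES + x, 2) * (ACTIONS - Fraction(SIDES, SIDES + 1 - x))


def exact_payout(x: int) -> Fraction:
    """Expected payout when the wait is capped at ACTIONS rolls (assumed model)."""
    p_hit = Fraction(SIDES + 1 - x, SIDES)
    q = 1 - p_hit
    # The first hit lands on roll k with probability q^(k-1) * p_hit,
    # leaving ACTIONS - k actions for taking.
    expected_takes = sum(
        q ** (k - 1) * p_hit * (ACTIONS - k) for k in range(1, ACTIONS + 1)
    )
    # The face shown at the first hit is independent of when the hit occurs.
    return Fraction(SIDES + x, 2) * expected_takes


best_exact = max(range(1, SIDES + 1), key=exact_payout)
gap = exact_payout(18) - approx_payout(18)
print(best_exact, float(gap))
```

Under this model the optimal threshold is unchanged and the correction at $x = 18$ is far below one unit of payout, which is why the closed form is adequate here.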

To find the maximum, one can treat $p(x)$ as a continuous function, differentiate with respect to $x$, and then compare the two integers on either side of the critical point as candidate maximizers.

Applying the product rule and simplifying gives $p'(x) = \dfrac{10}{(x-21)^2} \cdot (5x^2 - 210x + 2164)$. By the quadratic formula, the zeros of this polynomial are $x^* = \dfrac{105 \pm \sqrt{205}}{5}$. The root with the plus sign is larger than 20, so $x^* = \dfrac{105 - \sqrt{205}}{5} \approx 18.136$ is the maximizer.
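For reference, the differentiation can be written out step by step. This is a sketch that uses only the definition of $p(x)$ given earlier and introduces no new symbols. Expanding the product first keeps the algebra short:

$$p(x) = 50(20+x) - \dfrac{10(20+x)}{21-x}$$

$$p'(x) = 50 - \dfrac{10\left[(21-x) + (20+x)\right]}{(21-x)^2} = 50 - \dfrac{410}{(21-x)^2} = \dfrac{10\left(5x^2 - 210x + 2164\right)}{(x-21)^2}$$

Setting $p'(x) = 0$ gives $(21-x)^2 = \dfrac{41}{5}$, so $x = 21 \pm \sqrt{\dfrac{41}{5}} = \dfrac{105 \pm \sqrt{205}}{5}$. Moreover,

$$p''(x) = -\dfrac{820}{(21-x)^3} < 0 \quad \text{for } x < 21,$$

so $p$ is concave on the relevant range, which is what justifies checking only the two integers adjacent to the critical point.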

In an interview, one could notice that $14^2 < 205 < 15^2$, so $14 < \sqrt{205} < 15$, which gives $18 < x^* < 18.2$ and in particular $18 < x^* < 19$. Lastly, plugging in $x = 18$ and $x = 19$ gives $p(18) = \dfrac{5320}{3} \approx 1773.33$ and $p(19) = 1755$, so $x = 18$ is optimal with expected payout $\dfrac{5320}{3}$.
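As an independent sanity check, the strategy can also be simulated. The sketch below plays the game many times with a seeded random number generator and averages the payout; the function name `simulate`, the trial count, and the seed are arbitrary choices for illustration, and the estimate will only approximate $\dfrac{5320}{3}$ up to sampling noise.

```python
import random


def simulate(threshold: int, trials: int = 50_000, seed: int = 0) -> float:
    """Estimate the expected payout of the fixed-threshold strategy by simulation."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    total = 0
    for _ in range(trials):
        actions_left = 100
        face = 1  # the die starts with 1 facing up
        # Roll until the face first reaches the threshold or actions run out.
        while actions_left > 0:
            face = rng.randint(1, 20)
            actions_left -= 1
            if face >= threshold:
                break
        # Every remaining action is a take; if no roll ever hit the threshold,
        # actions_left is 0 and nothing is added.
        total += face * actions_left
    return total / trials


print(simulate(18))
```

Running `simulate` for neighboring thresholds such as 17 and 19 gives a rough empirical view of how flat the payout is near the optimum.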