30-20 Game

Topic: Probability
Difficulty: L5
Source: OpenQuant

Question 1

Problem Details

Two players, Philip and Brandon, have a 30-sided and a 20-sided die, respectively. Each player rolls their die and the player with the highest roll wins (Brandon also wins in the event of a tie). The loser of the game pays the winner an amount equal to the winner's roll.

Part 1

What is the expected value of Philip's payoff?

Solution

Philip rolls between 1 and 20 with probability $\frac23$. In these cases, his expected payoff comes only from ties, and it is zero elsewhere. Every outcome where Philip rolls $a$ and Brandon rolls $b < a$ (Philip collects $a$) is matched by the equally likely outcome where Philip rolls $b$ and Brandon rolls $a$ (Philip pays $a$), so the non-tie outcomes cancel. Given that Philip's roll is between 1 and 20, he ties with Brandon with probability $\frac1{20}$. The tied value is uniform on 1 to 20, so Philip pays $10.5$ on average when he ties.
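
A quick brute-force check of this symmetry argument, offered as a sketch. It enumerates the 400 equally likely pairs with both rolls in 1 to 20 and separates the non-tie total from the tie total. The helper `philip_payoff` is a name introduced here for illustration, not from the original.

```python
from fractions import Fraction


def philip_payoff(philip_roll: int, brandon_roll: int) -> int:
    # Brandon wins ties, so Philip only collects when strictly higher
    return philip_roll if philip_roll > brandon_roll else -brandon_roll


# All 400 equally likely pairs with both rolls in 1..20
pairs = [(p, b) for p in range(1, 21) for b in range(1, 21)]

# Non-tie outcomes: each (a, b) with a > b (+a) is matched by (b, a) (-a)
non_tie_total = sum(philip_payoff(p, b) for p, b in pairs if p != b)

# Ties: Brandon wins and Philip pays the tied value
tie_total = sum(philip_payoff(p, b) for p, b in pairs if p == b)

# Conditional expectation of Philip's payoff given his roll is in 1..20
conditional_ev = Fraction(non_tie_total + tie_total, len(pairs))
print(non_tie_total, tie_total, conditional_ev, float(conditional_ev))
```

The conditional expectation it reports equals $\frac1{20}(-10.5)$, which is the first term of the formula before it is weighted by $\frac23$.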

In the $\frac13$ of cases where Philip rolls above 20, he wins every time and collects his roll, which averages $25.5$ over the faces 21 through 30. Therefore,

$$E[P] = \frac23 \cdot \frac1{20}(-10.5) + \frac13 (25.5) = -0.35 + 8.5 = \boxed{8.15}$$

import random


def roll_dice(num_sides: int) -> int:
    # Uniform integer in 1..num_sides
    return random.randint(1, num_sides)


earnings = []  # Philip's payoff in each simulated game

for _ in range(100_000):
    philip_roll = roll_dice(30)
    brandon_roll = roll_dice(20)

    # Brandon wins ties, so Philip only collects when strictly higher
    if philip_roll > brandon_roll:
        earnings.append(philip_roll)
    else:
        earnings.append(-brandon_roll)

# Expect a value close to 8.15
print(sum(earnings) / len(earnings))
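
There are only $30 \times 20 = 600$ equally likely outcomes, so the Monte Carlo estimate can also be cross-checked by exact enumeration. Here is a minimal sketch that assumes nothing beyond the payoff rule in the problem statement. The names `exact_philip_ev` and `philip_payoff` are illustrative and were introduced here.

```python
from fractions import Fraction


def philip_payoff(philip_roll: int, brandon_roll: int) -> int:
    # Brandon wins ties, so Philip only collects when strictly higher
    return philip_roll if philip_roll > brandon_roll else -brandon_roll


def exact_philip_ev(philip_sides: int = 30, brandon_sides: int = 20) -> Fraction:
    # Average Philip's payoff over every equally likely (Philip, Brandon) pair
    total = sum(
        philip_payoff(p, b)
        for p in range(1, philip_sides + 1)
        for b in range(1, brandon_sides + 1)
    )
    return Fraction(total, philip_sides * brandon_sides)


ev = exact_philip_ev()
print(ev, float(ev))
```

Using `Fraction` keeps the result exact, so any gap between the simulation and 8.15 is sampling noise rather than rounding.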

Question 2

Problem Details

Two players, Philip and Brandon, have a 30-sided and a 20-sided die, respectively. Each player rolls their die and the player with the highest roll wins (Brandon also wins in the event of a tie). The loser of the game pays the winner an amount equal to the winner's roll.

Part 2

How much does the expected value of the game change for Philip if Brandon can re-roll his die before Philip's roll is revealed?

Solution

Suppose Brandon has the option to re-roll. He will exercise it only if re-rolling decreases Philip's expected earnings. The game is zero-sum, so this is the same as increasing his own. A fresh roll by Brandon leaves Philip with expected earnings of $8.15$ (the Part 1 result). Brandon should therefore re-roll whenever Philip's expected earnings, given Brandon's current roll, exceed $8.15$. To find those rolls, compute Philip's expected earnings conditional on Brandon's initial roll.
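
Carrying out that computation, as a sketch: assume, as the simulation does, that Brandon may re-roll at most once and must keep the second roll. Conditioning on Brandon's roll $b$, Philip collects $p$ for each of the faces $p > b$ and pays $b$ on each of the $b$ faces $p \le b$:

$$E[P \mid B = b] = \frac{1}{30}\left(\sum_{p=b+1}^{30} p - b \cdot b\right) = \frac{1}{30}\left(465 - \frac{b(b+1)}{2} - b^2\right) = \frac{930 - 3b^2 - b}{60}$$

Averaging over $b = 1, \dots, 20$ recovers Part 1: $\frac{1}{20}\sum_{b=1}^{20}\frac{930 - 3b^2 - b}{60} = \frac{9780}{1200} = 8.15$.

Brandon re-rolls when $\frac{930 - 3b^2 - b}{60} > 8.15$, that is, when $3b^2 + b < 441$. This holds exactly for $b \le 11$: $b = 11$ gives $\frac{556}{60} \approx 9.27$, while $b = 12$ gives $\frac{486}{60} = 8.1$. Hence

$$E[P_{\text{re-roll}}] = \frac{1}{20}\sum_{b=12}^{20} \frac{930 - 3b^2 - b}{60} + \frac{11}{20}\cdot 8.15 = \frac{1}{20}\cdot\frac{1134}{60} + \frac{11}{20}\cdot 8.15 = 0.945 + 4.4825 = 5.4275$$

So Philip's expected payoff falls by $8.15 - 5.4275 = 2.7225$.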

import random
from typing import Tuple


random.seed(42)


def roll_dice(num_sides: int) -> int:
    # Uniform integer in 1..num_sides
    return random.randint(1, num_sides)


def simulate_strategy(strat: int, iterations: int = 100_000) -> float:
    """Estimate Brandon's expected payoff if he re-rolls once whenever his first roll is <= strat."""
    brandon_payoffs = []

    for _ in range(iterations):
        philip_roll = roll_dice(30)
        brandon_roll = roll_dice(20)

        # Brandon sees only his own roll; the re-roll must be kept
        if brandon_roll <= strat:
            brandon_roll = roll_dice(20)

        # Brandon wins ties and collects his roll; otherwise he pays Philip's roll
        brandon_payoffs.append(
            brandon_roll if brandon_roll >= philip_roll else -philip_roll
        )

    return sum(brandon_payoffs) / len(brandon_payoffs)


def find_brandon_optimal_strat() -> Tuple[int, float]:
    # Thresholds 0 (never re-roll) through 19. Neighbouring thresholds differ by
    # very small amounts, so sampling noise can shift the reported optimum.
    strategies = range(0, 20)
    optimal_strategy, optimal_strategy_ev = 0, float("-inf")

    for strat in strategies:
        brandon_ev = simulate_strategy(strat)
        if brandon_ev > optimal_strategy_ev:
            optimal_strategy = strat
            optimal_strategy_ev = brandon_ev

    return (optimal_strategy, optimal_strategy_ev)


brandon_optimal_strat, brandon_optimal_strat_ev = find_brandon_optimal_strat()
print(
    f"Brandon's strategy is to re-roll if he gets <= {brandon_optimal_strat} on his first roll.",
    f"Philip's expected earnings when Brandon uses the optimal strategy are {-brandon_optimal_strat_ev}",
)
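
The threshold search above is noisy. Re-rolling a 12 instead of keeping it changes Brandon's expected payoff by only $\frac{8.15 - 8.1}{20} = 0.0025$, far below the simulation's sampling error. As a result, the reported threshold can land a face or so away from the true optimum. Below is a minimal exact sketch under the same assumption: a single re-roll that must be kept. Names such as `philip_ev_given_brandon` are illustrative, not from the original.

```python
from fractions import Fraction

PHILIP_SIDES, BRANDON_SIDES = 30, 20


def philip_ev_given_brandon(b: int) -> Fraction:
    # Philip collects p for every face p > b
    wins = sum(range(b + 1, PHILIP_SIDES + 1))
    # On each of the b faces p <= b, Brandon wins and Philip pays b
    losses = b * b
    return Fraction(wins - losses, PHILIP_SIDES)


conditional = {b: philip_ev_given_brandon(b) for b in range(1, BRANDON_SIDES + 1)}

# Philip's EV against a single fresh roll by Brandon (the Part 1 answer)
baseline = sum(conditional.values()) / BRANDON_SIDES

# Brandon re-rolls exactly when keeping b would leave Philip more than a fresh roll
reroll_faces = [b for b, ev in conditional.items() if ev > baseline]

# Kept faces contribute their conditional EV; re-rolled faces are worth `baseline`
with_reroll = sum(
    baseline if b in reroll_faces else ev for b, ev in conditional.items()
) / BRANDON_SIDES

print("re-roll on:", reroll_faces)
print("baseline:", float(baseline), "with re-roll:", float(with_reroll))
print("change for Philip:", float(with_reroll - baseline))
```

Exact fractions avoid any ambiguity at the threshold. In this game no face leaves Philip exactly $8.15$, so using a strict `>` rather than `>=` does not change the result.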