返回题库

抽牌红加 1 黑减 1

Dynamic Card Game

专题
Probability / 概率
难度
L4

题目详情

一副牌 52 张:红 26、黑 26。洗牌后不放回依次抽牌,你可以在任意时刻停止。

  • 抽到红牌:收益 +1;
  • 抽到黑牌:收益 -1。

问:最大化期望收益的最优停规则是什么?这个游戏的期望价值是多少?

A deck has 52 cards: 26 red and 26 black. The deck is shuffled, and cards are drawn one by one (without replacement). You can stop at any time. For each red card drawn, you gain +1; for each black card, you lose -1. What is the optimal stopping rule to maximize expected payoff, and how much is the game worth (in expectation)?

解析

用动态规划。

设状态 (b,r)(b,r) 为剩余黑牌数 bb 与红牌数 rr

立即停下的“价值”为 brb-r(可从对称性理解:你的当前得分 + 未见牌的红黑差为常数)。

若继续抽:下一张为黑的概率 bb+r\frac{b}{b+r}(转到 (b1,r)(b-1,r)),为红的概率 rb+r\frac{r}{b+r}(转到 (b,r1)(b,r-1))。令 f(b,r)f(b,r) 为最优期望收益,则

f(b,r)=max{br,bb+rf(b1,r)+rb+rf(b,r1)}.f(b,r)=\max\Bigl\{b-r,\frac{b}{b+r}f(b-1,r)+\frac{r}{b+r}f(b,r-1)\Bigr\}.

边界:f(0,r)=0f(0,r)=0f(b,0)=bf(b,0)=b

计算可得 f(26,26)2.62f(26,26)\approx 2.62,因此游戏价值约为 2.62 美元


Original Explanation

Let (b,r)(b,r) represent the count of black and red cards remaining in the deck. By symmetry, the net from stopping immediately is brb-r (because the difference between black and red among the unseen cards is brb-r but your “score so far” + “remaining color difference” is constant).

If you continue, the next card is black with probability bb+r\frac{b}{b+r} (transition to (b1,r)(b-1,r)) or red with probability rb+r\frac{r}{b+r} (transition to (b,r1)(b,r-1)). Let f(b,r)f(b,r) be the maximum expected payoff. Then f(b,r)=max{br,  bb+rf(b1,r)+rb+rf(b,r1)},f(b,r) = \max\Bigl\{ b-r,\; \frac{b}{b+r}f(b-1,r) + \frac{r}{b+r}f(b,r-1) \Bigr\}, with boundary conditions f(0,r)=0f(0,r)=0 and f(b,0)=bf(b,0)=b. By dynamic programming, one finds f(26,26)2.62f(26,26)\approx 2.62. So the game is worth about $2.62.