掷或拿 II
Take And Roll II
题目详情
你有 100 次动作与一枚公平 20 面骰(点数 1..20),初始朝上为 1。
每次动作你可以 roll(重掷)或 take(兑现当前点数)。但规则要求:每次 take 之后,下一次 take 之前必须至少 roll 一次。
你预先选定阈值 :只要 roll 出来点数 就立刻 take,否则继续 roll。
在最优选择 下,期望总收益是多少?
You are playing a game with a total of 100 actions and a fair 20-sided die. The die starts with face of 1. The two actions you can perform in the game are to roll and to take. Performing a roll re-rolls the current face of the die. Performing a take allows you to cash out the current face of the die. The game does ends when you complete all 100 actions and you must roll the die again before doing another take. Your strategy is to accept any number that is at least some threshold . This must be decided in advance and is fixed for the entire game. Assuming rational play in selecting , find your expected payout.
解析
设阈值为 。
- 每次 roll 命中 的概率为 ,命中所需 roll 次数期望为 。
- 一次“成功兑现循环”平均消耗动作数为 (roll 若干次 + 1 次 take)。
- 命中后点数的条件期望为 。
因此期望收益为
比较相邻整数可得最优阈值为 ,此时期望收益为
Original Explanation
First, let's get a baseline to compare to, the simplest strategy is to roll and always take the value that shows up. We will be able to perform this 50 times (as each is one action) and the expected value per roll is , so our expected payout would be with this strategy. Now, let's write the expected payoff as a function of . Given we accept any value at least , the expected value we roll given we accept is . Out of our 20 values there are values that are at least value (note: ), so the probability on each roll that we obtain a value at least is . This probability is constant between rolls as we have a fair die, meaning the expected number of rolls to obtain a value at least is . However, whenever we roll a value at least we also must claim it which takes up 1 action, so the expected number of turns needed to roll and claim the money is . Thus on average, we are able to roll and claim the money times in the game, as it takes us that many turns on average to roll and claim and we have 100 total turns. Lastly, this implies our expected payout is
To find the maximizing this, one can treat as continuous and use the derivative of it to find the optimal . The details of taking the derivative messy and not enlightening, so the steps are excluded. However, after using the basic rules and simplifying,
The roots of the polynomial are
The root adding is larger than 20, so
must be the maximizer. As and our optimal must be an integer, we can test to see which gives us the largest expected payout.
Plugging all three of these in reveals
maximizes with payout .