
Sample Size vs. Signal Strength: Whose Evidence Is Stronger?

Topic
Statistics
Difficulty
L4

Problem

An urn contains six balls: three red and three blue. One of these balls (call it ball A) is selected at random and permanently removed from the urn, and its color is not shown to an observer. The observer may then draw balls one at a time, at random and with replacement, from the five balls that remain. These draws give a noisy impression of the ratio of red to blue balls left in the urn after A was removed.

Peter draws a ball six times, and every ball he draws is red. Paula draws a ball 600 times: 303 of her draws are red and 297 are blue. Both will naturally predict that ball A was probably blue. Which of them, if either, has the stronger empirical evidence for his or her prediction?

Solution

Their evidence is exactly equally strong.

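If ball A was blue, the urn holds 3 red and 2 blue balls, so each draw is red with probability $3/5$. If ball A was red, each draw is red with probability $2/5$.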

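Suppose we observe $r$ red and $b$ blue draws. Sampling is with replacement, so the draws are independent. The likelihood ratio of "A was blue" to "A was red" is then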

$$\frac{(3/5)^r(2/5)^b}{(2/5)^r(3/5)^b}=\left(\frac{3}{2}\right)^{r-b},$$

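which depends only on the difference $r-b$. If only the counts are recorded rather than the order of the draws, both likelihoods gain the same factor $\binom{r+b}{r}$, which cancels.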
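The cancellation can be checked with exact rational arithmetic. The Python sketch below is illustrative only, and the names `likelihood_ratio`, `P_RED_IF_A_BLUE` and `P_RED_IF_A_RED` are hypothetical. For a few $(r, b)$ pairs it multiplies out both likelihoods in full and confirms that their ratio equals $(3/2)^{r-b}$.

```python
from fractions import Fraction

# Per-draw probability of red under each hypothesis about ball A
P_RED_IF_A_BLUE = Fraction(3, 5)  # A blue: urn left with 3 red, 2 blue
P_RED_IF_A_RED = Fraction(2, 5)   # A red: urn left with 2 red, 3 blue


def likelihood_ratio(r: int, b: int) -> Fraction:
    """Exact P(sequence | A blue) / P(sequence | A red) for r reds and b blues."""
    like_blue = P_RED_IF_A_BLUE**r * (1 - P_RED_IF_A_BLUE) ** b
    like_red = P_RED_IF_A_RED**r * (1 - P_RED_IF_A_RED) ** b
    return like_blue / like_red


# The full products collapse to (3/2)**(r - b): only the difference matters.
for r, b in [(6, 0), (10, 4), (303, 297), (2, 5)]:
    assert likelihood_ratio(r, b) == Fraction(3, 2) ** (r - b)
```

Using `Fraction` instead of floats keeps $(3/5)^{303}$ and similar powers exact, so the equality test is a true identity check and not a rounding comparison.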

Peter: $r-b=6-0=6$. Paula: $r-b=303-297=6$. The two likelihood ratios are identical, so the evidence is exactly the same.
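Here is the same comparison for the two observers as a minimal Python sketch. This time it uses the order-free binomial likelihood, so the factor $\binom{r+b}{r}$ is included and can be seen to cancel. The helper names `binomial_likelihood` and `evidence` are hypothetical and exist only for illustration.

```python
from fractions import Fraction
from math import comb


def binomial_likelihood(r: int, b: int, p_red: Fraction) -> Fraction:
    """Probability of exactly r reds in r + b independent draws (order ignored)."""
    return comb(r + b, r) * p_red**r * (1 - p_red) ** b


def evidence(r: int, b: int) -> Fraction:
    """Likelihood ratio of 'A was blue' (p_red = 3/5) to 'A was red' (p_red = 2/5)."""
    return binomial_likelihood(r, b, Fraction(3, 5)) / binomial_likelihood(r, b, Fraction(2, 5))


peter = evidence(6, 0)      # six reds, no blues
paula = evidence(303, 297)  # 303 reds, 297 blues
print(peter, paula, peter == paula)  # prints: 729/64 729/64 True
```

Paula's coefficient $\binom{600}{303}$ is enormous, but it multiplies both likelihoods equally, so it leaves the ratio untouched.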

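Ball A was drawn at random from three red and three blue balls, so the prior is $\mathbb{P}(A\text{ blue})=1/2$ and the prior odds are 1. The posterior odds therefore equal the likelihood ratio, $(3/2)^6=729/64$, so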

$$\boxed{\mathbb{P}(A\text{ blue}\mid\text{data})=\frac{729}{729+64}=\frac{729}{793}\approx 0.919}.$$
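As a sanity check on the boxed value, the sketch below computes the posterior exactly. It also estimates Peter's case by simulating the urn directly: remove A at random, draw six times with replacement, and keep only the runs that produce six reds. The function names are hypothetical, and the simulation assumes Python's seeded `random.Random` is an adequate source of randomness for this purpose.

```python
import random
from fractions import Fraction


def exact_posterior_blue(r: int, b: int) -> Fraction:
    """P(A blue | data) with prior 1/2, i.e. prior odds of 1."""
    odds = Fraction(3, 2) ** (r - b)  # posterior odds = likelihood ratio
    return odds / (1 + odds)


def simulate_peter(trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(A blue | six reds in six draws)."""
    rng = random.Random(seed)
    all_red = 0           # runs in which all six draws were red
    blue_and_all_red = 0  # ...and in which the removed ball A was blue
    for _ in range(trials):
        urn = ["red"] * 3 + ["blue"] * 3
        a = urn.pop(rng.randrange(6))                # remove ball A unseen
        draws = [rng.choice(urn) for _ in range(6)]  # six draws with replacement
        if all(d == "red" for d in draws):
            all_red += 1
            if a == "blue":
                blue_and_all_red += 1
    return blue_and_all_red / all_red


print(exact_posterior_blue(6, 0))                   # prints: 729/793
print(round(float(exact_posterior_blue(6, 0)), 3))  # prints: 0.919
print(simulate_peter(200_000))  # Monte Carlo estimate; should land near the exact value
```

Only about 2.5% of simulated runs yield six reds, which is why many trials are needed. Paula's exact outcome of 303 reds in 600 draws is far too rare to condition on by this kind of rejection sampling. The exact formula gives her the same $729/793$.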