
Mixing: Averaging Random Variables vs. Mixture Distributions

Mixing RVs

Topic
Probability
Difficulty
L4

Problem

The concept of a "mixture distribution" is used in probability and its applications in at least two different ways that have quite different meanings. Let $X$ and $Y$ be two independent normal rvs, with means $\mu_X = 50$ and $\mu_Y = 150$ and standard deviations $\sigma_X = \sigma_Y = 10$. Consider the rv $A$ defined by

$$A = \frac{1}{2}\left(X + Y\right)$$

The idea here is that in each realization both a value of $X$ and a value of $Y$ are generated, and half of each is added to produce the value of $A$. Thus, in a fairly direct sense, each individual realization of $A$ contains (is made up of) one part of $X$ and one part of $Y$; in this sense, $A$ is a 50/50 mixture of $X$ and $Y$, much like one mixes half a pound of butter and half a pound of flour. Next, consider an rv $B$ that comes, in each realization, with probability $1/2$ from a normal distribution (namely, that of $X$) with mean $\mu = 50$ and standard deviation 10, and with probability $1/2$ from a normal distribution (namely, that of $Y$) with mean $\mu = 150$ and standard deviation 10. Thus, its density is equal to

$$f_B(b) = \frac{1}{2}\left[n(b \mid \mu = 50, \sigma = 10) + n(b \mid \mu = 150, \sigma = 10)\right]$$

where $n(\cdot \mid \mu, \sigma)$ is the normal density with mean $\mu$ and standard deviation $\sigma$. The idea here is that in each realization either a value of $X$ or a value of $Y$ is generated (but not of both) with equal probability, and that this value then determines that of $B$. However, across many realizations, the density of $B$ will still represent a 50/50 mixture of $X$ and $Y$.

a. Sketch the densities of $A$ and $B$. Is $A$ normally distributed? Is $B$?

b. Derive the means and variances of $A$ and $B$. Compare and explain.

c. Let $X$, $Y$ be two arbitrary but independent rvs with densities $f_X$, $f_Y$; means $\mu_X$, $\mu_Y$; and standard deviations $\sigma_X$, $\sigma_Y$. Let $0 \leq p \leq 1$ be a proportion mixing either the rvs themselves:

$$A = p \cdot X + (1 - p) \cdot Y$$

or their densities

$$f_B(b) = p \cdot f_X(b) + (1 - p) \cdot f_Y(b)$$

Derive for this more general case the means and variances of A and B.

Solution

Given independent $X \sim N(50, 10^2)$ and $Y \sim N(150, 10^2)$, define

$$A = \frac{X + Y}{2}.$$

Because a linear combination of independent normal rvs is again normal, $A$ is normal:

$$\mathbb{E}[A] = \frac{50 + 150}{2} = 100, \qquad \mathrm{Var}(A) = \frac{1}{4}\left(\mathrm{Var}(X) + \mathrm{Var}(Y)\right) = \frac{1}{4}(100 + 100) = 50.$$

Therefore $\boxed{A \sim N(100, 50)}$, where 50 is the variance (standard deviation $\sqrt{50} \approx 7.07$).

Next define $B$: with probability $1/2$ take a realization of $X$, and with probability $1/2$ take a realization of $Y$ (that is, a mixture distribution).

The density of $B$ is the average of the two normal densities and is bimodal, with peaks near 50 and 150, so $B$ is not normal.
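As a rough aid for the sketch in part a, the two densities can be evaluated at a few points. This is a minimal sketch using only the standard library; the helper names `normal_pdf`, `density_a`, and `density_b` are hypothetical and chosen for illustration, and `density_a` assumes the result $A \sim N(100, 50)$ stated above.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def density_a(x):
    """Density of A = (X + Y) / 2, which is N(100, 50), so sd = sqrt(50)."""
    return normal_pdf(x, 100.0, math.sqrt(50.0))

def density_b(x):
    """Density of the 50/50 mixture of N(50, 10^2) and N(150, 10^2)."""
    return 0.5 * normal_pdf(x, 50.0, 10.0) + 0.5 * normal_pdf(x, 150.0, 10.0)

# Compare the two densities at the component means and at the common mean 100.
for x in (50, 100, 150):
    print(f"x={x}: f_A={density_a(x):.3e}, f_B={density_b(x):.3e}")
```

The density of $A$ has a single peak at 100, while the density of $B$ is nearly zero at 100 and peaks near 50 and 150.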

Mean:

$$\mathbb{E}[B] = \frac12 \cdot 50 + \frac12 \cdot 150 = 100.$$

For the variance, use the law of total variance, where $I$ indicates which component ($X$ or $Y$) is drawn:

$$\mathrm{Var}(B) = \mathbb{E}[\mathrm{Var}(B \mid I)] + \mathrm{Var}(\mathbb{E}[B \mid I]) = 100 + \mathrm{Var}(50\text{ or }150) = 100 + \frac12(50 - 100)^2 + \frac12(150 - 100)^2 = 100 + 2500 = 2600.$$

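So $\boxed{\mathrm{Var}(B) = 2600}$, far larger than $\mathrm{Var}(A) = 50$. Averaging cancels part of the independent noise in $X$ and $Y$, whereas the mixture keeps each component's variance and adds the spread between the two component means.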
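The comparison can be checked numerically with a small Monte Carlo run. This is a sketch under stated assumptions, not part of the original solution: the function names, the sample size `n`, and the `seed` are arbitrary illustrative choices, and the estimates only approximate the exact values 100, 50, and 2600.

```python
import random

def mean_and_variance(samples):
    """Return the sample mean and the population-style sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

def simulate(n=100_000, seed=0):
    """Draw n realizations of A (average) and B (mixture) from the same X, Y."""
    rng = random.Random(seed)
    a_samples, b_samples = [], []
    for _ in range(n):
        x = rng.gauss(50.0, 10.0)   # X ~ N(50, 10^2)
        y = rng.gauss(150.0, 10.0)  # Y ~ N(150, 10^2)
        a_samples.append(0.5 * (x + y))                   # A uses both values
        b_samples.append(x if rng.random() < 0.5 else y)  # B uses exactly one
    return a_samples, b_samples

a_samples, b_samples = simulate()
mean_a, var_a = mean_and_variance(a_samples)
mean_b, var_b = mean_and_variance(b_samples)
print(f"A: mean={mean_a:.2f}, var={var_a:.1f}")
print(f"B: mean={mean_b:.2f}, var={var_b:.1f}")
```

Both sample means should land close to 100, while the sample variances should differ by a factor of roughly 50.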

General weight $p$:

  • $A = pX + (1-p)Y$: $\mathbb{E}[A] = p\mu_X + (1-p)\mu_Y$, $\mathrm{Var}(A) = p^2\sigma_X^2 + (1-p)^2\sigma_Y^2$ (using independence).
  • $B$ is the mixture that takes $X$ with probability $p$ and $Y$ with probability $1-p$: $\mathbb{E}[B] = p\mu_X + (1-p)\mu_Y$, $\mathrm{Var}(B) = p\sigma_X^2 + (1-p)\sigma_Y^2 + p(1-p)(\mu_X - \mu_Y)^2$. The first two terms are the expected within-component variance, and the last term is the variance of the component mean.
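The general formulas for the two kinds of mixing can be written as two small functions and checked against the $p = 1/2$ case. This is a minimal sketch; the function names `average_moments` and `mixture_moments` are hypothetical, and independence of $X$ and $Y$ is assumed for the averaged variable.

```python
def average_moments(p, mu_x, sigma_x, mu_y, sigma_y):
    """Mean and variance of A = p*X + (1-p)*Y for independent X, Y."""
    mean = p * mu_x + (1 - p) * mu_y
    var = p**2 * sigma_x**2 + (1 - p) ** 2 * sigma_y**2
    return mean, var

def mixture_moments(p, mu_x, sigma_x, mu_y, sigma_y):
    """Mean and variance of B with density p*f_X + (1-p)*f_Y."""
    mean = p * mu_x + (1 - p) * mu_y
    within = p * sigma_x**2 + (1 - p) * sigma_y**2  # expected conditional variance
    between = p * (1 - p) * (mu_x - mu_y) ** 2      # variance of conditional mean
    return mean, within + between

print(average_moments(0.5, 50, 10, 150, 10))  # (100.0, 50.0)
print(mixture_moments(0.5, 50, 10, 150, 10))  # (100.0, 2600.0)
```

The means always agree. The two variances coincide at $p = 0$ or $p = 1$, where both constructions reduce to a single rv, and for $0 < p < 1$ the mixture variance is larger unless both rvs are constants with the same mean.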

The difference: $A$ blends a value of both $X$ and $Y$ in every realization, while $B$ switches between the two distributions.