Mixing: Averaging Random Variables vs. Mixture Distributions

Mixing RVs

Topic: Probability
Difficulty: L4
Source: QuantQuestion

Problem

The concept of a "mixture distribution" is used in probability and its applications in at least two different ways that have quite different meanings. Let $X$ and $Y$ be two independent normal rvs, with means $\mu_X = 50$ and $\mu_Y = 150$ and standard deviations $\sigma_X = \sigma_Y = 10$. Consider the rv $A$ defined by

\mathbf{A} = \frac{1}{2}\left(\mathbf{X} + \mathbf{Y}\right)

The idea here is that in each realization both a value of $X$ and a value of $Y$ are generated, and half of each is added to produce the value of $A$. Thus, in a fairly direct sense, each individual realization of $A$ contains (is made up of) one part of $X$ and one part of $Y$. In this sense, $A$ is a 50/50 mixture of $X$ and $Y$, much like one mixes half a pound of butter and half a pound of flour. Next, consider an rv $B$ that comes, in each realization, with probability $1/2$ from a normal distribution (namely, that of $X$) with mean $\mu = 50$ and standard deviation 10, and with probability $1/2$ from a normal distribution (namely, that of $Y$) with mean $\mu = 150$ and standard deviation 10. Thus, its density is equal to

f_{B}(b) = \frac{1}{2}\left[n(b|\mu = 50,\sigma = 10) + n(b|\mu = 150,\sigma = 10)\right]

where $n(b \mid \mu, \sigma)$ is the normal density with mean $\mu$ and standard deviation $\sigma$, evaluated at $b$. The idea here is that in each realization either a value of $X$ or a value of $Y$ is generated (but not both) with equal probability, and that this value then determines that of $B$. However, across many realizations, the density of $B$ will still represent a 50/50 mixture of $X$ and $Y$.

a. Sketch the densities of $A$ and $B$. Is $A$ normally distributed? Is $B$?

b. Derive the means and variances of $A$ and $B$. Compare and explain.

c. Let $X$, $Y$ be two arbitrary but independent rvs with densities $f_X$, $f_Y$; means $\mu_X$, $\mu_Y$; and standard deviations $\sigma_X$, $\sigma_Y$. Let $0 \leq p \leq 1$ be a proportion mixing either the rvs themselves:

\mathbf{A} = p\cdot \mathbf{X} + (1 - p)\cdot \mathbf{Y}

or their densities

f_{B}(b) = p\cdot f_{X}(b) + (1 - p)\cdot f_{Y}(b)

Derive for this more general case the means and variances of A and B.

Solution

Given independent $X\sim N(50,10^2)$ and $Y\sim N(150,10^2)$.

Define

A=\frac{X+Y}{2}.

Because a linear combination of independent normal rvs is again normal, $A$ is normal:

\mathbb{E}[A]=\frac{50+150}{2}=100,

\mathrm{Var}(A)=\frac{1}{4}(\mathrm{Var}(X)+\mathrm{Var}(Y))=\frac{1}{4}(100+100)=50.

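Hence $\boxed{A\sim N(100,50)}$, where the second parameter is the variance (standard deviation $\sqrt{50}\approx 7.07$).

Next define $B$: with probability $1/2$ take a realization of $X$, and with probability $1/2$ take a realization of $Y$ (that is, a mixture distribution).

The density of $B$ is then the average of the two normal densities. Because the two means are 10 standard deviations apart, this density is bimodal, so $B$ is not normal.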
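
For part (a), a quick numerical check can stand in for the sketch. The snippet below is a minimal illustration, not part of the original solution: the helper names `normal_pdf`, `density_a`, and `density_b` are made up for this example. It evaluates both densities at 50, 100, and 150, so that the single peak of $A$ at 100 and the two peaks of $B$ at 50 and 150 become visible.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def density_a(x):
    # A = (X + Y) / 2 is normal with mean 100 and variance 50.
    return normal_pdf(x, 100.0, math.sqrt(50.0))

def density_b(x):
    # B is the equal-weight mixture of N(50, 10^2) and N(150, 10^2).
    return 0.5 * (normal_pdf(x, 50.0, 10.0) + normal_pdf(x, 150.0, 10.0))

for x in (50.0, 100.0, 150.0):
    print(f"x={x:5.0f}  f_A={density_a(x):.3e}  f_B={density_b(x):.3e}")
```

At $x = 100$ the density of $A$ is at its maximum, while the density of $B$ is close to zero there: the two distributions share a mean but place their mass in very different regions.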

Mean:

\mathbb{E}[B]=\frac12\cdot 50+\frac12\cdot 150=100.

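For the variance, use the law of total variance, where $I$ indicates which component ($X$ or $Y$) is drawn. The within-component variance is 100 in either case, and $\mathbb{E}[B\mid I]$ equals 50 or 150 with probability $1/2$ each:

\mathrm{Var}(B)=\mathbb{E}[\mathrm{Var}(B\mid I)]+\mathrm{Var}(\mathbb{E}[B\mid I]) =100+\left[\frac12(50-100)^2+\frac12(150-100)^2\right] =100+2500=2600.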

So $\boxed{\mathrm{Var}(B)=2600}$, far larger than $\mathrm{Var}(A)=50$.
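
The two variances can be checked with a small Monte Carlo experiment. The sketch below rests on assumptions of its own: the function names `simulate` and `mean_var`, the sample size, and the seed are arbitrary choices for this illustration. Each trial draws $X$ and $Y$, forms $A$ as their average, and forms $B$ by picking one of the two with an independent fair coin.

```python
import random

def simulate(n=200_000, seed=0):
    """Draw n realizations of A (average) and B (mixture)."""
    rng = random.Random(seed)
    a_samples, b_samples = [], []
    for _ in range(n):
        x = rng.gauss(50.0, 10.0)
        y = rng.gauss(150.0, 10.0)
        # A uses both draws in every realization.
        a_samples.append(0.5 * (x + y))
        # B uses exactly one draw, chosen by a coin independent of x and y.
        b_samples.append(x if rng.random() < 0.5 else y)
    return a_samples, b_samples

def mean_var(samples):
    """Sample mean and (population-style) sample variance."""
    n = len(samples)
    m = sum(samples) / n
    v = sum((s - m) ** 2 for s in samples) / n
    return m, v

a_samples, b_samples = simulate()
mean_a, var_a = mean_var(a_samples)
mean_b, var_b = mean_var(b_samples)
print(f"A: mean={mean_a:.2f}, var={var_a:.2f}")
print(f"B: mean={mean_b:.2f}, var={var_b:.2f}")
```

Up to sampling noise, the estimates should land near 100 and 50 for $A$, and near 100 and 2600 for $B$.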

General weight $p$:

$A=pX+(1-p)Y$: $\mathbb{E}[A]=p\mu_X+(1-p)\mu_Y$, $\mathrm{Var}(A)=p^2\sigma_X^2+(1-p)^2\sigma_Y^2$ (by independence).
$B$ is the mixture "take $X$ with probability $p$, take $Y$ with probability $1-p$": $\mathbb{E}[B]=p\mu_X+(1-p)\mu_Y,$ $\mathrm{Var}(B)=p\sigma_X^2+(1-p)\sigma_Y^2+p(1-p)(\mu_X-\mu_Y)^2.$
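
The mixture variance can also be derived from second moments, as an alternative to the law of total variance. This is a sketch that uses only $\mathbb{E}[X^2]=\sigma_X^2+\mu_X^2$ and the analogous identity for $Y$. Integrating $b^2$ against $f_B$ gives

\mathbb{E}[B^2]=p(\sigma_X^2+\mu_X^2)+(1-p)(\sigma_Y^2+\mu_Y^2),

and subtracting the squared mean yields

\mathrm{Var}(B)=p\sigma_X^2+(1-p)\sigma_Y^2+p\mu_X^2+(1-p)\mu_Y^2-\left(p\mu_X+(1-p)\mu_Y\right)^2.

The terms in the means collapse, since

p\mu_X^2+(1-p)\mu_Y^2-\left(p\mu_X+(1-p)\mu_Y\right)^2=p(1-p)\left(\mu_X^2-2\mu_X\mu_Y+\mu_Y^2\right)=p(1-p)(\mu_X-\mu_Y)^2.

With $p=1/2$, $\sigma_X=\sigma_Y=10$, and $\mu_X-\mu_Y=-100$, this gives $100+\frac14\cdot 10000=2600$, which agrees with the special case.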

The difference: $A$ blends the two values in every realization, whereas $B$ switches between the two distributions.
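
The general formulas are easy to wrap in two small helpers for experimenting with other weights. The names `moments_a` and `moments_b` are hypothetical, chosen for this sketch. Subtracting the two variance formulas gives $\mathrm{Var}(B)-\mathrm{Var}(A)=p(1-p)\left[\sigma_X^2+\sigma_Y^2+(\mu_X-\mu_Y)^2\right]\geq 0$, so the mixture is never less dispersed than the weighted average. The loop checks this over a grid of $p$ values.

```python
def moments_a(p, mu_x, sigma_x, mu_y, sigma_y):
    """Mean and variance of A = p*X + (1-p)*Y for independent X, Y."""
    mean = p * mu_x + (1 - p) * mu_y
    var = p**2 * sigma_x**2 + (1 - p) ** 2 * sigma_y**2
    return mean, var

def moments_b(p, mu_x, sigma_x, mu_y, sigma_y):
    """Mean and variance of B with density p*f_X + (1-p)*f_Y."""
    mean = p * mu_x + (1 - p) * mu_y
    var = (p * sigma_x**2 + (1 - p) * sigma_y**2
           + p * (1 - p) * (mu_x - mu_y) ** 2)
    return mean, var

print(moments_a(0.5, 50, 10, 150, 10))  # (100.0, 50.0)
print(moments_b(0.5, 50, 10, 150, 10))  # (100.0, 2600.0)

# Var(B) - Var(A) should be non-negative for every p in [0, 1].
for k in range(11):
    p = k / 10
    gap = moments_b(p, 50, 10, 150, 10)[1] - moments_a(p, 50, 10, 150, 10)[1]
    assert gap >= 0
```

The gap vanishes only at $p=0$ or $p=1$, where $A$ and $B$ both reduce to a single one of the original rvs.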