Probability & Statistics

Probability Distributions

Page Structure

Core Concepts

  • PMF, PDF, CDF, expectation and variance
  • Moment generating functions and sums of independent variables
  • Bernoulli, Binomial, Poisson, Exponential, Uniform, Normal and Lognormal distributions

Study Order

  1. Separate discrete and continuous random variables before applying formulas.
  2. Use the CDF for tail probability and threshold questions.
  3. Understand why prices are usually modeled as lognormal while returns are often approximated as normal.

Overview

Probability distributions provide a mathematical framework for modeling the uncertainty inherent in financial markets. They are essential for tasks ranging from asset pricing and risk management to portfolio optimization and algorithmic trading.

I. Foundational Concepts

A random variable (RV) is a variable whose value is the numerical outcome of a random phenomenon. RVs are classified as discrete (countably many outcomes, e.g., the number of defaults) or continuous (uncountably many outcomes over a range, e.g., an asset price).

| Concept | Discrete RV | Continuous RV | Description |
| --- | --- | --- | --- |
| Probability function | Probability Mass Function (PMF), $f(x)$ | Probability Density Function (PDF), $f(x)$ | Defines the probability of a discrete outcome or the relative likelihood of a continuous outcome. |
| Cumulative function | Cumulative Distribution Function (CDF), $F(x)$ | Cumulative Distribution Function (CDF), $F(x)$ | Gives the probability that the RV takes a value less than or equal to $x$: $F(x) = P(X \le x)$. |
| Expected value | $\mathbb{E}[X] = \sum_i x_i f(x_i)$ | $\mathbb{E}[X] = \int x f(x)\,dx$ | The weighted average of all possible values, representing the long-run average. |
| Variance | $\text{Var}(X) = \mathbb{E}[(X - \mu)^2]$ | $\text{Var}(X) = \mathbb{E}[(X - \mu)^2]$ | Measures the dispersion or spread of the distribution around the mean ($\mu$). |
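The expectation and variance definitions can be checked numerically. The sketch below is only illustrative: the helper names `discrete_moments` and `continuous_moments`, the default-count PMF, and the Uniform endpoints are assumptions made for this example, and the continuous case uses a simple midpoint rule rather than an exact integral.

```python
def discrete_moments(pmf):
    """Return (mean, variance) for a PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var


def continuous_moments(pdf, lo, hi, steps=100_000):
    """Approximate (mean, variance) of a PDF supported on [lo, hi] (midpoint rule)."""
    dx = (hi - lo) / steps
    xs = [lo + (i + 0.5) * dx for i in range(steps)]
    mean = sum(x * pdf(x) for x in xs) * dx
    var = sum((x - mean) ** 2 * pdf(x) for x in xs) * dx
    return mean, var


# Hypothetical discrete RV: number of defaults among three loans.
defaults_pmf = {0: 0.70, 1: 0.20, 2: 0.08, 3: 0.02}
d_mean, d_var = discrete_moments(defaults_pmf)

# Continuous RV: Uniform(a, b) density with illustrative endpoints a=2, b=5.
a, b = 2.0, 5.0
u_mean, u_var = continuous_moments(lambda x: 1.0 / (b - a), a, b)

print(d_mean, d_var)
print(u_mean, u_var)
```

For the Uniform case the results should agree with the closed forms $\frac{a+b}{2}$ and $\frac{(b-a)^2}{12}$ up to the discretization error.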

Moment Generating Functions (MGF)

The Moment Generating Function (MGF), $M_X(\theta) = \mathbb{E}[e^{\theta X}]$, encodes all moments of $X$ whenever it is finite in a neighborhood of $\theta = 0$.

  • Utility: The $k$-th moment of the distribution, $\mathbb{E}[X^k]$, can be found by taking the $k$-th derivative of the MGF and evaluating it at $\theta = 0$.
  • Sum of RVs: The MGF of the sum of independent random variables is the product of their individual MGFs: $M_{X+Y}(\theta) = M_X(\theta) M_Y(\theta)$.
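As a worked illustration of both properties, take the Normal case, assuming $X \sim N(\mu, \sigma^2)$ and independent $X_1 \sim N(\mu_1, \sigma_1^2)$, $X_2 \sim N(\mu_2, \sigma_2^2)$ (the subscripted symbols are introduced here only for the example):

```latex
\begin{aligned}
M_X(\theta) &= \mathbb{E}[e^{\theta X}] = \exp\left(\mu\theta + \tfrac{1}{2}\sigma^2\theta^2\right), \\
M_X'(0) &= \mu, \qquad M_X''(0) = \mu^2 + \sigma^2
  \;\Rightarrow\; \text{Var}(X) = M_X''(0) - M_X'(0)^2 = \sigma^2, \\
M_{X_1+X_2}(\theta) &= M_{X_1}(\theta)\,M_{X_2}(\theta)
  = \exp\left((\mu_1+\mu_2)\theta + \tfrac{1}{2}(\sigma_1^2+\sigma_2^2)\theta^2\right).
\end{aligned}
```

The last line is again a Normal MGF, so $X_1 + X_2 \sim N(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)$. Evaluating the first line at $\theta = 1$ gives $\mathbb{E}[e^X] = e^{\mu + \sigma^2/2}$, which is the mean of the Lognormal distribution.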

II. Key Distributions in Statistics

The following table summarizes the key distributions, their PMF or PDF, their mean and variance, and their relevance in financial modeling.

| Name | Type | Application | PMF/PDF | Mean | Variance |
| --- | --- | --- | --- | --- | --- |
| Bernoulli | Discrete | Modeling a single event outcome (e.g., default/no default, success/failure of a trade). | $f(t;p) = p^t (1-p)^{1-t}$, $t \in \{0, 1\}$ | $p$ | $p(1-p)$ |
| Binomial | Discrete | Number of successes in a fixed number of trials (e.g., number of up-moves in a Binomial Option Pricing Model, credit risk modeling). | $f(t;n,p) = \binom{n}{t} p^t (1-p)^{n-t}$ | $np$ | $np(1-p)$ |
| Poisson | Discrete | Modeling the number of rare events over a fixed time (e.g., number of trades, defaults, or jumps in a jump-diffusion model). | $f(t;\lambda) = \frac{\lambda^t e^{-\lambda}}{t!}$ | $\lambda$ | $\lambda$ |
| Exponential | Continuous | Modeling the time until the next event in a Poisson process (e.g., time until default or time between trades). | $f(t;\lambda) = \lambda e^{-\lambda t} \mathbf{1}_{t \ge 0}$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
| Uniform | Continuous | Modeling uncertainty when all outcomes are equally likely (e.g., random number generation, simple Monte Carlo simulations). | $f(t;a,b) = \frac{1}{b-a} \mathbf{1}_{t \in [a,b]}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| Normal | Continuous | The standard model for asset returns (log-returns), motivated by the Central Limit Theorem (CLT). Used in Markowitz portfolio theory and basic risk models. | $f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)$ | $\mu$ | $\sigma^2$ |
| Lognormal | Continuous | The standard model for asset prices in the Black-Scholes-Merton model, as prices cannot be negative. If $X \sim N(\mu, \sigma^2)$, then $Y = e^X \sim \text{Lognormal}$. | $f(y) = \frac{1}{y\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln y-\mu)^2}{2\sigma^2}\right)$, $y > 0$ | $e^{\mu + \sigma^2/2}$ | $e^{2\mu + \sigma^2}(e^{\sigma^2}-1)$ |
| Student's t | Continuous | Used to model financial returns with heavy tails (fat tails), capturing extreme events more accurately than the Normal distribution. Parameter $\nu$ (degrees of freedom) controls tail thickness. | $f(t;\nu) \propto \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$ | $0$ (for $\nu > 1$) | $\frac{\nu}{\nu-2}$ (for $\nu > 2$) |
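The mean and variance entries for the discrete rows can be sanity-checked directly from the PMFs. This is a minimal sketch: the function names and the parameter values ($n = 20$, $p = 0.3$, $\lambda = 4$) are assumptions chosen for illustration, and the Poisson support is truncated at 100 because it is infinite.

```python
import math


def binomial_pmf(t, n, p):
    """f(t; n, p) = C(n, t) p^t (1 - p)^(n - t)."""
    return math.comb(n, t) * p**t * (1 - p) ** (n - t)


def poisson_pmf(t, lam):
    """f(t; lam) = lam^t e^(-lam) / t!."""
    return lam**t * math.exp(-lam) / math.factorial(t)


def moments(pairs):
    """(mean, variance) from an iterable of (value, probability) pairs."""
    pairs = list(pairs)
    mean = sum(t * q for t, q in pairs)
    var = sum((t - mean) ** 2 * q for t, q in pairs)
    return mean, var


n, p = 20, 0.3   # illustrative Binomial parameters
lam = 4.0        # illustrative Poisson rate

bin_mean, bin_var = moments((t, binomial_pmf(t, n, p)) for t in range(n + 1))
# Truncating at t = 100 leaves a negligible tail when lam = 4.
poi_mean, poi_var = moments((t, poisson_pmf(t, lam)) for t in range(101))

print(bin_mean, n * p, bin_var, n * p * (1 - p))
print(poi_mean, poi_var, lam)
```

Each printed pair should agree up to floating-point error, matching $np$, $np(1-p)$, and $\lambda$ from the table.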

Supplementary Notes

Model the object first

Before choosing a distribution, identify whether the random object is a count, a waiting time, a return, or a strictly positive price. Many formula mistakes come not from the distribution itself but from applying it to the wrong object.

Tail questions need CDFs

Interview questions about loss thresholds, drawdowns, defaults, and extremes usually reduce to a tail probability. Translate them into the CDF $F(x) = P(X \le x)$ or the survival function $1 - F(x) = P(X > x)$ before manipulating densities.
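A small sketch of this translation, using only the Python standard library. The return parameters, the loss threshold, and the hazard rate are made-up illustrative numbers, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_tail_below(threshold, mu, sigma):
    """P(X <= threshold) for X ~ N(mu, sigma^2): the CDF evaluated at the threshold."""
    return NormalDist(mu, sigma).cdf(threshold)


def exponential_survival(t, lam):
    """P(T > t) = 1 - F(t) = exp(-lam * t) for T ~ Exponential(lam), t >= 0."""
    return math.exp(-lam * t)


# "What is the chance of losing more than 2% in a day?" becomes a left-tail CDF value,
# assuming daily returns ~ N(0.0005, 0.01^2).
p_loss = normal_tail_below(-0.02, mu=0.0005, sigma=0.01)

# "What is the chance of no default within 5 years?" becomes a survival-function value,
# assuming an exponential default time with rate 0.03 per year.
p_survive_5y = exponential_survival(5.0, lam=0.03)

print(p_loss, p_survive_5y)
```

In both cases the question is answered by evaluating $F$ or $1 - F$ at a threshold; no density needs to be integrated by hand.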

Returns and prices differ

Normal approximations are usually for additive returns or log-returns; lognormal models are for strictly positive price levels. This distinction is central to Black-Scholes, risk aggregation, and simulation sanity checks.
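One possible simulation sanity check, sketched under simple assumptions: the total log-return over the horizon is a single draw $X \sim N(\mu, \sigma^2)$, and the terminal price is $S_T = S_0 e^X$. The function name and the parameter values are hypothetical.

```python
import math
import random


def simulate_terminal_prices(s0, mu, sigma, n_paths, seed=0):
    """Draw S_T = s0 * exp(X), where X ~ N(mu, sigma^2) is the total log-return."""
    rng = random.Random(seed)
    return [s0 * math.exp(rng.gauss(mu, sigma)) for _ in range(n_paths)]


s0, mu, sigma = 100.0, 0.05, 0.20   # illustrative values
prices = simulate_terminal_prices(s0, mu, sigma, n_paths=50_000)

sample_mean = sum(prices) / len(prices)
theoretical_mean = s0 * math.exp(mu + sigma**2 / 2)   # lognormal mean, scaled by s0

# Log-returns may be negative, but the price level never is.
min_log_return = min(math.log(x / s0) for x in prices)
min_price = min(prices)

print(sample_mean, theoretical_mean)
print(min_log_return, min_price)
```

Two checks follow from the distinction above: every simulated price is strictly positive even though some log-returns are negative, and the sample mean of prices should be close to $S_0 e^{\mu + \sigma^2/2}$ rather than $S_0 e^{\mu}$.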