Neural Networks

Page structure

Core concepts

Neurons, layers and activation functions
Loss functions, gradient descent and backpropagation
Regularization, overfitting and validation practice

Learning sequence

Explain the forward pass and backward pass separately.
Tie regularization choices to data size and signal stability.
For quant use cases, discuss leakage and non-stationarity explicitly.

Neural Networks (NNs) and Deep Learning (DL) are a class of non-linear models that learn patterns and representations directly from data. They have historically been less prevalent in finance because of their "black-box" nature and large data requirements, but they are increasingly used for tasks where non-linearity and high-dimensional inputs are central.

The Neuron and the Network

A neural network is a composition of simple, interconnected units called neurons (or nodes), organized in layers.

Feedforward Pass: The output of a network is computed layer by layer. Each layer applies a linear transformation to the previous layer's output, followed by a non-linear activation function $f(\cdot)$:

$$\mathbf{h}^{(l)} = f^{(l)}\left(\mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\right)$$

where $\mathbf{h}^{(l)}$ is the output of layer $l$ (with $\mathbf{h}^{(0)}$ the network input), $\mathbf{W}^{(l)}$ is the weight matrix, and $\mathbf{b}^{(l)}$ is the bias vector.
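The layer recursion above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the 2-3-1 layer sizes, the hand-picked weights, and the helper names `matvec` and `forward` are all assumptions made for the example.

```python
def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]

def relu(z):
    return [max(0.0, v) for v in z]

def identity(z):
    return list(z)

def forward(x, layers):
    """Apply h = f(W h_prev + b) for each (W, b, f) triple in `layers`."""
    h = list(x)
    for W, b, f in layers:
        z = [s + b_i for s, b_i in zip(matvec(W, h), b)]  # linear step
        h = f(z)                                          # non-linear step
    return h

# Hypothetical 2-3-1 network with hand-picked weights (illustration only).
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, -1.0, 0.5], relu),
    ([[1.0, 2.0, -1.0]], [0.1], identity),
]
y_hat = forward([2.0, 1.0], layers)
```

Each tuple in `layers` corresponds to one application of the equation: `W` and `b` play the roles of $\mathbf{W}^{(l)}$ and $\mathbf{b}^{(l)}$, and `f` is the activation $f^{(l)}$.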

Universal Approximation Theorem: A feedforward network with a single hidden layer, enough hidden units, and a suitable non-linear (non-polynomial) activation function can approximate any continuous function on a compact domain to arbitrary accuracy. This is an existence result: it explains why NNs are so expressive, but it does not say how many units are needed or whether training will actually find the approximating weights.

Activation Functions

Activation functions introduce the non-linearity that allows NNs to model complex relationships. Without them, a stack of layers collapses into a single linear transformation.

Function	Formula	Range	Use Case
Sigmoid	$\sigma(z) = \frac{1}{1 + e^{-z}}$	$(0, 1)$	Output layer for binary classification (probability). Saturates for large $|z|$, so it suffers from vanishing gradients when used in deep hidden layers.
ReLU (Rectified Linear Unit)	$\text{ReLU}(z) = \max(0, z)$	$[0, \infty)$	Most common choice for hidden layers. Mitigates the vanishing gradient problem because its gradient is 1 for $z > 0$, although units can stop updating if $z$ stays negative.
Softmax	$\text{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$	$(0, 1)$ per component	Output layer for multi-class classification (probabilities sum to 1).
Tanh (Hyperbolic Tangent)	$\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$	$(-1, 1)$	Hidden layers. Zero-centered, which is often preferred over Sigmoid.
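The functions in the table can be written directly with Python's `math` module. The sketch below is illustrative only; the numerically stable forms of `sigmoid` and `softmax` and the helper names `sigmoid_grad` and `relu_grad` are choices made for this example, not something the table prescribes.

```python
import math

def sigmoid(z):
    # Branch on the sign so exp() is only called on non-positive arguments,
    # which avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def relu(z):
    return max(0.0, z)

def tanh(z):
    return math.tanh(z)

def softmax(zs):
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

probs = softmax([2.0, 1.0, 0.1])
```

Comparing `sigmoid_grad(10.0)` with `relu_grad(10.0)` shows the saturation effect: the sigmoid gradient is close to zero for large inputs, while the ReLU gradient stays at 1 for any positive input.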

Loss Function and Optimization

Training minimizes a Loss Function (or Cost Function) $L(\mathbf{y}, \hat{\mathbf{y}})$ that measures the discrepancy between the network's prediction $\hat{\mathbf{y}}$ and the true value $\mathbf{y}$.

Regression: Mean Squared Error (MSE), the average squared difference between predictions and targets.
Classification: Cross-Entropy Loss (or Log Loss), the negative log-probability the model assigns to the true class.
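As a rough sketch, the two losses named above could be computed as follows. The function names and the `eps` clipping constant are assumptions for illustration; real frameworks ship their own, more careful implementations.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error for regression targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average log loss for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def categorical_cross_entropy(true_class, probs, eps=1e-12):
    """Log loss for one sample: minus the log-probability of the true class."""
    return -math.log(max(probs[true_class], eps))

reg_loss = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
clf_loss = binary_cross_entropy([1, 0], [0.5, 0.5])
```

A prediction of 0.5 for every binary label gives a log loss of $\ln 2$, which is a handy baseline: a classifier scoring worse than that is doing worse than an uninformed coin flip on balanced data.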

Backpropagation and Gradient Descent

The network's parameters ($\mathbf{W}$ and $\mathbf{b}$) are updated iteratively using an optimization algorithm, typically a variant of Stochastic Gradient Descent (SGD).

Gradient Descent: Updates parameters in the direction opposite to the gradient of the loss function, scaled by a learning rate $\eta$: $\theta \leftarrow \theta - \eta \nabla_\theta L$.
Backpropagation: An efficient algorithm for computing the gradient of the loss function with respect to every weight and bias in the network. It applies the chain rule to propagate the error signal backward from the output layer toward the input layer, reusing intermediate values from the forward pass.
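To make the two steps concrete, here is a small hand-written sketch for a network with one hidden tanh layer and a linear output, trained with per-sample SGD. Everything specific in it is an assumption for illustration: the squared-error loss $\tfrac{1}{2}(\hat{y} - y)^2$, the hidden width `H = 4`, the learning rate, and the toy sine target.

```python
import math
import random

def forward(params, x):
    """One hidden tanh layer, linear output. Returns prediction and hidden values."""
    w1, b1, w2, b2 = params
    h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    y_hat = sum(v * hj for v, hj in zip(w2, h)) + b2
    return y_hat, h

def loss_and_grads(params, x, y):
    """Loss 0.5 * (y_hat - y)^2 and its gradients, computed with the chain rule."""
    w1, b1, w2, b2 = params
    y_hat, h = forward(params, x)
    delta_out = y_hat - y                    # dL/dy_hat
    g_w2 = [delta_out * hj for hj in h]      # output-layer weights
    g_b2 = delta_out
    # Propagate backward through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    delta_h = [delta_out * v * (1.0 - hj * hj) for v, hj in zip(w2, h)]
    g_w1 = [d * x for d in delta_h]          # hidden-layer weights
    g_b1 = list(delta_h)
    return 0.5 * delta_out ** 2, (g_w1, g_b1, g_w2, g_b2)

def sgd_step(params, grads, lr):
    """Move every parameter against its gradient."""
    w1, b1, w2, b2 = params
    g_w1, g_b1, g_w2, g_b2 = grads
    return (
        [w - lr * g for w, g in zip(w1, g_w1)],
        [b - lr * g for b, g in zip(b1, g_b1)],
        [w - lr * g for w, g in zip(w2, g_w2)],
        b2 - lr * g_b2,
    )

rng = random.Random(0)
H = 4  # hidden width (arbitrary choice for the example)
params = (
    [rng.uniform(-1, 1) for _ in range(H)],
    [0.0] * H,
    [rng.uniform(-1, 1) for _ in range(H)],
    0.0,
)
data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-20, 21)]  # toy target

def mean_loss(params):
    return sum(loss_and_grads(params, x, y)[0] for x, y in data) / len(data)

initial_loss = mean_loss(params)
for epoch in range(200):
    rng.shuffle(data)
    for x, y in data:
        _, grads = loss_and_grads(params, x, y)
        params = sgd_step(params, grads, lr=0.05)
final_loss = mean_loss(params)
```

A useful habit when writing gradients by hand is to compare them against a finite-difference estimate, $(L(\theta + \epsilon) - L(\theta - \epsilon)) / 2\epsilon$, for a few parameters before trusting the training loop.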

Regularization and Overfitting

Because they often have a very large number of parameters relative to the available data, NNs are highly susceptible to overfitting.

Dropout: A regularization technique where randomly selected neurons are temporarily ignored (their outputs set to zero) during training; at inference time all neurons are used. This discourages co-adaptation of neurons and forces the network to learn more robust features.
Early Stopping: Halting the training process when performance on a separate validation set begins to degrade, even if the loss on the training set is still decreasing.
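Both techniques are simple enough to sketch in plain Python. The dropout below is the common "inverted" variant, which rescales the surviving units during training; that variant, the `patience` rule for early stopping, and the validation-loss numbers are assumptions chosen for illustration.

```python
import random

def dropout(h, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop during training
    and rescale survivors by 1 / (1 - p_drop) so the expected value is unchanged."""
    if not training or p_drop == 0.0:
        return list(h)
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in h]

def early_stopping_epoch(val_losses, patience):
    """Return the index of the best epoch seen before the validation loss
    failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training here
    return best_epoch

rng = random.Random(42)
dropped = dropout([1.0] * 10, p_drop=0.5, rng=rng)

# Made-up validation losses: improvement stalls after epoch 3.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.61, 0.63, 0.66, 0.50]
best = early_stopping_epoch(val_losses, patience=3)
```

Note the trade-off in the `patience` setting: with `patience=3` the scan stops before it ever sees the later value of 0.50, so a small patience saves compute but can miss a late improvement.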

The choice of architecture depends heavily on the structure of the financial data.

Architecture	Data Type	Financial Application	Rationale
Feedforward Neural Networks (FNN)	Tabular data (cross-sectional features).	Credit scoring, bond rating prediction, factor selection.	Simple and effective for non-linear feature combinations.
Recurrent Neural Networks (RNN) / LSTM / GRU	Sequential data (time series).	High-frequency trading, volatility forecasting, long-term price prediction.	Designed to handle sequential dependencies and memory effects in time series.
Convolutional Neural Networks (CNN)	Image-like data (e.g., heatmaps of order book data, spectrograms of audio data).	Analyzing market microstructure patterns, processing satellite imagery for economic indicators.	Excellent at extracting local spatial features.
Autoencoders	High-dimensional data.	Dimensionality reduction, anomaly detection (e.g., identifying fraudulent transactions or market dislocations).	Learns a compressed representation of the input data.

Supplementary notes

Architecture is not the whole model

For quant work, data alignment, target construction, validation windows, and transaction-cost assumptions often matter more than adding layers.
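As one illustration of data alignment and target construction, the sketch below pairs a feature known at time $t$ with a return realized strictly after $t$. The function name `forward_return_dataset`, the momentum-style feature, and the price series are all hypothetical and only serve the example.

```python
def forward_return_dataset(prices, lookback, horizon):
    """Pair a feature known at time t with the return over (t, t + horizon]."""
    rows = []
    # Stop early enough that the target never reaches past the end of the data.
    for t in range(lookback, len(prices) - horizon):
        # Feature: trailing return, using prices up to and including t only.
        feature = prices[t] / prices[t - lookback] - 1.0
        # Target: forward return, using a price strictly after t.
        target = prices[t + horizon] / prices[t] - 1.0
        rows.append((t, feature, target))
    return rows

# Made-up price series for illustration.
prices = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0, 105.0]
rows = forward_return_dataset(prices, lookback=2, horizon=1)
```

The rows at the end of the sample have no realized target yet and are dropped rather than filled, because any filled value would leak information the model could not have had at time $t$.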

Optimization is empirical

Learning rate, normalization, batch construction, and regularization determine whether gradient descent finds a useful solution or a fragile in-sample fit.

Validate across regimes

A neural model that works in one volatility regime may fail in another. Walk-forward validation and regime diagnostics are essential.
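A walk-forward scheme can be sketched as a generator of index windows in which the test block always follows the training block in time. The function name, the rolling (fixed-length) training window, and the optional `gap` parameter are assumptions for this example; an expanding window is an equally common choice.

```python
def walk_forward_splits(n_samples, train_size, test_size, gap=0):
    """Yield (train_indices, test_indices) with the test window after training.

    `gap` leaves unused observations between the two windows, which helps when
    targets overlap in time (for example multi-period forward returns).
    """
    start = 0
    while start + train_size + gap + test_size <= n_samples:
        train_end = start + train_size
        test_start = train_end + gap
        yield (list(range(start, train_end)),
               list(range(test_start, test_start + test_size)))
        start += test_size  # roll both windows forward by one test block

splits = list(walk_forward_splits(n_samples=12, train_size=6, test_size=2, gap=1))
```

Scoring each test window separately, rather than pooling them, is one simple way to see whether performance holds up across different market regimes.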
