Support Vector Machines: Principles and a Worked Example
1. The Core Objective of SVM: Margin Maximization
The core idea of the support vector machine (SVM) is to find, between two classes of data, an optimal separating hyperplane that maximizes the "margin" between the classes and the hyperplane. We start the derivation from the linearly separable case (the nonlinear case extends via kernel functions).
Step 1: Define the separating hyperplane
In an n-dimensional space, a linear separating hyperplane is given by
$$\boldsymbol{w} \cdot \boldsymbol{x} + b = 0 \tag{1}$$
- $\boldsymbol{w} = (w_1, w_2, \ldots, w_n)^T$ is the normal vector of the hyperplane (it determines the orientation);
- $b$ is the bias term (it determines the position);
- $\boldsymbol{x} = (x_1, x_2, \ldots, x_n)^T$ is a sample's feature vector.
For binary classification, let the labels be $y \in \{+1, -1\}$ (positive and negative class, respectively). The hyperplane must satisfy:
- positive samples: $\boldsymbol{w} \cdot \boldsymbol{x} + b > 0$ (i.e. $y = +1$);
- negative samples: $\boldsymbol{w} \cdot \boldsymbol{x} + b < 0$ (i.e. $y = -1$).
Step 2: Define the distance from a sample to the hyperplane (the margin)
To measure how well the hyperplane separates the classes, we need a notion of distance from a sample to the hyperplane.
Functional margin: for a sample $(\boldsymbol{x}_i, y_i)$, the functional margin is $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b)$.
- Interpretation: a positive value means the sample is classified correctly ($y_i$ and $\boldsymbol{w} \cdot \boldsymbol{x}_i + b$ have the same sign); the larger the absolute value, the more confident the classification.
Geometric margin: the functional margin changes when $\boldsymbol{w}$ and $b$ are rescaled (e.g. $\boldsymbol{w} \to 2\boldsymbol{w}$, $b \to 2b$ leaves the hyperplane unchanged but doubles the functional margin), so it must be normalized:
$$\gamma_i = \frac{y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b)}{\|\boldsymbol{w}\|} \tag{2}$$
where $\|\boldsymbol{w}\| = \sqrt{\boldsymbol{w} \cdot \boldsymbol{w}}$ is the L2 norm of $\boldsymbol{w}$. The geometric margin is the actual Euclidean distance from the sample to the hyperplane.
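The scale invariance of the geometric margin is easy to verify numerically. A minimal sketch, with a hypothetical hyperplane $\boldsymbol{w} = (1, 2)^T$, $b = -6.5$ and one sample chosen purely for illustration:

```python
import numpy as np

# Hypothetical hyperplane and sample, chosen only for illustration
w = np.array([1.0, 2.0])
b = -6.5
x, y = np.array([3.0, 3.0]), 1

functional = y * (w @ x + b)                 # scale-dependent functional margin
geometric = functional / np.linalg.norm(w)   # Euclidean distance, eq. (2)

# Scaling (w, b) by k > 0 leaves the hyperplane, and hence the geometric
# margin, unchanged, while the functional margin is multiplied by k
k = 2.0
scaled_functional = y * (k * w @ x + k * b)
scaled_geometric = scaled_functional / np.linalg.norm(k * w)
print(functional, geometric, scaled_functional, scaled_geometric)
```

Doubling $(\boldsymbol{w}, b)$ doubles the functional margin while the geometric margin stays put, which is why the optimization target in the next step is phrased in terms of the geometric margin.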
Step 3: The margin-maximization objective
The optimal hyperplane makes the smallest geometric margin over all samples (the "minimum margin") as large as possible.
Let $\gamma = \min_i \gamma_i$ be the minimum geometric margin over the dataset. The objective is to maximize $\gamma$:
$$\max_{\boldsymbol{w}, b} \gamma \quad \text{s.t.} \quad \frac{y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b)}{\|\boldsymbol{w}\|} \geq \gamma \quad (\forall i) \tag{3}$$
Step 4: Simplify the objective
Since $(\boldsymbol{w}, b)$ and $(k\boldsymbol{w}, kb)$ with $k > 0$ describe the same hyperplane, we can rescale so that the minimum functional margin equals 1 (i.e. $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \geq 1$), in which case the minimum geometric margin is $\gamma = \frac{1}{\|\boldsymbol{w}\|}$.
Maximizing $\gamma$ is therefore equivalent to minimizing $\|\boldsymbol{w}\|$ (or $\frac{1}{2}\|\boldsymbol{w}\|^2$, which is easier to differentiate), and the problem simplifies to:
$$\min_{\boldsymbol{w}, b} \frac{1}{2}\|\boldsymbol{w}\|^2 \quad \text{s.t.} \quad y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \geq 1 \quad (\forall i) \tag{4}$$
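Problem (4) is small enough to hand directly to a generic constrained solver, without any of the dual machinery developed below. A minimal sketch using `scipy.optimize.minimize`, on a tiny made-up linearly separable set (assumed data, chosen so the answer is easy to see by symmetry):

```python
import numpy as np
from scipy.optimize import minimize

# Tiny hypothetical linearly separable set (assumed for illustration)
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Variables p = (w1, w2, b); minimize (1/2)||w||^2
# subject to y_i (w . x_i + b) >= 1 for every i, as in eq. (4)
objective = lambda p: 0.5 * (p[0] ** 2 + p[1] ** 2)
cons = [{"type": "ineq", "fun": lambda p, i=i: y[i] * (X[i] @ p[:2] + p[2]) - 1.0}
        for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), constraints=cons)
w, b = res.x[:2], res.x[2]
print(w, b)  # the max-margin separator for this toy set
```

For this symmetric toy set the solver should recover $\boldsymbol{w} \propto (1, 1)$ with $b = 0$: the separator is the perpendicular bisector between $(1,1)$ and $(-1,-1)$.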
2. Solving the Optimization Problem with Lagrange Multipliers
Problem (4) is a convex optimization problem with inequality constraints; it can be converted into a dual problem via Lagrange multipliers and solved there.
Step 5: Build the Lagrangian
Introduce a Lagrange multiplier $\alpha_i \geq 0$ for each constraint. The Lagrangian is:
$$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\boldsymbol{w}\|^2 - \sum_{i=1}^N \alpha_i \left[ y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 \right] \tag{5}$$
- the first term is the original objective;
- the second term penalizes constraint violations ($\alpha_i \geq 0$ ensures the constraints are enforced).
Step 6: The dual problem and the KKT conditions
For this convex problem, the optimum must satisfy the KKT (Karush-Kuhn-Tucker) conditions:
- primal feasibility: $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \geq 1$;
- dual feasibility: $\alpha_i \geq 0$;
- complementary slackness: $\alpha_i \left[ y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 \right] = 0$ (if $\alpha_i > 0$, the constraint is active, i.e. $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) = 1$);
- stationarity: $\nabla_{\boldsymbol{w}} \mathcal{L} = 0$, $\nabla_b \mathcal{L} = 0$.
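All four conditions are mechanical to check for a candidate solution. A small sketch (the helper name `kkt_satisfied` and the one-dimensional toy data are my own, for illustration; the stationarity checks use the closed forms derived in the next step):

```python
import numpy as np

def kkt_satisfied(w, b, alpha, X, y, tol=1e-8):
    margins = y * (X @ w + b)
    primal = np.all(margins >= 1 - tol)                    # primal feasibility
    dual = np.all(alpha >= -tol)                           # dual feasibility
    slack = np.all(np.abs(alpha * (margins - 1)) < tol)    # complementary slackness
    stat_w = np.allclose(w, (alpha * y) @ X)               # grad_w L = 0
    stat_b = abs(alpha @ y) < tol                          # grad_b L = 0
    return bool(primal and dual and slack and stat_w and stat_b)

# 1-D toy check: x = 1 (y = +1) and x = -1 (y = -1); w = 1, b = 0, alpha = (1/2, 1/2)
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
print(kkt_satisfied(np.array([1.0]), 0.0, np.array([0.5, 0.5]), X, y))
```

Perturbing any of $(\boldsymbol{w}, b, \boldsymbol{\alpha})$ breaks at least one condition, which is exactly how such a checker flags a non-optimal candidate.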
Step 7: Differentiate and simplify (the core derivation)
Take the partial derivatives with respect to $\boldsymbol{w}$ and $b$ and set them to zero.
With respect to $\boldsymbol{w}$:
$$\nabla_{\boldsymbol{w}} \mathcal{L} = \boldsymbol{w} - \sum_{i=1}^N \alpha_i y_i \boldsymbol{x}_i = 0 \implies \boldsymbol{w} = \sum_{i=1}^N \alpha_i y_i \boldsymbol{x}_i \tag{6}$$
(so $\boldsymbol{w}$ is a linear combination of the samples, with coefficients $\alpha_i y_i$). With respect to $b$:
$$\nabla_b \mathcal{L} = -\sum_{i=1}^N \alpha_i y_i = 0 \implies \sum_{i=1}^N \alpha_i y_i = 0 \tag{7}$$
Step 8: The dual objective
Substitute (6) into the Lagrangian (5) to simplify the dual problem.
Expand $\frac{1}{2}\|\boldsymbol{w}\|^2$:
$$\frac{1}{2}\|\boldsymbol{w}\|^2 = \frac{1}{2} \left( \sum_{i=1}^N \alpha_i y_i \boldsymbol{x}_i \right) \cdot \left( \sum_{j=1}^N \alpha_j y_j \boldsymbol{x}_j \right) = \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)$$
Expand the penalty term:
$$\sum_{i=1}^N \alpha_i \left[ y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 \right] = \sum_{i=1}^N \alpha_i y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i) + \sum_{i=1}^N \alpha_i y_i b - \sum_{i=1}^N \alpha_i$$
By (6) and (7), the middle term vanishes, $\sum \alpha_i y_i b = b \cdot 0 = 0$, and the first term becomes
$$\sum_{i=1}^N \alpha_i y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i) = \sum_{i=1}^N \alpha_i y_i \left( \sum_{j=1}^N \alpha_j y_j \boldsymbol{x}_j \cdot \boldsymbol{x}_i \right) = \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)$$
Combining the pieces, the Lagrangian simplifies to
$$\mathcal{L} = \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) - \left[ \sum_{i,j} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) - \sum_i \alpha_i \right] = -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) + \sum_i \alpha_i$$
The dual problem is therefore to maximize the following, subject to $\alpha_i \geq 0$ and $\sum \alpha_i y_i = 0$:
$$\max_{\boldsymbol{\alpha}} \left( \sum_{i=1}^N \alpha_i - \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) \right) \tag{8}$$
Step 9: Recover the hyperplane parameters
Once the dual problem yields $\boldsymbol{\alpha}$, we can compute:
- $\boldsymbol{w}$: from (6), $\boldsymbol{w} = \sum \alpha_i y_i \boldsymbol{x}_i$;
- $b$: by complementary slackness, any sample with $\alpha_i > 0$ (a support vector) satisfies $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) = 1$, which gives:
$$b = y_i - \boldsymbol{w} \cdot \boldsymbol{x}_i = y_i - \sum_{j=1}^N \alpha_j y_j (\boldsymbol{x}_j \cdot \boldsymbol{x}_i) \tag{9}$$
Step 10: The decision function
A new sample $\boldsymbol{x}$ is classified by which side of the hyperplane it falls on:
$$f(\boldsymbol{x}) = \operatorname{sign}(\boldsymbol{w} \cdot \boldsymbol{x} + b) = \operatorname{sign}\left( \sum_{i=1}^N \alpha_i y_i (\boldsymbol{x}_i \cdot \boldsymbol{x}) + b \right) \tag{10}$$
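Note that (10) only uses the training points through inner products $\boldsymbol{x}_i \cdot \boldsymbol{x}$, which is exactly what lets a kernel function replace the dot product in the nonlinear extension. A minimal sketch (one-dimensional toy values assumed for illustration):

```python
import numpy as np

def decision(x, X, y, alpha, b):
    # f(x) = sign( sum_i alpha_i y_i (x_i . x) + b ), eq. (10):
    # only inner products with the training points appear
    return int(np.sign(np.sum(alpha * y * (X @ x)) + b))

# 1-D toy model: support vectors x = 1 (y = +1) and x = -1 (y = -1),
# alpha = (1/2, 1/2), b = 0  (equivalent to w = 1, b = 0)
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
print(decision(np.array([2.0]), X, y, alpha, 0.0))   # → 1  (positive side)
print(decision(np.array([-3.0]), X, y, alpha, 0.0))  # → -1 (negative side)
```

Swapping `X @ x` for a kernel evaluation `K(X, x)` is the entire change needed for nonlinear classification.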
3. Worked Example: Solving a Linearly Separable SVM
Consider the following two-dimensional, linearly separable dataset:
- positive class ($y = +1$): $\boldsymbol{x}_1 = (3, 3)^T$, $\boldsymbol{x}_2 = (4, 3)^T$;
- negative class ($y = -1$): $\boldsymbol{x}_3 = (1, 1)^T$, $\boldsymbol{x}_4 = (2, 1)^T$.
Step 1: Geometric intuition
The optimal hyperplane lies between the two classes, touching the points of each class that are closest to the other. The closest cross-class pair here is $\boldsymbol{x}_1 = (3,3)^T$ and $\boldsymbol{x}_4 = (2,1)^T$, so we conjecture that these are the support vectors: $\alpha_1 > 0$, $\alpha_4 > 0$, and $\alpha_2 = \alpha_3 = 0$ (non-support vectors). The conjecture is confirmed in Step 6 by checking that all constraints hold.
Step 2: Apply the dual constraint
From $\sum \alpha_i y_i = 0$:
$$\alpha_1 \cdot 1 + \alpha_4 \cdot (-1) = 0 \implies \alpha_1 = \alpha_4 \tag{11}$$
Step 3: Evaluate the dual objective
Keeping only $\alpha_1$ and $\alpha_4$, objective (8) reduces to
$$W(\alpha) = \alpha_1 + \alpha_4 - \frac{1}{2} \left[ \alpha_1^2 (\boldsymbol{x}_1 \cdot \boldsymbol{x}_1) + \alpha_4^2 (\boldsymbol{x}_4 \cdot \boldsymbol{x}_4) + 2\alpha_1 \alpha_4 y_1 y_4 (\boldsymbol{x}_1 \cdot \boldsymbol{x}_4) \right]$$
Plug in the data:
- $\boldsymbol{x}_1 \cdot \boldsymbol{x}_1 = 3^2 + 3^2 = 18$, $\boldsymbol{x}_4 \cdot \boldsymbol{x}_4 = 2^2 + 1^2 = 5$;
- $\boldsymbol{x}_1 \cdot \boldsymbol{x}_4 = 3 \cdot 2 + 3 \cdot 1 = 9$, $y_1 y_4 = 1 \cdot (-1) = -1$;
- with $\alpha_1 = \alpha_4 = \alpha$:
$$W(\alpha) = 2\alpha - \frac{1}{2} \left[ 18\alpha^2 + 5\alpha^2 - 18\alpha^2 \right] = 2\alpha - \frac{5}{2}\alpha^2$$
Step 4: Maximize the dual
Differentiate $W(\alpha)$ and set the derivative to zero:
$$\frac{dW}{d\alpha} = 2 - 5\alpha = 0 \implies \alpha = \frac{2}{5}$$
Hence $\alpha_1 = \alpha_4 = \frac{2}{5}$ and $\alpha_2 = \alpha_3 = 0$.
Step 5: Compute $\boldsymbol{w}$ and $b$
- $\boldsymbol{w} = \alpha_1 y_1 \boldsymbol{x}_1 + \alpha_4 y_4 \boldsymbol{x}_4 = \frac{2}{5} \cdot 1 \cdot (3,3)^T + \frac{2}{5} \cdot (-1) \cdot (2,1)^T = (0.4, 0.8)^T$;
- from the support vector $\boldsymbol{x}_1$: $y_1(\boldsymbol{w} \cdot \boldsymbol{x}_1 + b) = 1$
$$1 \cdot (0.4 \cdot 3 + 0.8 \cdot 3 + b) = 1 \implies 3.6 + b = 1 \implies b = -2.6$$
Step 6: Verify the hyperplane
The hyperplane is $0.4 x_1 + 0.8 x_2 - 2.6 = 0$ (equivalently $x_1 + 2x_2 = 6.5$), with $\|\boldsymbol{w}\| = \sqrt{0.4^2 + 0.8^2} = \frac{2}{\sqrt{5}}$.
- Support vector $\boldsymbol{x}_1$: distance $\frac{|0.4 \cdot 3 + 0.8 \cdot 3 - 2.6|}{2/\sqrt{5}} = \frac{\sqrt{5}}{2}$;
- Support vector $\boldsymbol{x}_4$: distance $\frac{|0.4 \cdot 2 + 0.8 \cdot 1 - 2.6|}{2/\sqrt{5}} = \frac{\sqrt{5}}{2}$, so both support vectors sit exactly on the margin;
- Non-support vectors: $y_2(\boldsymbol{w} \cdot \boldsymbol{x}_2 + b) = 1.4 \geq 1$ and $y_3(\boldsymbol{w} \cdot \boldsymbol{x}_3 + b) = 1.4 \geq 1$, so every constraint of (4) holds and the conjecture of Step 1 is confirmed.
(Had we instead guessed $\boldsymbol{x}_1$ and $\boldsymbol{x}_3$ as the support vectors, solving would give $\boldsymbol{w} = (0.5, 0.5)^T$, $b = -2$; but then $y_4(\boldsymbol{w} \cdot \boldsymbol{x}_4 + b) = 0.5 < 1$, violating the constraint for $\boldsymbol{x}_4$, so that guess must be rejected.)
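As a cross-check of the hand computation, the same four points can be fed to scikit-learn's `SVC` with a linear kernel; a very large `C` approximates the hard-margin formulation (the specific value `1e6` is an arbitrary choice, large relative to the multipliers found above):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1, 1, -1, -1])

# A very large C effectively forbids margin violations (hard margin)
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.coef_, clf.intercept_)   # learned w and b
print(clf.support_)                # indices of the support vectors
```

The fitted `coef_` and `intercept_` should match the hand-derived $\boldsymbol{w}$ and $b$ up to solver tolerance, and `support_` should pick out exactly the two points with $\alpha_i > 0$.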
Summary
SVM determines the optimal hyperplane by maximizing the margin; the Lagrange multiplier method converts the primal problem into a dual problem, and in the end the hyperplane parameters are determined by the support vectors alone (the samples with $\alpha_i > 0$). This property keeps SVM efficient even in high-dimensional spaces, and kernel functions extend it to nonlinear classification.