支持向量机(Support Vector Machines,SVM)—有监督学习方法、非概率模型、判别模型、线性模型、非参数化模型、批量学习、核方法

发布于:2024-09-18 ⋅ 阅读:(57) ⋅ 点赞:(0)

定义

序列最小最优化(Sequential Minimal Optimization,SMO)

输入:训练数据集 T = { ( x 1 , y 1 ) , ( x 2 , y 2 ) , ⋯   , ( x N , y N ) } T=\{ (x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\} T={(x1,y1),(x2,y2),,(xN,yN)},其中, x i ∈ χ = R n , y i ∈ y = { − 1 , + 1 } , i = 1 , 2 , ⋯   , N x_i \in \chi=R^n, y_i \in {\tt y}=\{ -1,+1 \},i=1,2,\cdots,N xiχ=Rn,yiy={1,+1},i=1,2,,N,精度 ε \varepsilon ε

输出:近似值 α ^ \hat \alpha α^

(1)取初值 α ( 0 ) = 0 \alpha^{(0)}=0 α(0)=0,令 k = 0 k=0 k=0

(2)选取优化变量 α 1 ( k ) , α 2 ( k ) \alpha_1^{(k)},\alpha_2^{(k)} α1(k),α2(k),解析求解两个变量的最优化问题,求得最优解 α 1 ( k + 1 ) , α 2 ( k + 2 ) \alpha_1^{(k+1)},\alpha_2^{(k+2)} α1(k+1),α2(k+2),更新 α \alpha α α ( k + 1 ) \alpha^{(k+1)} α(k+1)

(3)若在精度 ε \varepsilon ε范围内满足停机条件:

       ∑ i = 1 N α i y i = 0 , 0 ≤ α i ≤ C , i = 1 , 2 , ⋯   , N \sum_{i=1}^N\alpha_iy_i=0,0\leq\alpha_i\leq C,i=1,2,\cdots,N i=1Nαiyi=0,0αiC,i=1,2,,N

       y i ⋅ g ( x i ) : { ≥ 1 , { x i ∣ α i = 0 } = 1 , { x i ∣ 0 < α i < C } ≤ 1 , { x i ∣ α i = C   y_i \cdot g(x_i):\begin{cases} \geq 1,\{x_i|\alpha_i=0\}\\ = 1, \{x_i|0 < \alpha_i < C\} \\ \leq 1,\{x_i|\alpha_i=C\ \end{cases} yig(xi): 1,{xiαi=0}=1,{xi∣0<αi<C}1,{xiαi=C ,其中 g ( x i ) = ∑ j = 1 N α j y j K ( x j , x i ) + b , g(x_i)=\sum_{j=1}^N\alpha_jy_jK(x_j,x_i)+b, g(xi)=j=1NαjyjK(xj,xi)+b,

则转(4);否则令 k = k + 1 k=k+1 k=k+1,转(2);

(4)取 α ^ = α ( k + 1 ) \hat \alpha = \alpha^{(k+1)} α^=α(k+1)

输入空间

T= { ( x 1 , y 1 ) , ( x 2 , y 2 ) , … , ( x N , y N ) } \left\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\right\} {(x1,y1),(x2,y2),,(xN,yN)}

import time
import numpy as np
import math
import random

def loadData(fileName):
    '''
    加载Mnist数据集 下载地址:https://download.csdn.net/download/nanxiaotao/89720991)
    :param fileName:要加载的文件路径
    :return: 数据集和标签集
    '''
    #存放数据及标记
    dataArr = []; labelArr = []
    #读取文件
    fr = open(fileName)
    #遍历文件中的每一行
    for line in fr.readlines():
        curLine = line.strip().split(',')
        dataArr.append([int(num) / 255 for num in curLine[1:]])
        if int(curLine[0]) == 0:
            labelArr.append(1)
        else:
            labelArr.append(-1)
    #返回数据集和标记
    return dataArr, labelArr
trainDataList, trainLabelList = loadData('../Mnist/mnist_train.csv')
trainDataMat = np.mat(trainDataList)       #训练数据集
trainLabelMat = np.mat(trainLabelList).T   #训练标签集,为了方便后续运算提前做了转置,变为列向量
np.shape(trainDataMat)

特征空间(Feature Space)

trainDataMat[0][0:784]

统计学习方法

模型

f ( x ) = s i g n ( ∑ i = 1 N α i y i K ( x j , x i ) + b ) f(x)=sign\bigg( \sum_{i=1}^N \alpha_iy_iK(x_j,x_i)+b \bigg) f(x)=sign(i=1NαiyiK(xj,xi)+b)

策略

目标函数:

m i n α 1 , α 2 \mathop{min}\limits_{\alpha_1,\alpha_2} α1,α2min   W ( α 1 , α 2 ) = 1 2 K 11 α 1 2 + 1 2 K 22 α 2 2 + y 1 y 2 K 12 α 1 α 2 − ( α 1 + α 2 ) + y 1 α 1 ∑ i = 3 N y i α i K i 1 + y 2 α 2 ∑ i = 3 N y i α i K i 2 W(\alpha_1,\alpha_2)=\frac{1}{2}K_{11}\alpha_1^2+\frac{1}{2}K_{22}\alpha_2^2+y_1y_2K_{12}\alpha_1\alpha_2-(\alpha_1+\alpha_2) +y_1\alpha_1\sum_{i=3}^Ny_i\alpha_iK_{i1}+y_2\alpha_2\sum_{i=3}^Ny_i\alpha_iK_{i2} W(α1,α2)=21K11α12+21K22α22+y1y2K12α1α2(α1+α2)+y1α1i=3NyiαiKi1+y2α2i=3NyiαiKi2其中,目标函数式中省略了不含 α 1 , α 2 \alpha_1,\alpha_2 α1,α2的常数项

s.t.       α 1 y 1 + α 2 y 2 = − ∑ i = 3 N y i α i = ζ \alpha_1y_1+\alpha_2y_2=-\sum_{i=3}^Ny_i\alpha_i=\zeta α1y1+α2y2=i=3Nyiαi=ζ,其中, 0 ≤ α i ≤ C , ζ 0\leq\alpha_i\leq C,\zeta 0αiC,ζ:常数

停机条件:

∑ i = 1 N α i y i = 0 , 0 ≤ α i ≤ C , i = 1 , 2 , ⋯   , N \sum_{i=1}^N\alpha_iy_i=0,0\leq\alpha_i\leq C,i=1,2,\cdots,N i=1Nαiyi=0,0αiC,i=1,2,,N

y i ⋅ g ( x i ) : { ≥ 1 , { x i ∣ α i = 0 } = 1 , { x i ∣ 0 < α i < C } ≤ 1 , { x i ∣ α i = C   y_i \cdot g(x_i):\begin{cases} \geq 1,\{x_i|\alpha_i=0\}\\ = 1, \{x_i|0 < \alpha_i < C\} \\ \leq 1,\{x_i|\alpha_i=C\ \end{cases} yig(xi): 1,{xiαi=0}=1,{xi∣0<αi<C}1,{xiαi=C ,其中 g ( x i ) = ∑ j = 1 N α j y j K ( x j , x i ) + b g(x_i)=\sum_{j=1}^N\alpha_jy_jK(x_j,x_i)+b g(xi)=j=1NαjyjK(xj,xi)+b

算法

m, n = np.shape(trainDataMat) #m:训练集数量    n:样本特征数目
sigma = 10 #高斯核分母中的σ
C = 200 #惩罚参数
toler = 0.001 #松弛变量
b = 0 #SVM中的偏置b
alpha = [0] * trainDataMat.shape[0] # α 长度为训练集数目
E = [0 * trainLabelMat[i, 0] for i in range(trainLabelMat.shape[0])] #SMO运算过程中的Ei
supportVecIndex = []

高斯核函数(Gaussian Kernel Function)

K ( x , z ) = e x p ( − ∣ ∣ x − z ∣ ∣ 2 2 σ 2 ) K(x,z)=exp\bigg( -\dfrac{\bigg|\bigg| x-z \bigg|\bigg|^2}{2\sigma^2} \bigg) K(x,z)=exp(2σ2 xz 2)

    def calcKernel():
        '''
        计算核函数
        :return: 高斯核矩阵
        '''
        k = [[0 for i in range(m)] for j in range(m)]
        for i in range(m):
            if i % 100 == 0:
                print('kernel:', i, m)
            X = trainDataMat[i, :]
            for j in range(i, m):
                #获得Z
                Z = trainDataMat[j, :]
                #先计算||X - Z||^2
                result = (X - Z) * (X - Z).T
                #分子除以分母后去指数,得到的即为高斯核结果
                result = np.exp(-1 * result / (2 * sigma**2))
                #将Xi*Xj的结果存放入k[i][j]和k[j][i]中
                k[i][j] = result
                k[j][i] = result
        #返回高斯核矩阵
        return k
k = calcKernel() #核函数(初始化时提前计算)

是否满足KKT条件

α i = 0    ⟺    y i g ( x i ) ⩾ 1 \alpha_i=0 \iff y_ig(x_i)\geqslant 1 αi=0yig(xi)1

0 < α i < C    ⟺    y i g ( x i ) = 1 0<\alpha_i<C \iff y_ig(x_i)=1 0<αi<Cyig(xi)=1

α i = C    ⟺    y i g ( x i ) ⩽ 1 \alpha_i = C \iff y_ig(x_i) \leqslant 1 αi=Cyig(xi)1

其中, g ( x i ) = ∑ j = 1 N α j y j K ( x i , x j ) + b g(x_i)=\sum_{j=1}^N\alpha_jy_jK(x_i,x_j)+b g(xi)=j=1NαjyjK(xi,xj)+b

    def calc_gxi(i):
        '''
        :param i:x的下标
        :return: g(xi)的值
        '''
        #初始化g(xi)
        gxi = 0
        index = [i for i, alpha in enumerate(alpha) if alpha != 0]
        #遍历每一个非零α,i为非零α的下标
        for j in index:
            #计算g(xi)
            gxi += alpha[j] * trainLabelMat[j] * k[j][i]
        #求和结束后再单独加上偏置b
        gxi += b

        #返回
        return gxi
    def isSatisfyKKT(i):
        '''
        查看第i个α是否满足KKT条件
        :param i:α的下标
        :return:
            True:满足
            False:不满足
        '''
        gxi =calc_gxi(i)
        yi = trainLabelMat[i]

        if (math.fabs(alpha[i]) < toler) and (yi * gxi >= 1):
            return True
        elif (math.fabs(alpha[i] - C) < toler) and (yi * gxi <= 1):
            return True
        elif (alpha[i] > -toler) and (alpha[i] < (C + toler)) \
                and (math.fabs(yi * gxi - 1) < toler):
            return True

        return False

E i = g ( x i ) = ( ∑ j = 1 N α j y j K ( x j , x i ) + b ) − y i , i = 1 , 2 E_i=g(x_i)=\bigg( \sum_{j=1}^N \alpha_jy_jK(x_j,x_i)+b \bigg)-y_i,i=1,2 Ei=g(xi)=(j=1NαjyjK(xj,xi)+b)yi,i=1,2

i = 1 , 2 i=1,2 i=1,2时, E i E_i Ei为函数 g ( x ) g(x) g(x)对输入 x i x_i xi的预测值与真实输出 y i y_i yi之差

    def calcEi(i):
        '''
        计算Ei
        :param i: E的下标
        :return:
        '''
        #计算g(xi)
        gxi = calc_gxi(i)
        #Ei = g(xi) - yi,直接将结果作为Ei返回
        return gxi - trainLabelMat[i]

α 2 ( n e w , u n c ) = α 2 o l d + y 2 ( E 1 − E 2 ) η \alpha_2^{(new,unc)}=\alpha_2^{old}+\dfrac{y_2(E_1-E_2)}{\eta} α2(new,unc)=α2old+ηy2(E1E2)

α 2 n e w : { H , α 2 ( n e w , u n c ) > H α 2 ( n e w , u n c ) , L ≤ α 2 ( n e w , u n c ) ≤ H L , α 2 ( n e w , u n c ) < L \alpha_2^{new}:\begin{cases} H,\alpha_2^{(new,unc)}>H\\ \alpha_2^{(new,unc)},L \leq\alpha_2^{(new,unc)}\leq H \\ L,\alpha_2^{(new,unc)}<L \end{cases} α2new: H,α2(new,unc)>Hα2(new,unc),Lα2(new,unc)HL,α2(new,unc)<L,其中 L = m a x ( 0 , α 2 o l d + α 1 o l d − C ) , H = m i n ( C , α 2 o l d + α 1 o l d ) L=max\big( 0,\alpha_2^{old} + \alpha_1^{old} - C \big),H=min\big( C, \alpha_2^{old} + \alpha_1^{old} \big) L=max(0,α2old+α1oldC),H=min(C,α2old+α1old)

 def getAlphaJ( E1, i):
        '''
        SMO中选择第二个变量
        :param E1: 第一个变量的E1
        :param i: 第一个变量α的下标
        :return: E2,α2的下标
        '''
        #初始化E2
        E2 = 0
        #初始化|E1-E2|为-1
        maxE1_E2 = -1
        #初始化第二个变量的下标
        maxIndex = -1
        #获得Ei非0的对应索引组成的列表,列表内容为非0Ei的下标i
        nozeroE = [i for i, Ei in enumerate(E) if Ei != 0]
        #对每个非零Ei的下标i进行遍历
        for j in nozeroE:
            #计算E2
            E2_tmp = calcEi(j)
            #如果|E1-E2|大于目前最大值
            if math.fabs(E1 - E2_tmp) > maxE1_E2:
                #更新最大值
                maxE1_E2 = math.fabs(E1 - E2_tmp)
                #更新最大值E2
                E2 = E2_tmp
                #更新最大值E2的索引j
                maxIndex = j
        #如果列表中没有非0元素了(对应程序最开始运行时的情况)
        if maxIndex == -1:
            maxIndex = i
            while maxIndex == i:
                #获得随机数,如果随机数与第一个变量的下标i一致则重新随机
                maxIndex = int(random.uniform(0, m))
            #获得E2
            E2 = calcEi(maxIndex)

        #返回第二个变量的E2值以及其索引
        return E2, maxIndex

模型训练

   def train(iter = 100):
        iterStep = 0; #iterStep:迭代次数,超过设置次数还未收敛则强制停止
        parameterChanged = 1 #parameterChanged:单次迭代中有参数改变则增加1
        global supportVecIndex
        global b
        global alpha

        while (iterStep < iter) and (parameterChanged > 0):
            #打印当前迭代轮数
            print('iter:%d:%d'%( iterStep, iter))
            #迭代步数加1
            iterStep += 1
            #新的一轮将参数改变标志位重新置0
            parameterChanged = 0

            #大循环遍历所有样本,用于找SMO中第一个变量
            for i in range(m):
                #查看第一个遍历是否满足KKT条件,如果不满足则作为SMO中第一个变量从而进行优化
                if isSatisfyKKT(i) == False:
                    #如果下标为i的α不满足KKT条件,则进行优化
                    E1 = calcEi(i)

                    #选择第2个变量
                    E2, j = getAlphaJ(E1, i)

                    #获得两个变量的标签
                    y1 = trainLabelMat[i]
                    y2 = trainLabelMat[j]
                    #复制α值作为old值
                    alphaOld_1 = alpha[i]
                    alphaOld_2 = alpha[j]
                    #依据标签是否一致来生成不同的L和H
                    if y1 != y2:
                        L = max(0, alphaOld_2 - alphaOld_1)
                        H = min(C, C + alphaOld_2 - alphaOld_1)
                    else:
                        L = max(0, alphaOld_2 + alphaOld_1 - C)
                        H = min(C, alphaOld_2 + alphaOld_1)
                    #如果两者相等,说明该变量无法再优化,直接跳到下一次循环
                    if L == H:   continue

                    #计算α的新值
                    k11 = k[i][i]
                    k22 = k[j][j]
                    k21 = k[j][i]
                    k12 = k[i][j]
                    #更新α2,该α2还未经剪切
                    alphaNew_2 = alphaOld_2 + y2 * (E1 - E2) / (k11 + k22 - 2 * k12)
                    #剪切α2
                    if alphaNew_2 < L: alphaNew_2 = L
                    elif alphaNew_2 > H: alphaNew_2 = H
                    #更新α1
                    alphaNew_1 = alphaOld_1 + y1 * y2 * (alphaOld_2 - alphaNew_2)

                    #计算b1和b2
                    b1New = -1 * E1 - y1 * k11 * (alphaNew_1 - alphaOld_1) \
                            - y2 * k21 * (alphaNew_2 - alphaOld_2) + b
                    b2New = -1 * E2 - y1 * k12 * (alphaNew_1 - alphaOld_1) \
                            - y2 * k22 * (alphaNew_2 - alphaOld_2) + b

                    #依据α1和α2的值范围确定新b
                    if (alphaNew_1 > 0) and (alphaNew_1 < C):
                        bNew = b1New
                    elif (alphaNew_2 > 0) and (alphaNew_2 < C):
                        bNew = b2New
                    else:
                        bNew = (b1New + b2New) / 2

                    #将更新后的各类值写入,进行更新
                    alpha[i] = alphaNew_1
                    alpha[j] = alphaNew_2
                    b = bNew

                    E[i] = calcEi(i)
                    E[j] = calcEi(j)

                    #如果α2的改变量过于小,就认为该参数未改变,不增加parameterChanged值
                    #反之则自增1
                    if math.fabs(alphaNew_2 - alphaOld_2) >= 0.00001:
                        parameterChanged += 1

                #打印迭代轮数,i值,该迭代轮数修改α数目
                print("iter: %d i:%d, pairs changed %d" % (iterStep, i, parameterChanged))

        #全部计算结束后,重新遍历一遍α,查找里面的支持向量
        for i in range(m):
            #如果α>0,说明是支持向量
            if alpha[i] > 0:
                #将支持向量的索引保存起来
                supportVecIndex.append(i)

train()

假设空间(Hypothesis Space)

{ f ∣ f ( x ) = s i g n ( ∑ i = 1 N α i y i K ( x j , x i ) + b ) } \left\{f|f(x) = sign\bigg( \sum_{i=1}^N \alpha_iy_iK(x_j,x_i)+b \bigg) \right\} {ff(x)=sign(i=1NαiyiK(xj,xi)+b)}

输出空间

Y ∈ { − 1 , 1 } Y \in \{ -1,1 \} Y{1,1}

模型评估

训练误差

testDataList, testLabelList = loadData('../Mnist/mnist_test.csv')
testDataList = testDataList[:100]
testLabelList = testLabelList[:100]
    def calcSinglKernel(x1, x2):
        '''
        单独计算核函数
        :param x1:向量1
        :param x2: 向量2
        :return: 核函数结果
        '''
        result = (x1 - x2) * (x1 - x2).T
        result = np.exp(-1 * result / (2 * sigma ** 2))
        #返回结果
        return np.exp(result)
    def predict(x):
        '''
        对样本的标签进行预测
        :param x: 要预测的样本x
        :return: 预测结果
        '''
        global supportVecIndex
        global alpha
        result = 0
        for i in supportVecIndex:
            tmp = calcSinglKernel(trainDataMat[i, :], np.mat(x))
            #对每一项子式进行求和,最终计算得到求和项的值
            result += alpha[i] * trainLabelMat[i] * tmp
        #求和项计算结束后加上偏置b
        result += b
        #使用sign函数返回预测结果
        return np.sign(result)
    def model_test(testDataList, testLabelList):
        '''
        测试
        :param testDataList:测试数据集
        :param testLabelList: 测试标签集
        :return: 正确率
        '''
        #错误计数值
        errorCnt = 0
        #遍历测试集所有样本
        for i in range(len(testDataList)):
            #打印目前进度
            print('test:%d:%d'%(i, len(testDataList)))
            #获取预测结果
            result = predict(testDataList[i])
            #如果预测与标签不一致,错误计数值加一
            if result != testLabelList[i]:
                errorCnt += 1
        #返回正确率
        return 1 - errorCnt / len(testDataList)
accuracy = model_test(testDataList, testLabelList)
print('accuracy:%d'%(accuracy * 100), '%')

测试误差

模型选择

正则化

过拟合

泛化能力