Matrix Derivative Formulas in Machine Learning

Published: 2025-03-19

A.2 Derivatives

The derivative of a vector $\mathbf{a}$ with respect to a scalar $x$, and the derivative of $x$ with respect to $\mathbf{a}$, are both vectors, whose $i$-th components are, respectively,

$$\left( \frac{\partial \mathbf{a}}{\partial x} \right)_i = \frac{\partial a_i}{\partial x}, \tag{A.16}$$

$$\left( \frac{\partial x}{\partial \mathbf{a}} \right)_i = \frac{\partial x}{\partial a_i}. \tag{A.17}$$

Similarly, the derivative of a matrix $\mathbf{A}$ with respect to a scalar $x$, and the derivative of $x$ with respect to $\mathbf{A}$, are both matrices, whose entries in row $i$, column $j$ are, respectively,

$$\left( \frac{\partial \mathbf{A}}{\partial x} \right)_{ij} = \frac{\partial A_{ij}}{\partial x}, \tag{A.18}$$

$$\left( \frac{\partial x}{\partial \mathbf{A}} \right)_{ij} = \frac{\partial x}{\partial A_{ij}}. \tag{A.19}$$

For a function $f(\mathbf{x})$, assuming it is differentiable with respect to the elements of the vector, the first derivative of $f(\mathbf{x})$ with respect to $\mathbf{x}$ is a vector whose $i$-th component is

$$\left( \nabla f(\mathbf{x}) \right)_i = \frac{\partial f(\mathbf{x})}{\partial x_i}, \tag{A.20}$$

and the second derivative of $f(\mathbf{x})$ with respect to $\mathbf{x}$ is a square matrix called the Hessian matrix, whose entry in row $i$, column $j$ is

$$\left( \nabla^2 f(\mathbf{x}) \right)_{ij} = \frac{\partial^2 f(\mathbf{x})}{\partial x_i \partial x_j}. \tag{A.21}$$
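Definitions (A.20) and (A.21) can be verified numerically with finite differences. A minimal NumPy sketch, where the quadratic $f(\mathbf{x}) = \mathbf{x}^T \mathbf{M} \mathbf{x}$, the matrix `M`, and the test point `x0` are illustrative choices, not from the text:

```python
import numpy as np

# Check (A.20) and (A.21) by central differences for the quadratic
# f(x) = x^T M x, whose analytic gradient is (M + M^T) x and whose
# Hessian is M + M^T.  M and x0 are arbitrary illustrative choices.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
x0 = rng.standard_normal(4)

f = lambda x: x @ M @ x
eps = 1e-5

# (A.20): the i-th gradient component is df/dx_i
grad_fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
grad_exact = (M + M.T) @ x0

# (A.21): the (i, j) Hessian entry is d^2 f / (dx_i dx_j)
hess_fd = np.array([
    [(f(x0 + eps*ei + eps*ej) - f(x0 + eps*ei - eps*ej)
      - f(x0 - eps*ei + eps*ej) + f(x0 - eps*ei - eps*ej)) / (4 * eps**2)
     for ej in np.eye(4)]
    for ei in np.eye(4)])
hess_exact = M + M.T
```

Since $f$ is quadratic, the central differences agree with the analytic gradient and Hessian up to floating-point rounding.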

Derivatives of vectors and matrices satisfy the product rule (and related identities):

$$\frac{\partial \mathbf{x}^T \mathbf{a}}{\partial \mathbf{x}} = \frac{\partial \mathbf{a}^T \mathbf{x}}{\partial \mathbf{x}} = \mathbf{a}, \tag{A.22}$$

$$\frac{\partial \mathbf{A} \mathbf{B}}{\partial x} = \frac{\partial \mathbf{A}}{\partial x} \mathbf{B} + \mathbf{A} \frac{\partial \mathbf{B}}{\partial x}. \tag{A.23}$$

From $\mathbf{A}^{-1} \mathbf{A} = \mathbf{I}$ and Eq. (A.23), the derivative of the inverse matrix can be expressed as

$$\frac{\partial \mathbf{A}^{-1}}{\partial x} = -\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial x} \mathbf{A}^{-1}. \tag{A.24}$$
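Both (A.23) and (A.24) can be checked for a concrete matrix-valued function of a scalar. A minimal sketch, where the linear parametrizations $\mathbf{A}(x) = \mathbf{A}_0 + x\mathbf{D}$ and $\mathbf{B}(x) = \mathbf{B}_0 + x\mathbf{E}$ are illustrative choices:

```python
import numpy as np

# Check (A.23) and (A.24) for matrices depending linearly on a scalar x:
# A(x) = A0 + x*D (so dA/dx = D) and B(x) = B0 + x*E (so dB/dx = E).
# All matrices are arbitrary illustrative choices; A0 is shifted by 3*I
# so that A(x) stays invertible near x0.
rng = np.random.default_rng(1)
A0 = rng.standard_normal((3, 3)) + 3 * np.eye(3)
D = rng.standard_normal((3, 3))
B0 = rng.standard_normal((3, 3))
E = rng.standard_normal((3, 3))
A = lambda x: A0 + x * D
B = lambda x: B0 + x * E

x0, h = 0.5, 1e-6

# Product rule (A.23): d(AB)/dx = (dA/dx) B + A (dB/dx)
dAB_fd = (A(x0 + h) @ B(x0 + h) - A(x0 - h) @ B(x0 - h)) / (2 * h)
dAB_exact = D @ B(x0) + A(x0) @ E

# Inverse rule (A.24): d(A^{-1})/dx = -A^{-1} (dA/dx) A^{-1}
dinv_fd = (np.linalg.inv(A(x0 + h)) - np.linalg.inv(A(x0 - h))) / (2 * h)
Ainv = np.linalg.inv(A(x0))
dinv_exact = -Ainv @ D @ Ainv
```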

If differentiation is with respect to an element of the matrix $\mathbf{A}$, then

$$\frac{\partial \operatorname{tr}(\mathbf{AB})}{\partial A_{ij}} = B_{ji}, \tag{A.25}$$

$$\frac{\partial \operatorname{tr}(\mathbf{AB})}{\partial \mathbf{A}} = \mathbf{B}^T. \tag{A.26}$$

It then follows that

$$\frac{\partial \operatorname{tr}(\mathbf{A}^T \mathbf{B})}{\partial \mathbf{A}} = \mathbf{B}, \tag{A.27}$$

$$\frac{\partial \operatorname{tr}(\mathbf{A})}{\partial \mathbf{A}} = \mathbf{I}, \tag{A.28}$$

$$\frac{\partial \operatorname{tr}(\mathbf{ABA}^T)}{\partial \mathbf{A}} = \mathbf{A}(\mathbf{B} + \mathbf{B}^T). \tag{A.29}$$

From Eq. (A.15) and Eq. (A.29), we have

$$\frac{\partial \|\mathbf{A}\|_F^2}{\partial \mathbf{A}} = \frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{A}^T)}{\partial \mathbf{A}} = 2\mathbf{A}. \tag{A.30}$$
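The trace gradients (A.26)–(A.30) are easy to spot-check element-wise via (A.19). A minimal NumPy sketch, with `A` and `B` as illustrative matrices:

```python
import numpy as np

# Element-wise central-difference checks of the trace gradients
# (A.26)-(A.30).  A and B are arbitrary illustrative matrices.
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
eps = 1e-6

def grad_wrt_A(f):
    """(df/dA)_{ij} = df/dA_{ij} by central differences, as in (A.19)."""
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            Eij = np.zeros((n, n))
            Eij[i, j] = eps
            G[i, j] = (f(A + Eij) - f(A - Eij)) / (2 * eps)
    return G

g_AB   = grad_wrt_A(lambda X: np.trace(X @ B))        # expect B.T        (A.26)
g_AtB  = grad_wrt_A(lambda X: np.trace(X.T @ B))      # expect B          (A.27)
g_trA  = grad_wrt_A(lambda X: np.trace(X))            # expect I          (A.28)
g_ABAt = grad_wrt_A(lambda X: np.trace(X @ B @ X.T))  # expect A(B + B.T) (A.29)
g_fro  = grad_wrt_A(lambda X: np.trace(X @ X.T))      # expect 2A         (A.30)
```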

The chain rule is an important tool for computing complicated derivatives. Simply put, if a function $f$ is the composition of $g$ and $h$, i.e. $f(x) = g(h(x))$, then

$$\frac{\partial f(x)}{\partial x} = \frac{\partial g(h(x))}{\partial h(x)} \cdot \frac{\partial h(x)}{\partial x}. \tag{A.31}$$

For example, when computing the following expression, treating $\mathbf{A}\mathbf{x} - \mathbf{b}$ as a single unit simplifies the calculation:

$$\frac{\partial}{\partial \mathbf{x}} (\mathbf{A}\mathbf{x} - \mathbf{b})^T \mathbf{W} (\mathbf{A}\mathbf{x} - \mathbf{b}) = \frac{\partial (\mathbf{A}\mathbf{x} - \mathbf{b})}{\partial \mathbf{x}} \cdot 2\mathbf{W} (\mathbf{A}\mathbf{x} - \mathbf{b}) = 2\mathbf{A}^T \mathbf{W} (\mathbf{A}\mathbf{x} - \mathbf{b}). \tag{A.32}$$

In machine learning, $\mathbf{W}$ is usually a symmetric matrix; note that the factor of 2 above relies on this symmetry.
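As a numerical cross-check of (A.32), the analytic gradient $2\mathbf{A}^T \mathbf{W}(\mathbf{A}\mathbf{x} - \mathbf{b})$ can be compared against finite differences. A minimal sketch, with all matrices as illustrative choices and $\mathbf{W}$ symmetrized explicitly to match the assumption above:

```python
import numpy as np

# Check (A.32) for f(x) = (Ax - b)^T W (Ax - b) with symmetric W;
# the analytic gradient is 2 A^T W (Ax - b).  All matrices here are
# arbitrary illustrative choices.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))
b = rng.standard_normal(5)
S = rng.standard_normal((5, 5))
W = S + S.T          # symmetrize, as the derivation assumes
x0 = rng.standard_normal(4)

f = lambda x: (A @ x - b) @ W @ (A @ x - b)
eps = 1e-6
grad_fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
grad_exact = 2 * A.T @ W @ (A @ x0 - b)
```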
