Let $X = (x_{ij})_{m \times n}$, and let $f(X) = f(x_{11}, x_{12}, \ldots, x_{1n}, x_{21}, \ldots, x_{mn})$ be a multivariate function of $m \times n$ variables whose partial derivatives

$$
\frac{\partial f}{\partial x_{ij}} \quad (i=1,2,\ldots,m,\ j=1,2,\ldots,n)
$$

all exist. The derivative of $f(X)$ with respect to the matrix $X$ is defined as:

$$
\frac{df(X)}{dX} = \left( \frac{\partial f}{\partial x_{ij}} \right)_{m \times n} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial x_{m1}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{bmatrix}
$$
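The definition says the derivative has the same shape as $X$, with each entry holding the partial derivative with respect to the matching entry of $X$. As a minimal sketch of that idea, the Python snippet below approximates each entry with a central difference. The helper names `matrix_derivative` and `frobenius_sq`, the step size `h`, and the test function $f(X) = \sum_{i,j} x_{ij}^2$ are all illustrative assumptions, not part of the original text.

```python
def matrix_derivative(f, X, h=1e-6):
    """Approximate df/dX entry by entry with central differences.

    X is a list of m rows, each a list of n floats; the result has the same shape.
    """
    m, n = len(X), len(X[0])
    D = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Perturb only x_ij, leaving every other entry fixed.
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            D[i][j] = (f(plus) - f(minus)) / (2 * h)
    return D


def frobenius_sq(X):
    """Illustrative choice: f(X) = sum of x_ij^2, whose derivative is 2X."""
    return sum(x * x for row in X for x in row)


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
D = matrix_derivative(frobenius_sq, X)
for row in D:
    print([round(v, 4) for v in row])
```

For this particular $f$, each partial derivative is $\partial f / \partial x_{ij} = 2x_{ij}$, so the printed $2 \times 3$ matrix should agree with $2X$ up to floating-point error.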
(1) Let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$ and let $f(\mathbf{x})$ be a function of $n$ variables. Find $\frac{df}{d\mathbf{x}^\top}$, $\frac{df}{d\mathbf{x}}$, and $\frac{d^2f}{d\mathbf{x}^2}$.
$$
\frac{df}{d\mathbf{x}^\top} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1}, & \frac{\partial f}{\partial \xi_2}, & \cdots, & \frac{\partial f}{\partial \xi_n} \end{pmatrix}
$$
$$
\nabla f(\mathbf{x}) = \frac{df}{d\mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix}
$$

This is the gradient.
$$
H(\mathbf{x}) = \nabla^2 f(\mathbf{x}) = \frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}^\top} = \begin{bmatrix} \frac{\partial^2 f}{\partial \xi_1^2} & \frac{\partial^2 f}{\partial \xi_1 \partial \xi_2} & \cdots & \frac{\partial^2 f}{\partial \xi_1 \partial \xi_n} \\ \frac{\partial^2 f}{\partial \xi_2 \partial \xi_1} & \frac{\partial^2 f}{\partial \xi_2^2} & \cdots & \frac{\partial^2 f}{\partial \xi_2 \partial \xi_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial \xi_n \partial \xi_1} & \frac{\partial^2 f}{\partial \xi_n \partial \xi_2} & \cdots & \frac{\partial^2 f}{\partial \xi_n^2} \end{bmatrix}
$$

This is the Hessian matrix. It is symmetric whenever the second partial derivatives of $f$ are continuous, since the mixed partials then agree.
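To make the gradient and Hessian concrete, here is a small numerical sketch in plain Python. It approximates both with central differences and compares the result against hand-computed values. The function $f(\xi_1, \xi_2) = \xi_1^2 \xi_2 + \xi_2^3$, the evaluation point, the step sizes, and the helper names `gradient` and `hessian` are hypothetical choices made only for illustration.

```python
def gradient(f, x, h=1e-5):
    """Approximate the column vector df/dx with central differences."""
    g = []
    for i in range(len(x)):
        xp, xm = x[:], x[:]
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def hessian(f, x, h=1e-4):
    """Approximate the n x n matrix of second partials d^2 f / (d xi_i d xi_j)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Four-point stencil: shift coordinates i and j by +/- h.
            xpp, xpm, xmp, xmm = x[:], x[:], x[:], x[:]
            xpp[i] += h
            xpp[j] += h
            xpm[i] += h
            xpm[j] -= h
            xmp[i] -= h
            xmp[j] += h
            xmm[i] -= h
            xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H


def f(x):
    """Illustrative function: f(xi1, xi2) = xi1^2 * xi2 + xi2^3."""
    return x[0] ** 2 * x[1] + x[1] ** 3


x0 = [1.0, 2.0]
g = gradient(f, x0)
H = hessian(f, x0)
print("gradient:", [round(v, 3) for v in g])
print("Hessian:")
for row in H:
    print([round(v, 3) for v in row])
```

By hand, $\nabla f = (2\xi_1\xi_2,\ \xi_1^2 + 3\xi_2^2)^\top$ and $H = \begin{bmatrix} 2\xi_2 & 2\xi_1 \\ 2\xi_1 & 6\xi_2 \end{bmatrix}$, so at $(1, 2)$ the numerical values should be close to $(4, 13)^\top$ and $\begin{bmatrix} 4 & 2 \\ 2 & 12 \end{bmatrix}$. The two off-diagonal entries agree up to rounding, as expected for a function with continuous second partials.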