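Preface
Linear minimum variance estimation (LMVE), usually abbreviated LMMSE for linear minimum mean squared error, is the special case of minimum variance (MMSE) estimation in which the estimate is restricted to be a linear function of the observation. As the foundation of Kalman filtering, it holds a very important place in optimal estimation theory. This article presents the criterion and the resulting estimator in detail.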
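Optimal Estimation Problem
Most practical engineering problems involve estimating multidimensional random vectors, set up as follows.
Let $X$ be an $n$-dimensional random vector and $Z$ an $m$-dimensional random vector that observes $X$. Let $\hat{X}$ be the estimate of $X$, a function of $Z$, and let $\tilde{X}$ be the estimation error of $\hat{X}$. The goal is an optimal estimate of $X$. Let $A$ be an $n \times m$ matrix and $B$ an $n$-dimensional constant (non-random) vector, so that $\hat{X}$ is a linear function of $Z$; $A$ and $B$ are then found by extremizing the optimal estimation criterion.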
$$
\begin{align*}
\hat{X} &= AZ+B \tag{1}\\
\tilde{X} &= X-\hat{X} = X-(AZ+B) \tag{2}
\end{align*}
$$
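Linear Minimum Variance Estimation (Linear Minimum Mean Squared Error, LMMSE)
The linear minimum variance criterion is the same as the minimum variance criterion: it selects the estimate that minimizes the ensemble mean of the squared estimation error [1], with the estimate restricted to the linear form of equation (1). It is commonly abbreviated as LMMSE [2].
The cost function is: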
$$
J = E[\tilde{X}^{T}\tilde{X}] = Tr(E[\tilde{X}\tilde{X}^{T}]) = E[(X-(AZ+B))^{T}(X-(AZ+B))] = \min \tag{3}
$$
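As a quick numerical check of the identity $E[\tilde{X}^{T}\tilde{X}] = Tr(E[\tilde{X}\tilde{X}^{T}])$ used in the cost function, the sketch below replaces expectations with sample averages. The error samples `err` are random stand-ins chosen purely for illustration; they are an assumption and not part of the derivation.

```python
import numpy as np

# Hypothetical error samples: one row per realization of the error vector.
rng = np.random.default_rng(1)
err = rng.normal(size=(1000, 3))

# Sample version of E[err^T err]: average squared length of the error.
J_inner = np.mean(np.sum(err * err, axis=1))

# Sample version of Tr(E[err err^T]): trace of the second-moment matrix.
second_moment = err.T @ err / err.shape[0]
J_trace = np.trace(second_moment)

print(J_inner, J_trace)  # the two numbers agree up to rounding
```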
Minimizing
To carry out its minimization, minimum variance estimation (MMSE) requires the conditional probability density $P(X|Z)$ of $X$ given $Z$, the joint probability density $P(X,Z)$, and the conditional expectation $E[X|Z]$. Note that these demanding prior requirements severely limit the method's use in engineering practice [1]. Linear minimum variance estimation (LMMSE) needs only the first and second moments of the random vectors $X$ and $Z$, namely $E[X]$, $E[Z]$, $Var(X)$, $Var(Z)$, $Cov(X,Z)$, and $Cov(Z,X) = Cov(X,Z)^{T}$, which makes it considerably easier to apply. The minimization proceeds as follows:
$$
\begin{align*}
J &= E[(X-(AZ+B))^{T}(X-(AZ+B))] \tag{4}
\end{align*}
$$
First take the partial derivative of $J$ in equation (4) with respect to $B$ and set it to zero:
$$
\begin{align*}
\frac{\partial J}{\partial B} &= 0\\
\frac{\partial J}{\partial (X-(AZ+B))}\frac{\partial (X-(AZ+B))}{\partial B} &= 0\\
-2E[X-(AZ+B)] &= 0\\
E[X-AZ-B] &= 0\\
E[X]-AE[Z]-B &= 0\\
B &= E[X]-AE[Z] \tag{5}
\end{align*}
$$
Next take the partial derivative of $J$ in equation (4) with respect to $A$, substitute $B$ from equation (5), and set it to zero:
$$
\begin{align*}
\frac{\partial J}{\partial A} &= 0\\
\frac{\partial J}{\partial (X-(AZ+B))}\frac{\partial (X-(AZ+B))}{\partial A} &= 0\\
-2E[(X-(AZ+B))Z^{T}] &= 0 \tag{6}\\
E[(X-B)Z^{T}]-E[AZZ^{T}] &= 0\\
E[XZ^{T}]-BE[Z^{T}]-E[AZZ^{T}] &= 0\\
E[XZ^{T}]-(E[X]-AE[Z])E[Z^{T}]-E[AZZ^{T}] &= 0\\
E[XZ^{T}]-E[X]E[Z^{T}]+AE[Z]E[Z^{T}]-AE[ZZ^{T}] &= 0\\
(E[XZ^{T}]-E[X]E[Z^{T}])-A(E[ZZ^{T}]-E[Z]E[Z^{T}]) &= 0\\
E[(X-E[X])(Z-E[Z])^{T}]-AE[(Z-E[Z])(Z-E[Z])^{T}] &= 0\\
Cov(X,Z)-AVar(Z) &= 0\\
A &= Cov(X,Z)Var(Z)^{-1} \tag{7}
\end{align*}
$$
Substituting $A$ into equation (5) gives $B$:
$$
\begin{align*}
B &= E[X]-AE[Z] \\
&= E[X]-Cov(X,Z)Var(Z)^{-1}E[Z] \tag{8}
\end{align*}
$$
Substituting $A$ and $B$ into equation (1) gives $\hat{X}$:
$$
\begin{align*}
\hat{X} &= AZ+B\\
&= Cov(X,Z)Var(Z)^{-1}Z+E[X]-Cov(X,Z)Var(Z)^{-1}E[Z]\\
&= E[X]+Cov(X,Z)Var(Z)^{-1}(Z-E[Z]) \tag{9}
\end{align*}
$$
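The result in equations (7) to (9) can be sketched numerically as follows. This is a minimal illustration, not a reference implementation: the helper name `lmmse_fit`, the synthetic data, and the matrix `H` are all assumptions introduced here, and the expectations are replaced by sample averages because the true moments of real data are rarely known exactly.

```python
import numpy as np

def lmmse_fit(X, Z):
    """Return (A, B) of the linear estimate X_hat = A Z + B.

    X: (N, n) array, one sample of the estimated vector per row.
    Z: (N, m) array, one sample of the observation per row.
    Moments are sample averages (hypothetical helper, for illustration).
    """
    mean_x, mean_z = X.mean(axis=0), Z.mean(axis=0)
    Xc, Zc = X - mean_x, Z - mean_z
    N = X.shape[0]
    cov_xz = Xc.T @ Zc / N            # Cov(X, Z), shape (n, m)
    var_z = Zc.T @ Zc / N             # Var(Z), shape (m, m)
    # Eq. (7): A = Cov(X,Z) Var(Z)^{-1}; solve a linear system rather
    # than forming the inverse explicitly (Var(Z) is symmetric).
    A = np.linalg.solve(var_z, cov_xz.T).T
    B = mean_x - A @ mean_z           # Eq. (8)
    return A, B

# Synthetic example (assumed model): Z is a noisy linear view of X.
rng = np.random.default_rng(0)
N = 5000
X = rng.normal(size=(N, 2)) + np.array([1.0, -2.0])
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Z = X @ H.T + 0.5 * rng.normal(size=(N, 3))

A, B = lmmse_fit(X, Z)
X_hat = Z @ A.T + B                   # Eq. (9), applied to every row
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print("mean squared error:", mse)
```

Only means and covariances enter the computation, which matches the point made above that no probability density is needed.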
Unbiasedness
The mathematical expectation of the linear minimum variance estimate $\hat{X}$ is:
$$
\begin{align*}
E[\hat{X}] &= E[E[X]+Cov(X,Z)Var(Z)^{-1}(Z-E[Z])] \\
&= E[X]+Cov(X,Z)Var(Z)^{-1}E[Z-E[Z]] \\
&= E[X] \tag{10}
\end{align*}
$$
Clearly, the linear minimum variance estimate is unbiased, so:
$$
E[\tilde{X}] = E[X-\hat{X}] = 0 \tag{11}
$$
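Unbiasedness can also be checked by simulation. The sketch below assumes a toy model that is not in the article: $X$ is Gaussian with mean `mu_x` and covariance `P0`, and $Z = HX + V$ with noise $V$ independent of $X$. Under that assumption the population moments are known in closed form, so $A$ and $B$ come straight from equations (7) and (8), and the sample mean of the error should be close to zero.

```python
import numpy as np

# Assumed toy model (illustrative only): X ~ N(mu_x, P0), Z = H X + V,
# V ~ N(0, R), with V independent of X.
mu_x = np.array([1.0, -2.0])
P0 = np.array([[2.0, 0.3], [0.3, 1.0]])
H = np.array([[1.0, 0.0], [1.0, 1.0]])
R = 0.25 * np.eye(2)

# Population moments implied by the assumed model.
mu_z = H @ mu_x                       # E[Z]
cov_xz = P0 @ H.T                     # Cov(X, Z)
var_z = H @ P0 @ H.T + R              # Var(Z)

A = cov_xz @ np.linalg.inv(var_z)     # Eq. (7)
B = mu_x - A @ mu_z                   # Eq. (8)

# Monte Carlo draw from the same model.
rng = np.random.default_rng(0)
N = 200_000
X = rng.multivariate_normal(mu_x, P0, size=N)
V = rng.multivariate_normal(np.zeros(2), R, size=N)
Z = X @ H.T + V

X_hat = Z @ A.T + B                   # Eq. (9), one estimate per row
mean_error = (X - X_hat).mean(axis=0) # sample version of Eq. (11)
print("sample mean of the error:", mean_error)
```

The printed vector is not exactly zero because it is a finite-sample average; it shrinks as `N` grows.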
Covariance Matrix
The covariance matrix of the estimation error $\tilde{X}$ of the linear minimum variance estimate is [1]:
$$
\begin{align*}
E[\tilde{X}\tilde{X}^{T}] &= E[(X-\hat{X})(X-\hat{X})^{T}] \\
&= E[(X-(E[X]+Cov(X,Z)Var(Z)^{-1}(Z-E[Z])))(X-(E[X]+Cov(X,Z)Var(Z)^{-1}(Z-E[Z])))^{T}] \\
&= E[((X-E[X])-Cov(X,Z)Var(Z)^{-1}(Z-E[Z]))((X-E[X])-Cov(X,Z)Var(Z)^{-1}(Z-E[Z]))^{T}] \\
&= C_{1}+C_{2}+C_{3} \tag{12}
\end{align*}
$$
where
$$
\begin{align*}
C_{1} &= E[(X-E[X])(X-E[X])^{T}] \\
&= Var(X) \tag{13}
\end{align*}
$$
$$
\begin{align*}
C_{2} &= -E[(X-E[X])(Z-E[Z])^{T}][Var(Z)^{-1}]^{T}Cov(X,Z)^{T} \\
&\quad - Cov(X,Z)Var(Z)^{-1}E[(Z-E[Z])(X-E[X])^{T}] \\
&= -Cov(X,Z)Var(Z)^{-1}Cov(Z,X) - Cov(X,Z)Var(Z)^{-1}Cov(Z,X) \\
&= -2Cov(X,Z)Var(Z)^{-1}Cov(Z,X) \tag{14}
\end{align*}
$$
$$
\begin{align*}
C_{3} &= Cov(X,Z)Var(Z)^{-1}E[(Z-E[Z])(Z-E[Z])^{T}][Var(Z)^{-1}]^{T}Cov(X,Z)^{T} \\
&= Cov(X,Z)Var(Z)^{-1}E[(Z-E[Z])(Z-E[Z])^{T}]Var(Z)^{-1}Cov(Z,X) \\
&= Cov(X,Z)Var(Z)^{-1}Var(Z)Var(Z)^{-1}Cov(Z,X) \\
&= Cov(X,Z)Var(Z)^{-1}Cov(Z,X) \tag{15}
\end{align*}
$$
Substituting equations (13), (14), and (15) into equation (12) gives:
$$
\begin{align*}
E[\tilde{X}\tilde{X}^{T}] &= Var(X)-Cov(X,Z)Var(Z)^{-1}Cov(Z,X) \tag{16}
\end{align*}
$$
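Equation (16) can be compared against the covariance of the actual errors. The sketch below is an illustration on synthetic data (the mixing matrices and noise level are assumptions): it estimates the joint covariance of the stacked vector $[X; Z]$ with `np.cov`, splits it into blocks, and evaluates both sides. With sample moments on both sides the two matrices agree up to rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 4000, 2

# Synthetic data (assumed for illustration): correlated X, noisy linear Z.
X = rng.normal(size=(N, n)) @ np.array([[1.0, 0.4], [0.0, 0.8]])
Z = X @ np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]).T + rng.normal(size=(N, 3))

# Joint sample covariance of [X; Z], partitioned into blocks.
S = np.cov(np.hstack([X, Z]), rowvar=False)
S_xx, S_xz, S_zz = S[:n, :n], S[:n, n:], S[n:, n:]

A = np.linalg.solve(S_zz, S_xz.T).T          # Eq. (7)
B = X.mean(axis=0) - A @ Z.mean(axis=0)      # Eq. (8)

# Right-hand side of Eq. (16): Var(X) - Cov(X,Z) Var(Z)^{-1} Cov(Z,X).
P_formula = S_xx - A @ S_xz.T

# Left-hand side of Eq. (16): covariance of the observed errors.
err = X - (Z @ A.T + B)
P_empirical = np.cov(err, rowvar=False)

print(np.round(P_formula, 4))
print(np.round(P_empirical, 4))
```

Since the subtracted term in equation (16) is positive semidefinite, the error covariance is never larger than $Var(X)$: observing $Z$ cannot make the linear estimate worse than the prior mean.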
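Linear Transformation
From equations (2), (6), and (11):
$$
\begin{align*}
E[(X-(AZ+B))Z^{T}] &= E[\tilde{X}Z^{T}] = 0 \tag{17}\\
E[(X-(AZ+B))] &= E[\tilde{X}] = 0 \tag{18}
\end{align*}
$$
By the orthogonal projection theorem [2], the estimate $\hat{X}$ can be viewed geometrically as the projection of $X$ onto the linear space of functions of $Z$ determined by $A$ and $B$ (the plane containing $Z$ and $B$). Equation (17) says that the estimation error $\tilde{X}$ is orthogonal to $Z$, so $\tilde{X}$ plays the role of the perpendicular dropped from $X$ onto that plane. Equation (18) says that the error has zero mean, which is exactly the unbiasedness of the linear minimum variance estimate. The length of the perpendicular, measured by $E[\tilde{X}^{T}\tilde{X}]$, is the smallest that any linear estimate can achieve, but it is generally not zero: $\hat{X}$ coincides with $X$ only when the error covariance in equation (16) vanishes. The linear transformation can be understood as follows. The observation $Z$ cannot be adjusted, but $A$ and $B$ can: $B$ shifts the plane so that the error has zero mean, and $A$, $B$, and $Z$ together determine the projection vector in that plane, which is the optimal linear unbiased estimate.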
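The orthogonality conditions (17) and (18) have a familiar sample counterpart: they are the normal equations of an ordinary least-squares fit of $X$ on $Z$ with an intercept. The sketch below uses that swapped-in route (`np.linalg.lstsq` on the augmented regressor $[Z, 1]$) instead of the moment formulas, on synthetic data that is assumed for illustration, and then checks that the residuals are orthogonal to $Z$ and have zero mean.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000

# Synthetic data (assumed for illustration).
X = rng.normal(size=(N, 2))
M = np.array([[1.0, 0.5], [0.0, 1.0]])
Z = X @ M.T + 0.3 * rng.normal(size=(N, 2)) + 1.0

# Least-squares fit of X ~ A Z + B using the augmented regressor [Z, 1].
Z_aug = np.hstack([Z, np.ones((N, 1))])
coef, *_ = np.linalg.lstsq(Z_aug, X, rcond=None)   # shape (m + 1, n)
A, B = coef[:-1].T, coef[-1]

err = X - (Z @ A.T + B)            # estimation error, one row per sample
cross_moment = err.T @ Z / N       # sample version of E[X~ Z^T], Eq. (17)
mean_error = err.mean(axis=0)      # sample version of E[X~], Eq. (18)

print(np.abs(cross_moment).max(), np.abs(mean_error).max())
```

Both printed values are at rounding-error level, while `err` itself is not zero: the error is perpendicular to the observation, not absent.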
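References
[1] 最优估计准则与方法(1)最小方差估计(MMSE)_学习笔记 (Optimal Estimation Criteria and Methods (1): Minimum Variance Estimation (MMSE), study notes)
https://blog.csdn.net/jimmychao1982/article/details/149478176
[2] Liu Sheng, Zhang Hongmei. 《最优估计理论》 (Optimal Estimation Theory). Higher Education Press, 2011.
[3] Yandld. 3-2 正交定理 (3-2 Orthogonality Theorem)
https://www.bilibili.com/video/BV1wj411H7j7?t=2.7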