Paper Reading Notes: Denoising Diffusion Probabilistic Models (2)

Published: 2025-03-22

Paper Reading Notes: Denoising Diffusion Probabilistic Models (1)

3. Derivation in the Paper

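The pipeline of a diffusion model is shown in the figure below: $q(x^{0,1,2\cdots,T-1,T})$ is the forward noising process and $p(x^{0,1,2\cdots,T-1,T})$ is the reverse denoising process. Note that the image obtained at the end of reverse denoising still has some noise specks scattered over it.
[Figure: forward noising and reverse denoising processes of a diffusion model]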

3.1 Terminology

$q(x^0)$: $x^0$ denotes an image from the dataset. With MNIST, for example, $x^0$ is an MNIST image and $q(x^0)$ is the distribution of the images in MNIST.
$p(x^T)$: $x^T$ is the result of adding noise to $x^0$ and is also the starting point of reverse denoising, so $p(x^T)$ is the distribution of that starting point.

3.2 Derivation Steps

The forward noising process satisfies the Markov property, which gives Eq. 1.

\begin{equation}
\begin{split}
q(x^{0,1,2\cdots,T-1,T})&=q(x^0)\cdot \prod_{t=1}^{T}{q(x^t|x^{t-1})} \\
&=q(x^0)\cdot q(x^1|x^0)\cdot q(x^2|x^1)\cdots q(x^T|x^{T-1}), \\
q(x^{1,2 \cdots T}|x^0)&=q(x^1|x^0)\cdot q(x^2|x^1)\cdots q(x^T|x^{T-1}).
\end{split}
\end{equation}
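The factorization in Eq. 1 can be checked on a toy example. The sketch below is not from the paper: it assumes a hypothetical two-state chain (each $x^t \in \{0, 1\}$) with made-up values for `q0` and `Q`, and confirms that the factorized joint and the conditional each sum to 1.

```python
import itertools

# Hypothetical toy setup (not from the paper): every x^t takes values in {0, 1}.
T = 3
q0 = [0.7, 0.3]            # q(x^0): assumed data distribution
Q = [[0.9, 0.1],           # Q[a][b] = q(x^t = b | x^{t-1} = a), same for every t
     [0.2, 0.8]]

def q_joint(traj):
    """q(x^0, ..., x^T) = q(x^0) * prod_t q(x^t | x^{t-1})."""
    prob = q0[traj[0]]
    for prev, cur in zip(traj, traj[1:]):
        prob *= Q[prev][cur]
    return prob

def q_cond(traj):
    """q(x^1, ..., x^T | x^0): the same product without the q(x^0) factor."""
    prob = 1.0
    for prev, cur in zip(traj, traj[1:]):
        prob *= Q[prev][cur]
    return prob

trajectories = list(itertools.product([0, 1], repeat=T + 1))
total = sum(q_joint(tr) for tr in trajectories)
cond_total = sum(q_cond(tr) for tr in trajectories if tr[0] == 0)
print(total, cond_total)   # both are 1 up to floating-point rounding
```

In the paper the transitions are continuous densities rather than a 2x2 table, but the product structure is the same.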

The reverse denoising process is given by Eq. 2.

\begin{equation}
\begin{split}
p_{\theta}(x^{0,1,2\cdots,T-1,T})&=p_{\theta}(x^T)\cdot \prod_{t=1}^{T}{p_{\theta}(x^{t-1}|x^{t})} \\
&=p_{\theta}(x^T)\cdot p_{\theta}(x^{T-1}|x^T)\cdot p_{\theta}(x^{T-2}|x^{T-1})\cdots p_{\theta}(x^{0}|x^{1}).
\end{split}
\end{equation}
The parameter $\theta$ in Eq. 2 is what the deep learning model has to learn. For convenience, $\theta$ is omitted from here on, so Eq. 2 is rewritten as Eq. 3.
\begin{equation}
\begin{split}
p(x^{0,1,2\cdots,T-1,T})&=p(x^T)\cdot \prod_{t=1}^{T}{p(x^{t-1}|x^{t})} \\
&=p(x^T)\cdot p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\cdots p(x^{0}|x^{1})
\end{split}
\end{equation}

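The goal of reverse denoising is for its end point to match the starting point of forward noising. In other words, we want to maximize $p(x^0)$, the probability that the reverse denoising process produces $x^0$.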

\begin{equation}
\begin{split}
p(x^0)&=\int p(x^0,x^1)\,dx^{1} \quad \text{(marginalizing the joint distribution)}\\
&=\int p(x^1)\cdot p(x^0|x^1)\,dx^1 \quad \text{(product rule)} \\
&=\int \Big(\int p(x^1,x^2)\,dx^2 \Big) \cdot p(x^0|x^1)\,dx^1 \quad \text{(nested integral)}\\
&=\iint p(x^2)\cdot p(x^1|x^2) \cdot p(x^0|x^1)\,dx^1 dx^2 \quad \text{(rewritten as a double integral)}\\
&= \vdots \\
&= \int \int \cdots \int p(x^T)\cdot p(x^{T-1}|x^{T})\cdot p(x^{T-2}|x^{T-1})\cdots p(x^0|x^1) \, dx^1 dx^2 \cdots dx^T \\
&= \int p(x^{0,1,2 \cdots T})\,dx^{1,2\cdots T} \quad (T\text{-fold integral}) \\
&= \int dx^{1,2\cdots T} \cdot p(x^{0,1,2 \cdots T}) \cdot \frac{q(x^{1,2 \cdots T}| x^0)}{q(x^{1,2 \cdots T}|x^0)} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \frac{ p(x^{0,1,2 \cdots T}) }{q(x^{1,2 \cdots T}|x^0)} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \frac{ p(x^T)\cdot p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\cdots p(x^{0}|x^{1})}{q(x^1|x^0)\cdot q(x^2|x^1)\cdots q(x^T|x^{T-1})} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot p(x^T)\cdot \frac{ p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\cdots p(x^{0}|x^{1})}{q(x^1|x^0)\cdot q(x^2|x^1)\cdots q(x^T|x^{T-1})} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})} \\
&= E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg[ p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})} \bigg] \quad \text{(rewritten as an expectation)}
\end{split}
\end{equation}
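The expectation form says that $p(x^0)$ can be estimated by sampling trajectories from the forward process and averaging the ratio $p(x^T)\prod_t p(x^{t-1}|x^t)/q(x^t|x^{t-1})$. The sketch below illustrates this on a hypothetical two-state chain; the tables `Q`, `P`, `pT` and the sample count are made-up values, not anything from the paper.

```python
import itertools
import random

T = 3
Q = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical q(x^t = b | x^{t-1} = a)
P = [[0.6, 0.4], [0.3, 0.7]]   # hypothetical p(x^{t-1} = b | x^t = a)
pT = [0.5, 0.5]                # hypothetical p(x^T)
x0 = 0                         # the fixed data point

def p_joint(traj):
    """p(x^0, ..., x^T) = p(x^T) * prod_t p(x^{t-1} | x^t)."""
    prob = pT[traj[T]]
    for t in range(1, T + 1):
        prob *= P[traj[t]][traj[t - 1]]
    return prob

def q_cond(traj):
    """q(x^1, ..., x^T | x^0) = prod_t q(x^t | x^{t-1})."""
    prob = 1.0
    for t in range(1, T + 1):
        prob *= Q[traj[t - 1]][traj[t]]
    return prob

# Left-hand side: marginalize the reverse joint over x^1, ..., x^T exactly.
p_x0 = sum(p_joint((x0,) + tail) for tail in itertools.product([0, 1], repeat=T))

# Right-hand side: Monte Carlo average of the ratio over samples from q.
rng = random.Random(0)
num_samples = 20000
acc = 0.0
for _ in range(num_samples):
    traj = [x0]
    for _ in range(T):
        traj.append(1 if rng.random() < Q[traj[-1]][1] else 0)
    acc += p_joint(traj) / q_cond(traj)
estimate = acc / num_samples

print(p_x0, estimate)          # the two numbers should be close
```

The estimate is noisy because some trajectories are rare under $q$ but carry a large ratio, which is one reason the derivation goes on to bound the logarithm instead.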
The parameter $\theta$ (omitted in Eq. 3) should therefore satisfy
\begin{equation}
\theta= \underset{\theta}{\arg\max}\; p_{\theta}(x^0).
\end{equation}
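Eq. 4 is evaluated for a single image, whereas a dataset usually contains thousands of images. Suppose the dataset contains $N$ images. This leads to Eq. 6, whose goal is to find a set of parameters $\theta$ that minimizes $L$; since $L$ is the negative log-likelihood, this is the same as maximizing $p(x^0)$. Note that $q(x^0)$ is the probability with which each image in the dataset is sampled.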
To avoid edge effects, we set $p(x^1|x^{0})=q(x^1|x^{0})$ here.
\begin{equation}
\begin{split}
L&:=- \log\Big[p(x^0)\Big] \\
&= -\log \Big[ E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)}\, p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\Big] \\
& \leq -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg( \log \Big[p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\Big]\bigg) \quad \text{(Jensen's inequality)}\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg( \log [p(x^T)]+\sum_{t=1}^{T} \log \Big[ \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\Big]\bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg( \log [p(x^T)]+ \log\Big[\frac{ p(x^{0}|x^1)}{q(x^1|x^{0})} \Big]+\sum_{t=2}^{T} \log \Big[ \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\Big] \bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg( \log [p(x^T)]+ \log\Big[\frac{ p(x^{0}|x^1)}{\underbrace{ p(x^1|x^{0})}_{p(x^1|x^{0})=q(x^1|x^{0})}} \Big]+\sum_{t=2}^{T} \log \Big[\underbrace{ \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1},x^0)}}_{q(x^t|x^{t-1})=q(x^t|x^{t-1},x^0)}\Big] \bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log [p(x^T)]+ \log\Big[\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big]+\sum_{t=2}^{T} \log \Big[\underbrace{ \frac{ p(x^{t-1}|x^t)}{q(x^t,x^{t-1},x^0)} \cdot q(x^{t-1}, x^0) \cdot \frac{q(x^0)}{q(x^0)}\cdot \frac{q(x^t,x^0)}{q(x^t,x^0)}}_{ q(x^t|x^{t-1},x^0)=\frac{q(x^t,x^{t-1},x^0)}{q(x^{t-1},x^0)}}\Big] \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log [p(x^T)]+ \log\Big[\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big]+\sum_{t=2}^{T} \log \Big[\underbrace{ \frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \cdot \frac{q(x^{t-1}, x^0) }{q(x^0)}\cdot \frac{ q(x^0)}{q(x^t,x^0)}}_{q(x^t,x^{t-1},x^0)= q(x^t,x^0) \cdot q(x^{t-1}|x^t,x^0)}\Big] \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log [p(x^T)]+ \log\Big[\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big]+\sum_{t=2}^{T} \log \Big[\underbrace{ \frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \cdot \frac{q(x^{t-1}| x^0) }{q(x^{t}|x^0)}}_{q(x^{t-1},x^0)=q(x^0) \cdot q(x^{t-1}|x^0) ;\; q(x^{t},x^0)=q(x^0) \cdot q(x^{t}|x^0)}\Big] \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log [p(x^T)]+ \log\Big[\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big]+\sum_{t=2}^{T} \log \Big[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \Big] + \sum_{t=2}^{T} \log \Big[\frac{q(x^{t-1}| x^0) }{q(x^{t}|x^0)}\Big] \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log [p(x^T)]+ \log\Big[\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big]+\sum_{t=2}^{T} \log \Big[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \Big] + \log \Big[\frac{q(x^{1}| x^0) }{q(x^{2}|x^0)} \cdot \frac{q(x^{2}| x^0) }{q(x^{3}|x^0)}\cdots \frac{q(x^{T-1}| x^0) }{q(x^{T}|x^0)}\Big] \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log [p(x^T)]+ \log\Big[\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big]+\sum_{t=2}^{T} \log \Big[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \Big] + \log \Big[\frac{q(x^{1}| x^0) }{q(x^{T}|x^0)}\Big] \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\underbrace{\log \Big[\frac{p(x^T)}{q(x^{T}|x^0)}\Big]+\sum_{t=2}^{T} \log \Big[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \Big] + \log \Big[p(x^{0}|x^1)\Big] }_{\log [p(x^T)]+\log\Big[\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big]+ \log \Big[\frac{q(x^{1}| x^0) }{q(x^{T}|x^0)}\Big]=\log\bigg[p(x^T) \cdot \frac{ p(x^{0}|x^1)}{\bcancel{p(x^1|x^{0})}} \cdot \frac{\bcancel{q(x^{1}| x^0) }}{q(x^{T}|x^0)} \bigg]}\Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\log \Big[ \frac{ p(x^T)}{q(x^{T}|x^0)}\Big]\Bigg)-E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\sum_{t=2}^{T} \log \Big[\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \Big]\Bigg) - E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log \Big[p(x^{0}|x^1)\Big] \Bigg)\\
&= E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\log \Big[ \frac{q(x^{T}|x^0)}{ p(x^T)}\Big]\Bigg)+E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\sum_{t=2}^{T} \log \Big[\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big]\Bigg) - E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log \Big[p(x^{0}|x^1)\Big] \Bigg)\\
&= \underbrace{E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\log \Big[ \frac{q(x^{T}|x^0)}{ p(x^T)}\Big]\Bigg)}_{L_1}+\underbrace{E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\sum_{t=2}^{T} \log \Big[\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big]\Bigg)}_{L_2} \underbrace{-\, E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log \Big[p(x^{0}|x^1)\Big] \Bigg)}_{L_3\ (\text{a constant?})}
\end{split}
\end{equation}
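The three underbraced substitutions inside the sum amount to one application of Bayes' rule conditioned on $x^0$, followed by a telescoping product. As a compact restatement, with the product written out for the illustrative small case $T=3$:

\begin{equation*}
\begin{split}
q(x^t|x^{t-1},x^0) &= \frac{q(x^{t-1}|x^t,x^0)\cdot q(x^t|x^0)}{q(x^{t-1}|x^0)}, \\
\prod_{t=2}^{3}\frac{q(x^{t-1}|x^0)}{q(x^t|x^0)} &= \frac{q(x^1|x^0)}{q(x^2|x^0)}\cdot\frac{q(x^2|x^0)}{q(x^3|x^0)} = \frac{q(x^1|x^0)}{q(x^3|x^0)}.
\end{split}
\end{equation*}

For general $T$ every intermediate factor cancels in the same way, leaving $q(x^1|x^0)/q(x^T|x^0)$.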
The bound on $L$ therefore splits into three terms. Consider the first term, $L_1$:
\begin{equation}
\begin{split}
L_1&=E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\log \Big[ \frac{q(x^{T}|x^0)}{ p(x^T)}\Big]\Bigg) \\
&=\int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \log \Big[ \frac{q(x^{T}|x^0)}{ p(x^T)}\Big] \\
&=\int dx^{1,2\cdots T} \cdot \frac{q(x^{1,2 \cdots T}| x^0)}{q(x^T|x^0)} \cdot q(x^T|x^0) \cdot \log \Big[ \frac{q(x^{T}|x^0)}{ p(x^T)}\Big] \\
&=\int dx^{1,2\cdots T} \cdot \underbrace{ q(x^{1,2 \cdots T-1}| x^0, x^T) }_{q(x^{1,2 \cdots T}| x^0)=q(x^{T}|x^0) \cdot q(x^{1,2 \cdots T-1}| x^0, x^T)} \cdot q(x^T|x^0) \cdot \log \Big[ \frac{q(x^{T}|x^0)}{ p(x^T)}\Big] \\
&=\int \Bigg( \underbrace{ \int q(x^{1,2 \cdots T-1}| x^0, x^T) \cdot \prod_{k=1}^{T-1} dx^k }_{\text{iterated integral; the inner integral equals } 1} \Bigg) \cdot q(x^T|x^0) \cdot \log \Big[ \frac{q(x^{T}|x^0)}{ p(x^T)} \Big] \cdot dx^{T} \\
&=\int q(x^T|x^0) \cdot \log \Big[ \frac{q(x^{T}|x^0)}{ p(x^T)} \Big] \cdot dx^{T} \\
&=E_{x^T\sim q(x^T|x^0)} \log \Big[ \frac{q(x^{T}|x^0)}{ p(x^T)} \Big]\\
&= KL\Big(q(x^T|x^0)\,\|\,p(x^T)\Big)
\end{split}
\end{equation}
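To get a feel for $L_1$, the sketch below evaluates $KL\big(q(x^T|x^0)\,\|\,p(x^T)\big)$ for increasing $T$ on a hypothetical two-state chain. The transition table `Q` is made up, and taking $p(x^T)$ to be the stationary distribution of that chain is an assumption of this example, not something stated in the notes.

```python
import math

Q = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical q(x^t = b | x^{t-1} = a)
prior = [2 / 3, 1 / 3]         # assumed p(x^T): the stationary distribution of Q

def step(dist):
    """Push a distribution over {0, 1} through one forward transition."""
    return [sum(dist[a] * Q[a][b] for a in range(2)) for b in range(2)]

def kl(q, p):
    """KL(q || p) = sum_x q(x) * log(q(x) / p(x)) for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

dist = [1.0, 0.0]              # point mass at x^0 = 0
l1_values = []
for T in range(1, 21):
    dist = step(dist)          # dist is now q(x^T | x^0 = 0)
    l1_values.append(kl(dist, prior))

print(l1_values[0], l1_values[-1])   # the KL term shrinks as T grows
```

Under these assumptions the forward chain forgets $x^0$ as $T$ grows, so $q(x^T|x^0)$ approaches the prior and $L_1$ tends to zero.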
Next, consider the second term, $L_2$:

\begin{equation}
\begin{split}
L_2&=E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\sum_{t=2}^{T} \log \Big[\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big]\Bigg)\\
&=\sum_{t=2}^{T} E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\log \Big[\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big]\Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \log \Big[\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big] \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \iint \bigg[ \underbrace{ \int q(x^{1,2 \cdots T}| x^0) \prod_{k\geq1 ,\, k\neq t-1,\, k\neq t} dx^k }_{=\,q(x^{t-1},x^t|x^0)} \bigg] \cdot \log \Big[\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big] \, dx^{t-1}\, dx^{t} \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \iint \underbrace{ q(x^t|x^0)\cdot q(x^{t-1}|x^t,x^0) }_{q(x^{t-1},x^t|x^0)=q(x^t|x^0)\cdot q(x^{t-1}|x^t,x^0)} \cdot \log \Big[\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big] \, dx^{t-1}\, dx^{t} \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int q(x^t|x^0) \bigg[ \int q(x^{t-1}|x^t,x^0) \cdot \log \Big[\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big] \, dx^{t-1} \bigg] dx^{t} \Bigg)\\
&=\sum_{t=2}^{T} E_{x^{t}\sim q(x^{t}|x^0)} \Bigg( E_{x^{t-1}\sim q(x^{t-1}|x^t,x^0)} \log \Big[\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big] \Bigg)\\
&=\sum_{t=2}^{T} E_{x^{t}\sim q(x^{t}|x^0)} \bigg[ KL\Big(q(x^{t-1}|x^t,x^0)\,\|\,p(x^{t-1}|x^t) \Big) \bigg]
\end{split}
\end{equation}
The logarithm depends only on $x^{t-1}$ and $x^t$, so all other variables integrate out. The resulting KL divergence is still a function of $x^t$, which is why the outer expectation over $x^t \sim q(x^t|x^0)$ remains.
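The posterior $q(x^{t-1}|x^t,x^0)$ that appears in $L_2$ follows from Bayes' rule together with the Markov property: $q(x^{t-1}|x^t,x^0) = q(x^t|x^{t-1})\, q(x^{t-1}|x^0) / q(x^t|x^0)$. Below is a minimal sketch on a hypothetical two-state chain; the table `Q` and the helper names `marginal` and `posterior` are illustrative only.

```python
Q = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical q(x^t = b | x^{t-1} = a)

def marginal(x0, t):
    """q(x^t | x^0) as a list over x^t, obtained by applying Q t times."""
    dist = [1.0 if s == x0 else 0.0 for s in range(2)]
    for _ in range(t):
        dist = [sum(dist[a] * Q[a][b] for a in range(2)) for b in range(2)]
    return dist

def posterior(t, xt, x0):
    """q(x^{t-1} | x^t, x^0) as a list over x^{t-1}, via Bayes' rule."""
    prev = marginal(x0, t - 1)   # q(x^{t-1} | x^0)
    cur = marginal(x0, t)        # q(x^t | x^0)
    return [Q[a][xt] * prev[a] / cur[xt] for a in range(2)]

post = posterior(t=2, xt=1, x0=0)
print(post, sum(post))           # a proper distribution over x^{t-1}
```

Because the posterior depends on the observed $x^t$, the KL term in $L_2$ has to be averaged over $x^t$ as well.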

Therefore

\begin{equation}
\begin{split}
L&\leq L_1+L_2+L_3 \\
&=KL\Big(q(x^T|x^0)\,\|\,p(x^T)\Big) + \sum_{t=2}^{T} E_{x^{t}\sim q(x^{t}|x^0)} \bigg[ KL\Big(q(x^{t-1}|x^t,x^0)\,\|\,p(x^{t-1}|x^t) \Big) \bigg] - E_{x^{1}\sim q(x^{1}|x^0)} \log \Big[p(x^{0}|x^1)\Big]
\end{split}
\end{equation}

Each KL term in the sum is averaged over $x^t \sim q(x^t|x^0)$ because it still depends on $x^t$, and the last term is averaged over $x^1 \sim q(x^1|x^0)$ by the same marginalization used for $L_1$. The inequality comes from the Jensen step in Eq. 6.
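The decomposition can be sanity-checked numerically. The sketch below uses a hypothetical two-state chain (the tables `Q`, `P`, `pT` are made up) and compares $L_1+L_2+L_3$ with the bound computed directly from its definition, and with $-\log p(x^0)$.

```python
import itertools
import math

T = 3
S = [0, 1]
Q = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical q(x^t = b | x^{t-1} = a)
P = [[0.6, 0.4], [0.3, 0.7]]   # hypothetical p(x^{t-1} = b | x^t = a)
pT = [0.5, 0.5]                # hypothetical p(x^T)
x0 = 0

def q_marg(t):
    """q(x^t | x^0) as a list over x^t."""
    dist = [1.0 if s == x0 else 0.0 for s in S]
    for _ in range(t):
        dist = [sum(dist[a] * Q[a][b] for a in S) for b in S]
    return dist

def q_post(t, xt):
    """q(x^{t-1} | x^t, x^0) as a list over x^{t-1}."""
    prev, cur = q_marg(t - 1), q_marg(t)
    return [Q[a][xt] * prev[a] / cur[xt] for a in S]

def kl(q, p):
    """KL(q || p) for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# The three terms; P[xt] is p(x^{t-1} | x^t = xt) as a list over x^{t-1}.
L1 = kl(q_marg(T), pT)
L2 = sum(q_marg(t)[xt] * kl(q_post(t, xt), P[xt])
         for t in range(2, T + 1) for xt in S)
L3 = -sum(q_marg(1)[x1] * math.log(P[x1][x0]) for x1 in S)

# The bound computed directly, and the exact p(x^0), by enumerating trajectories.
bound = 0.0
p_x0 = 0.0
for tail in itertools.product(S, repeat=T):
    traj = (x0,) + tail
    q = math.prod(Q[traj[t - 1]][traj[t]] for t in range(1, T + 1))
    p = pT[traj[T]] * math.prod(P[traj[t]][traj[t - 1]] for t in range(1, T + 1))
    p_x0 += p
    bound -= q * math.log(p / q)

print(L1 + L2 + L3, bound, -math.log(p_x0))
```

The first two printed values agree up to rounding, and both are at least as large as the third, which is what the Jensen step predicts.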

