0. Quick access
Paper reading notes: Denoising Diffusion Implicit Models (1)
Paper reading notes: Denoising Diffusion Implicit Models (2)
Paper reading notes: Denoising Diffusion Implicit Models (3)
Paper reading notes: Denoising Diffusion Implicit Models (4)
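4. Similarities and differences between DDPM and DDIM
4.1 Similarities
DDPM and DDIM share the same training procedure, so a model trained with DDPM can be used directly for DDIM sampling. The training procedure is shown in the figure below.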
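To make the shared training step concrete, here is a minimal scalar sketch of the noise-prediction objective. The linear beta schedule, the function names, and the placeholder `dummy_eps_model` are assumptions made for illustration; a real model would be a neural network operating on images.

```python
import math
import random

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed schedule)."""
    alpha_bar, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar

def training_loss(eps_model, x0, alpha_bar, rng):
    """Single-sample noise-prediction loss for a scalar x0."""
    t = rng.randrange(len(alpha_bar))      # uniformly sampled timestep index
    eps = rng.gauss(0.0, 1.0)              # noise injected by the forward process
    x_t = math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1.0 - alpha_bar[t]) * eps
    return (eps - eps_model(x_t, t)) ** 2  # squared error on the predicted noise

def dummy_eps_model(x_t, t):
    """Hypothetical stand-in for the trained network: always predicts zero noise."""
    return 0.0

alpha_bar = make_alpha_bar()
loss = training_loss(dummy_eps_model, x0=0.5, alpha_bar=alpha_bar, rng=random.Random(0))
```

Nothing in this step depends on how samples are drawn later, which is why the same trained network can serve both samplers.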
4.2 Differences
DDPM and DDIM differ at inference time.
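DDPM's sampling process at inference time is shown in the figure below. First, the model $\epsilon_\theta$ predicts the noise $\epsilon_t$ that was added when going from $x_0$ to $x_t$. Then the mean of the distribution of $x_{t-1}$ is computed as $\frac{1}{\sqrt{\alpha_t}}\Big(x_t-\frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_{t}}}\cdot \epsilon_t\Big)$, where, in DDPM notation, $\alpha_t=1-\beta_t$ is the per-step coefficient and $\bar{\alpha}_t$ is its cumulative product. Finally, the corresponding noise is added to this mean to obtain $x_{t-1}$. Note that in the DDIM formulas that follow, $\alpha_t$ plays the role of the cumulative $\bar{\alpha}_t$, as implied by $x_t=\sqrt{\alpha_t}\cdot x_0+\sqrt{1-\alpha_t}\cdot z_t$.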
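The DDPM step described above can be sketched for a scalar sample as follows. This is an illustrative sketch rather than reference code: the function name and arguments are hypothetical, the mean includes the $\frac{1}{\sqrt{\alpha_t}}$ factor from the DDPM paper, and `sigma_t` is left as an input because DDPM allows more than one choice for it.

```python
import math
import random

def ddpm_step(x_t, eps_pred, alpha_t, alpha_bar_t, sigma_t, rng, add_noise=True):
    """One DDPM reverse step (DDPM notation: alpha_t per step, alpha_bar_t cumulative)."""
    # Mean of x_{t-1} computed from the predicted noise.
    mean = (x_t - (1.0 - alpha_t) / math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_t)
    # Fresh Gaussian noise is added on every step except, typically, the last one.
    noise = rng.gauss(0.0, 1.0) if add_noise else 0.0
    return mean + sigma_t * noise

rng = random.Random(0)
alpha_t, alpha_bar_t = 0.98, 0.5
sigma_t = math.sqrt(1.0 - alpha_t)  # assumed choice: sigma_t^2 = beta_t
x_prev = ddpm_step(1.0, 0.2, alpha_t, alpha_bar_t, sigma_t, rng)
x_mean_only = ddpm_step(1.0, 0.0, alpha_t, alpha_bar_t, sigma_t, rng, add_noise=False)
```

Because every step injects new noise and relies on the one-step Markov structure, this sampler has to visit all timesteps, which is the cost DDIM is designed to avoid.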
Next, the DDIM sampling process. According to Equation (2) of the forward process in the earlier note, Paper reading notes: Denoising Diffusion Implicit Models (2), the distribution of $x_{t-1}$ given $x_0$ and $x_t$ is

$$
q_{\sigma}(x_{t-1}|x_t,x_0)=N\Bigg(x_{t-1}\,\Bigg|\,\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0,\ \sigma_t^2 I\Bigg)
$$

so $x_{t-1}$ is computed as shown in Equation (1).
$$
\begin{equation}
\begin{split}
x_{t-1}&=\sqrt{\alpha_{t-1}}\cdot x_0+\sqrt{1-\alpha_{t-1}-\sigma_t^2}\cdot \frac{x_t-\sqrt{\alpha_t}\cdot x_0}{\sqrt{1-\alpha_t}}+\sigma_t\cdot\underbrace{\epsilon_t}_{\text{standard Gaussian}} \\
&=\sqrt{\alpha_{t-1}}\cdot \frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{1-\alpha_{t-1}-\sigma_t^2}\cdot \frac{1}{\sqrt{1-\alpha_t}}\cdot \bigg(x_t-\bcancel{\sqrt{\alpha_t}}\cdot \frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\bcancel{\sqrt{\alpha_t}}}\bigg)+\sigma_t\cdot\epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot \frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{1-\alpha_{t-1}-\sigma_t^2}\cdot \frac{1}{\sqrt{1-\alpha_t}}\cdot \big(x_t-x_t+\sqrt{1-\alpha_t}\cdot z_t\big)+\sigma_t\cdot\epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot \underbrace{\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}}_{=x_0}+\sqrt{1-\alpha_{t-1}-\sigma_t^2}\cdot z_t+\sigma_t\cdot\epsilon_t
\end{split}
\end{equation}
$$

Here $z_t$ is the noise predicted by the model, and the last term is scaled by $\sigma_t$ (not $\sigma_t^2$), because the distribution above has covariance $\sigma_t^2 I$.
Equation (1) is therefore the sampling step that moves back by $1$ step at inference time.
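As a sanity check on the one-step update, the following scalar sketch implements it directly. The function name and arguments are hypothetical; `alpha_t` and `alpha_prev` are cumulative products (DDIM notation), and the noise is scaled by `sigma_t`, the standard deviation of $N(\cdot,\sigma_t^2 I)$.

```python
import math
import random

def ddim_step(x_t, z_t, alpha_t, alpha_prev, sigma_t, rng):
    """One DDIM reverse step from x_t to x_{t-1} for a scalar sample."""
    # Estimate of x_0 recovered from the predicted noise z_t.
    x0_pred = (x_t - math.sqrt(1.0 - alpha_t) * z_t) / math.sqrt(alpha_t)
    # Deterministic part that points back toward x_t.
    direction = math.sqrt(1.0 - alpha_prev - sigma_t ** 2) * z_t
    # Stochastic part; it vanishes when sigma_t = 0.
    noise = sigma_t * rng.gauss(0.0, 1.0) if sigma_t > 0 else 0.0
    return math.sqrt(alpha_prev) * x0_pred + direction + noise

# Build x_t from a known x_0 and noise z, then take a deterministic step.
x0, z = 0.3, -1.2
alpha_t, alpha_prev = 0.5, 0.6
x_t = math.sqrt(alpha_t) * x0 + math.sqrt(1.0 - alpha_t) * z
x_prev = ddim_step(x_t, z, alpha_t, alpha_prev, sigma_t=0.0, rng=random.Random(0))
expected = math.sqrt(alpha_prev) * x0 + math.sqrt(1.0 - alpha_prev) * z
```

With `sigma_t = 0` and an exact noise prediction, the step lands on the point the forward process would have produced at the previous timestep with the same noise, which is the deterministic behaviour DDIM is known for.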
From the forward process we likewise have

$$
q_{\sigma}(x_{t-2}|x_{t-1},x_0)=N\Bigg(x_{t-2}\,\Bigg|\,\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot x_{t-1}+\bigg[\sqrt{\alpha_{t-2}}-\frac{\sqrt{\alpha_{t-1}\cdot(1-\alpha_{t-2}-\sigma_{t-1}^2)}}{\sqrt{1-\alpha_{t-1}}}\bigg]\cdot x_0,\ \sigma_{t-1}^2 I\Bigg)
$$

Next, consider sampling when skipping $2$ steps, that is, sampling $x_{t-2}$ given $x_0$ and $x_t$. This requires the distribution $q_\sigma(x_{t-2}|x_0,x_t)$.
First, $q_\sigma(x_{t-2}|x_0,x_t)$ is Gaussian; denote its mean and variance by $\mu_{t-2}$ and $\sigma_{t-2}^2$. It is the marginal of $q_\sigma(x_{t-2},x_{t-1}|x_0,x_t)$ over $x_{t-1}$, and, given $x_0$ and $x_{t-1}$, $x_{t-2}$ does not depend on $x_t$, so:

$$
\begin{equation}
\begin{split}
q_\sigma(x_{t-2}|x_0,x_t)&=\int q_\sigma(x_{t-2},x_{t-1}|x_0,x_t)\cdot dx_{t-1} \\
&=\int q_\sigma(x_{t-2}|x_0,x_{t-1})\cdot q_\sigma(x_{t-1}|x_0,x_{t})\cdot dx_{t-1}
\end{split}
\end{equation}
$$

Therefore,
$$
\begin{equation}
\begin{split}
\mu_{t-2}&=\mathbb{E}_{q_\sigma}\big[x_{t-2}\mid x_0,x_t\big] \\
&=\int x_{t-2}\cdot q_\sigma(x_{t-2}|x_0,x_t)\cdot dx_{t-2} \\
&=\int x_{t-2}\cdot \bigg(\int q_\sigma(x_{t-2}|x_0,x_{t-1})\cdot q_\sigma(x_{t-1}|x_0,x_{t})\cdot dx_{t-1}\bigg)\cdot dx_{t-2} \\
&=\int\!\!\int x_{t-2}\cdot q_\sigma(x_{t-2}|x_0,x_{t-1})\cdot q_\sigma(x_{t-1}|x_0,x_{t})\cdot dx_{t-1}\cdot dx_{t-2} \\
&=\int \bigg(\int x_{t-2}\cdot q_\sigma(x_{t-2}|x_0,x_{t-1})\cdot dx_{t-2}\bigg)\cdot q_\sigma(x_{t-1}|x_0,x_{t})\cdot dx_{t-1} \\
&=\int \mathbb{E}_{q_\sigma}\big[x_{t-2}\mid x_0,x_{t-1}\big]\cdot q_\sigma(x_{t-1}|x_0,x_{t})\cdot dx_{t-1} \\
&=\int \bigg(\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot x_{t-1}+\bigg[\sqrt{\alpha_{t-2}}-\frac{\sqrt{\alpha_{t-1}\cdot(1-\alpha_{t-2}-\sigma_{t-1}^2)}}{\sqrt{1-\alpha_{t-1}}}\bigg]\cdot x_0\bigg)\cdot q_\sigma(x_{t-1}|x_0,x_{t})\cdot dx_{t-1} \\
&=\int \sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot x_{t-1}\cdot q_\sigma(x_{t-1}|x_0,x_{t})\cdot dx_{t-1}+\int \bigg(\sqrt{\alpha_{t-2}}-\frac{\sqrt{\alpha_{t-1}\cdot(1-\alpha_{t-2}-\sigma_{t-1}^2)}}{\sqrt{1-\alpha_{t-1}}}\bigg)\cdot x_0\cdot q_\sigma(x_{t-1}|x_0,x_{t})\cdot dx_{t-1} \\
&=\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \int x_{t-1}\cdot q_\sigma(x_{t-1}|x_0,x_{t})\cdot dx_{t-1}+\bigg(\sqrt{\alpha_{t-2}}-\frac{\sqrt{\alpha_{t-1}\cdot(1-\alpha_{t-2}-\sigma_{t-1}^2)}}{\sqrt{1-\alpha_{t-1}}}\bigg)\cdot x_0\cdot \underbrace{\int q_\sigma(x_{t-1}|x_0,x_{t})\cdot dx_{t-1}}_{=1} \\
&=\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \mathbb{E}_{q_\sigma}\big[x_{t-1}\mid x_0,x_t\big]+\bigg(\sqrt{\alpha_{t-2}}-\frac{\sqrt{\alpha_{t-1}\cdot(1-\alpha_{t-2}-\sigma_{t-1}^2)}}{\sqrt{1-\alpha_{t-1}}}\bigg)\cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \bigg(\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0\bigg)+\bigg(\sqrt{\alpha_{t-2}}-\frac{\sqrt{\alpha_{t-1}\cdot(1-\alpha_{t-2}-\sigma_{t-1}^2)}}{\sqrt{1-\alpha_{t-1}}}\bigg)\cdot x_0 \\
&=\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t+\bcancel{\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \sqrt{\alpha_{t-1}}\cdot x_0}-\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\cdot x_0+\sqrt{\alpha_{t-2}}\cdot x_0-\bcancel{\frac{\sqrt{\alpha_{t-1}\cdot(1-\alpha_{t-2}-\sigma_{t-1}^2)}}{\sqrt{1-\alpha_{t-1}}}\cdot x_0} \\
&=\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t-\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\cdot x_0+\sqrt{\alpha_{t-2}}\cdot \underbrace{x_0}_{x_0=\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}} \\
&=\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t-\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\cdot \frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{\alpha_{t-2}}\cdot \frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}} \\
&=\bcancel{\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_{t}}}\cdot x_t}-\bcancel{\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\cdot \frac{x_t}{\sqrt{\alpha_t}}}+\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\cdot \frac{\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{\alpha_{t-2}}\cdot \frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}} \\
&=\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \frac{\bcancel{\sqrt{\alpha_t}}\cdot \sqrt{1-\alpha_{t-1}-\sigma_t^2}}{\bcancel{\sqrt{1-\alpha_t}}}\cdot \frac{\bcancel{\sqrt{1-\alpha_t}}\cdot z_t}{\bcancel{\sqrt{\alpha_t}}}+\sqrt{\alpha_{t-2}}\cdot \frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}} \\
&=\sqrt{\alpha_{t-2}}\cdot \underbrace{\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}}_{=x_0}+\sqrt{\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}}\cdot \sqrt{1-\alpha_{t-1}-\sigma_t^2}\cdot z_t
\end{split}
\end{equation}
$$
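The closed form for $\mu_{t-2}$ can be checked numerically. Since each conditional mean is affine in the variable it conditions on, the two-step mean is simply the composition of the two one-step means. The sketch below does this for scalars; the helper name and all numeric values are arbitrary choices made for illustration.

```python
import math

def posterior_mean(x_next, x0, a_next, a_prev, sigma):
    """Mean of q_sigma(x_prev | x_next, x0); a_next and a_prev are cumulative alphas."""
    coef_x = math.sqrt((1.0 - a_prev - sigma ** 2) / (1.0 - a_next))
    coef_x0 = math.sqrt(a_prev) - math.sqrt(a_next) * coef_x
    return coef_x * x_next + coef_x0 * x0

# Arbitrary test values: alphas at t, t-1, t-2 and sigmas at t, t-1.
a_t, a_t1, a_t2 = 0.5, 0.6, 0.7
s_t, s_t1 = 0.1, 0.15
x0, z_t = 0.3, -1.2
x_t = math.sqrt(a_t) * x0 + math.sqrt(1.0 - a_t) * z_t

# Compose the one-step means: E[x_{t-1} | x_0, x_t], then E[x_{t-2} | x_0, x_{t-1}].
mean_t1 = posterior_mean(x_t, x0, a_t, a_t1, s_t)
mu_composed = posterior_mean(mean_t1, x0, a_t1, a_t2, s_t1)

# Closed form from the last line of the derivation.
mu_closed = (math.sqrt(a_t2) * x0
             + math.sqrt((1.0 - a_t2 - s_t1 ** 2) / (1.0 - a_t1))
             * math.sqrt(1.0 - a_t1 - s_t ** 2) * z_t)
```

The two values agree up to floating-point error, which supports the algebra of the derivation for these inputs.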
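This result looks slightly different from the one in the paper (Equation (4) in this note). The sampling step for skipping $n$ steps, as used in the paper and in the code, is given by Equation (4).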
$$
\begin{equation}
\begin{split}
x_{t-n}&=\sqrt{\alpha_{t-n}}\cdot \underbrace{\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}}_{\text{predict } z_t\text{, then compute } x_0}+\sqrt{1-\alpha_{t-n}-\sigma_t^2}\cdot z_t+\sigma_t\cdot\underbrace{\epsilon_t}_{\text{standard Gaussian}}
\end{split}
\end{equation}
$$
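One possible way to reconcile the two-step mean derived above with the skip-$n$ form is offered here as my own check under stated assumptions, not as a statement from the paper. The two-step marginal $q_\sigma(x_{t-2}|x_0,x_t)$ has variance $\tilde{\sigma}^2=\sigma_{t-1}^2+\frac{1-\alpha_{t-2}-\sigma_{t-1}^2}{1-\alpha_{t-1}}\cdot\sigma_t^2$. If the $\sigma_t$ in the skip-$n$ formula is read as this $\tilde{\sigma}$, then $\sqrt{1-\alpha_{t-2}-\tilde{\sigma}^2}$ equals the $z_t$ coefficient of the two-step mean. The sketch below checks this for arbitrary scalar values; `ddim_skip_step` is a hypothetical helper.

```python
import math

def ddim_skip_step(x_t, z_t, alpha_t, alpha_target, sigma, noise=0.0):
    """Jump from x_t directly to the timestep whose cumulative alpha is alpha_target."""
    x0_pred = (x_t - math.sqrt(1.0 - alpha_t) * z_t) / math.sqrt(alpha_t)
    return (math.sqrt(alpha_target) * x0_pred
            + math.sqrt(1.0 - alpha_target - sigma ** 2) * z_t
            + sigma * noise)  # noise is a standard Gaussian draw supplied by the caller

# Arbitrary test values: alphas at t, t-1, t-2 and sigmas at t, t-1.
a_t, a_t1, a_t2 = 0.5, 0.6, 0.7
s_t, s_t1 = 0.1, 0.15

# Variance of the two-step marginal (law of total variance for an affine Gaussian chain).
ratio = (1.0 - a_t2 - s_t1 ** 2) / (1.0 - a_t1)
var_two_step = s_t1 ** 2 + ratio * s_t ** 2

# z_t coefficient of the skip form with sigma^2 = var_two_step ...
coef_skip = math.sqrt(1.0 - a_t2 - var_two_step)
# ... versus the z_t coefficient of the derived two-step mean.
coef_derived = math.sqrt(ratio) * math.sqrt(1.0 - a_t1 - s_t ** 2)

# Deterministic jump of two steps (sigma = 0) from a known x_0 and noise.
x0, z = 0.3, -1.2
x_t = math.sqrt(a_t) * x0 + math.sqrt(1.0 - a_t) * z
x_jump = ddim_skip_step(x_t, z, a_t, a_t2, sigma=0.0)
```

For these values the two coefficients coincide, and with all $\sigma=0$ both forms reduce to $\sqrt{\alpha_{t-2}}\cdot x_0+\sqrt{1-\alpha_{t-2}}\cdot z_t$, so the apparent mismatch may come only from which $\sigma$ is attached to the skipped transition.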