1. References
Paper: Denoising Diffusion Implicit Models
Venue: ICLR 2021
https://iclr.cc/virtual/2021/poster/2804
Paper link: https://arxiv.org/abs/2010.02502
Code link: https://github.com/ermongroup/ddim
2. Differences in notation
In the DDPM paper (Denoising Diffusion Probabilistic Models), the posterior of the forward (noising) process is $q(x_{t-1}|x_t,x_0)=\mathcal{N}\big(x_{t-1};\widetilde{\mu}_t(x_t,x_0),\sigma_t^2 I\big)$. Its mean $\widetilde{\mu}_t(x_t,x_0)$ and standard deviation $\sigma_t$ are given by Eq. (1).

$$
\begin{equation}
\begin{split}
\sigma_t&=\sqrt{\frac{\beta_t\cdot(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}}\\
\widetilde{\mu}_t(x_t,x_0)&=\frac{\sqrt{\alpha_t}\cdot(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\cdot x_t+\frac{\beta_t\cdot\sqrt{\bar\alpha_{t-1}}}{1-\bar\alpha_t}\cdot x_0
\end{split}
\end{equation}
$$
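The sketch below makes Eq. (1) concrete. It evaluates $\sigma_t$ and the two coefficients of $\widetilde{\mu}_t$ for a DDPM-style schedule. Three things here are assumptions for illustration: the linear schedule ($\beta_1=10^{-4}$ to $\beta_T=0.02$, $T=1000$), the helper names `linear_betas` and `ddpm_posterior_coeffs`, and the padding of index 0 so that the list index equals the timestep, with $\bar\alpha_0=1$.

```python
import math

def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical helper: linearly spaced beta_1..beta_T; index 0 is padding so index == t."""
    return [0.0] + [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def ddpm_posterior_coeffs(betas, t):
    """Return (sigma_t, coef_xt, coef_x0) of Eq. (1) in DDPM notation, for 1 <= t <= T."""
    alpha_bar = [1.0]  # bar_alpha_0 = 1 by convention
    for b in betas[1:t + 1]:
        alpha_bar.append(alpha_bar[-1] * (1.0 - b))  # bar_alpha_t = prod_{i=1}^t alpha_i
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    sigma_t = math.sqrt(beta_t * (1.0 - ab_prev) / (1.0 - ab_t))
    coef_xt = math.sqrt(alpha_t) * (1.0 - ab_prev) / (1.0 - ab_t)
    coef_x0 = beta_t * math.sqrt(ab_prev) / (1.0 - ab_t)
    return sigma_t, coef_xt, coef_x0

betas = linear_betas()
sigma_500, cxt_500, cx0_500 = ddpm_posterior_coeffs(betas, 500)
```

This gives a quick consistency check. If $x_t=\sqrt{\bar\alpha_t}\,x_0$ (no noise), the two coefficients combine to $\sqrt{\bar\alpha_{t-1}}\,x_0$, because $\beta_t+\alpha_t(1-\bar\alpha_{t-1})=1-\bar\alpha_t$.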
DDIM (Denoising Diffusion Implicit Models) redefines this notation. DDIM writes $\alpha_t$ for the quantity that DDPM calls $\bar\alpha_t$. In DDPM, that quantity is defined as:

$$
\begin{equation}
\bar\alpha_t=\prod_{i=1}^{t}\alpha_i
\end{equation}
$$
As a result, several expressions change form in DDIM. DDPM's per-step $\alpha_t=\bar\alpha_t/\bar\alpha_{t-1}$ becomes $\alpha_t/\alpha_{t-1}$ in DDIM notation, so $\beta_t$ takes the form shown in Eq. (3).

$$
\begin{equation}
\begin{split}
\beta_t&=1-\alpha_t\quad(\text{DDPM})\\
&=1-\frac{\alpha_t}{\alpha_{t-1}}\quad(\text{DDIM})
\end{split}
\end{equation}
$$
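The notation change in Eqs. (2) and (3) amounts to a cumulative product and its inverse. The sketch below maps a DDPM $\beta$ schedule to DDIM's $\alpha_t$, then recovers $\beta_t=1-\alpha_t/\alpha_{t-1}$. The function names and the linear example schedule are assumptions for illustration. As before, index 0 is padding, so $\alpha_0=1$.

```python
def ddim_alphas_from_betas(betas):
    """Map a DDPM beta schedule to DDIM's alpha_t (= DDPM's bar_alpha_t, Eq. (2)).

    betas[0] is unused padding so that list index == timestep t; alpha_0 = 1.
    """
    alphas = [1.0]
    for b in betas[1:]:
        alphas.append(alphas[-1] * (1.0 - b))  # bar_alpha_t = bar_alpha_{t-1} * (1 - beta_t)
    return alphas

def betas_from_ddim_alphas(alphas):
    """Recover beta_t from DDIM's alpha_t via Eq. (3): beta_t = 1 - alpha_t / alpha_{t-1}."""
    return [0.0] + [1.0 - alphas[t] / alphas[t - 1] for t in range(1, len(alphas))]

# Assumed example schedule: linear betas from 1e-4 to 0.02 over T = 1000 steps.
T = 1000
betas = [0.0] + [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = ddim_alphas_from_betas(betas)
recovered = betas_from_ddim_alphas(alphas)
```

Because DDIM's $\alpha_t$ is a cumulative product, it decreases strictly with $t$. It also never needs the per-step DDPM $\alpha_t$ to be stored separately.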
The variance and mean of the forward-process posterior $q(x_{t-1}|x_t,x_0)$ are given in Eqs. (4) and (5), respectively. Eq. (5) first rewrites the DDPM mean in DDIM notation. It then simplifies the result until $\sigma_t^2$ from Eq. (4) appears explicitly.

$$
\begin{equation}
\begin{split}
\sigma_t^2&=\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\cdot\beta_t\quad(\text{DDPM})\\
&=\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\Big(1-\frac{\alpha_t}{\alpha_{t-1}}\Big)\quad(\text{DDIM})
\end{split}
\end{equation}
$$
$$
\begin{equation}
\begin{split}
\widetilde{\mu}_t(x_t,x_0)&=\frac{\sqrt{\alpha_t}\cdot(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\cdot x_t+\frac{\beta_t\cdot\sqrt{\bar\alpha_{t-1}}}{1-\bar\alpha_t}\cdot x_0\quad(\text{DDPM})\\
&=\frac{\sqrt{\alpha_t}\cdot(1-\alpha_{t-1})}{\sqrt{\alpha_{t-1}}\cdot(1-\alpha_t)}\cdot x_t+\Big(1-\frac{\alpha_t}{\alpha_{t-1}}\Big)\cdot\frac{\sqrt{\alpha_{t-1}}}{1-\alpha_t}\cdot x_0\quad(\text{DDIM})\\
&=\sqrt{\frac{\alpha_t\cdot(1-\alpha_{t-1})^2}{\alpha_{t-1}\cdot(1-\alpha_t)^2}}\cdot x_t+\frac{\alpha_{t-1}-\alpha_t}{\alpha_{t-1}}\cdot\frac{\sqrt{\alpha_{t-1}}}{1-\alpha_t}\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\frac{\alpha_t-\alpha_t\cdot\alpha_{t-1}}{\alpha_{t-1}-\alpha_{t-1}\cdot\alpha_t}}\cdot x_t+\frac{\alpha_{t-1}-\alpha_t}{\sqrt{\alpha_{t-1}}\cdot(1-\alpha_t)}\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\frac{\alpha_t+\alpha_{t-1}-\alpha_{t-1}-\alpha_t\cdot\alpha_{t-1}}{\alpha_{t-1}-\alpha_{t-1}\cdot\alpha_t}}\cdot x_t+\frac{\alpha_{t-1}-\alpha_t\cdot\alpha_{t-1}+\alpha_t\cdot\alpha_{t-1}-\alpha_t}{\sqrt{\alpha_{t-1}}\cdot(1-\alpha_t)}\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\Big(1+\frac{\alpha_t-\alpha_{t-1}}{\alpha_{t-1}-\alpha_{t-1}\cdot\alpha_t}\Big)}\cdot x_t+\frac{\alpha_{t-1}\cdot(1-\alpha_t)-\alpha_t\cdot(1-\alpha_{t-1})}{\sqrt{\alpha_{t-1}}\cdot(1-\alpha_t)}\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\Big(1-\frac{\alpha_{t-1}-\alpha_t}{\alpha_{t-1}-\alpha_{t-1}\cdot\alpha_t}\Big)}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\alpha_t\cdot(1-\alpha_{t-1})}{\sqrt{\alpha_{t-1}}\cdot(1-\alpha_t)}\bigg]\cdot x_0\\
&=\sqrt{\frac{1}{1-\alpha_t}\cdot\Big(1-\alpha_{t-1}-\frac{(\alpha_{t-1}-\alpha_t)\cdot(1-\alpha_{t-1})}{\alpha_{t-1}-\alpha_{t-1}\cdot\alpha_t}\Big)}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t^2\cdot(1-\alpha_{t-1})^2}}{\sqrt{\alpha_{t-1}}\cdot(1-\alpha_t)}\bigg]\cdot x_0\\
&=\sqrt{\frac{1}{1-\alpha_t}\cdot\Big(1-\alpha_{t-1}-\underbrace{\frac{(\alpha_{t-1}-\alpha_t)\cdot(1-\alpha_{t-1})}{\alpha_{t-1}\cdot(1-\alpha_t)}}_{=\sigma_t^2}\Big)}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1})\cdot(\alpha_t-\alpha_t\cdot\alpha_{t-1})}}{\sqrt{\alpha_{t-1}}\cdot(1-\alpha_t)}\bigg]\cdot x_0\\
&=\sqrt{\frac{1}{1-\alpha_t}\cdot\big(1-\alpha_{t-1}-\sigma_t^2\big)}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1})\cdot(\alpha_t+\alpha_{t-1}-\alpha_{t-1}-\alpha_t\cdot\alpha_{t-1})}}{\sqrt{\alpha_{t-1}\cdot(1-\alpha_t)}\cdot\sqrt{1-\alpha_t}}\bigg]\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{1-\alpha_{t-1}}}{\sqrt{1-\alpha_t}}\cdot\frac{\sqrt{\alpha_t\cdot\big(\alpha_t-\alpha_{t-1}+\alpha_{t-1}\cdot(1-\alpha_t)\big)}}{\sqrt{\alpha_{t-1}\cdot(1-\alpha_t)}}\bigg]\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{1-\alpha_{t-1}}}{\sqrt{1-\alpha_t}}\cdot\sqrt{\frac{\alpha_t\cdot\big(\alpha_t-\alpha_{t-1}+\alpha_{t-1}\cdot(1-\alpha_t)\big)}{\alpha_{t-1}\cdot(1-\alpha_t)}}\bigg]\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{1-\alpha_{t-1}}}{\sqrt{1-\alpha_t}}\cdot\sqrt{\alpha_t\cdot\Big(1+\frac{\alpha_t-\alpha_{t-1}}{\alpha_{t-1}\cdot(1-\alpha_t)}\Big)}\bigg]\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{1}{\sqrt{1-\alpha_t}}\cdot\sqrt{(1-\alpha_{t-1})\cdot\alpha_t\cdot\Big(1-\frac{\alpha_{t-1}-\alpha_t}{\alpha_{t-1}\cdot(1-\alpha_t)}\Big)}\bigg]\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{1}{\sqrt{1-\alpha_t}}\cdot\sqrt{\alpha_t\cdot\Big(1-\alpha_{t-1}-\underbrace{\frac{(\alpha_{t-1}-\alpha_t)\cdot(1-\alpha_{t-1})}{\alpha_{t-1}\cdot(1-\alpha_t)}}_{=\sigma_t^2}\Big)}\bigg]\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{1}{\sqrt{1-\alpha_t}}\cdot\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}\bigg]\cdot x_0\\
&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0
\end{split}
\end{equation}
$$
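The sketch below is a numerical sanity check of Eqs. (4) and (5). It evaluates both forms of $\sigma_t^2$ over a whole schedule. It also evaluates two versions of the $\widetilde{\mu}_t$ coefficients: the substituted DDIM form (second line of Eq. (5)) and the fully simplified form (last line). It then records the largest discrepancy. The linear $\beta$ schedule ($10^{-4}$ to $0.02$, $T=1000$) and all function names are assumptions chosen for illustration. Any schedule with $0<\beta_t<1$ should behave the same way.

```python
import math

def sigma2_ddpm(beta_t, alpha_bar_t, alpha_bar_prev):
    """Eq. (4), DDPM notation."""
    return (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * beta_t

def sigma2_ddim(alpha_t, alpha_prev):
    """Eq. (4), DDIM notation (alpha_t here is DDPM's bar_alpha_t)."""
    return (1.0 - alpha_prev) / (1.0 - alpha_t) * (1.0 - alpha_t / alpha_prev)

def mu_coeffs_substituted(alpha_t, alpha_prev):
    """(coef_xt, coef_x0) from the second line of Eq. (5): DDPM mean in DDIM notation."""
    coef_xt = math.sqrt(alpha_t) * (1.0 - alpha_prev) / (math.sqrt(alpha_prev) * (1.0 - alpha_t))
    coef_x0 = (1.0 - alpha_t / alpha_prev) * math.sqrt(alpha_prev) / (1.0 - alpha_t)
    return coef_xt, coef_x0

def mu_coeffs_simplified(alpha_t, alpha_prev):
    """(coef_xt, coef_x0) from the last line of Eq. (5), expressed through sigma_t^2."""
    rest = 1.0 - alpha_prev - sigma2_ddim(alpha_t, alpha_prev)  # 1 - alpha_{t-1} - sigma_t^2
    coef_xt = math.sqrt(rest / (1.0 - alpha_t))
    coef_x0 = math.sqrt(alpha_prev) - math.sqrt(alpha_t * rest) / math.sqrt(1.0 - alpha_t)
    return coef_xt, coef_x0

# Assumed schedule: linear betas in [1e-4, 0.02], T = 1000; index 0 is padding.
T = 1000
betas = [0.0] + [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0]  # DDIM alpha_t == DDPM bar_alpha_t, with alpha_0 = 1
for b in betas[1:]:
    alphas.append(alphas[-1] * (1.0 - b))

max_gap_sigma = max(
    abs(sigma2_ddpm(betas[t], alphas[t], alphas[t - 1]) - sigma2_ddim(alphas[t], alphas[t - 1]))
    for t in range(1, T + 1)
)
max_gap_mu = max(
    abs(p - q)
    for t in range(1, T + 1)
    for p, q in zip(mu_coeffs_substituted(alphas[t], alphas[t - 1]),
                    mu_coeffs_simplified(alphas[t], alphas[t - 1]))
)
```

If the derivation is correct, both gaps stay at floating-point round-off level.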
Therefore, in DDIM notation the forward-process posterior is

$$
q(x_{t-1}|x_t,x_0)=\mathcal{N}\bigg(x_{t-1};\ \sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0,\ \sigma_t^2 I\bigg)
$$
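The sketch below shows how the distribution above can be used. It draws one sample of $x_{t-1}$ given $x_t$, $x_0$ and $\sigma_t$, working elementwise on Python lists. The function name `sample_posterior`, its list-based interface, and the example values of $\alpha_t$ and $\alpha_{t-1}$ are assumptions for illustration. Grouping terms, the mean can also be written as $\sqrt{\alpha_{t-1}}\,x_0+\sqrt{1-\alpha_{t-1}-\sigma_t^2}\cdot\frac{x_t-\sqrt{\alpha_t}\,x_0}{\sqrt{1-\alpha_t}}$, which is the form used in the DDIM paper.

```python
import math
import random

def sample_posterior(x_t, x0, alpha_t, alpha_prev, sigma_t, noise=None):
    """Draw x_{t-1} ~ N(mean, sigma_t^2 I) elementwise, in DDIM notation.

    x_t, x0: equal-length lists of floats; alpha_t, alpha_prev: DDIM alphas
    (DDPM's bar_alpha); sigma_t must satisfy sigma_t^2 <= 1 - alpha_prev.
    noise: optional list of standard-normal draws (sampled if None).
    """
    rest = 1.0 - alpha_prev - sigma_t ** 2  # 1 - alpha_{t-1} - sigma_t^2
    if rest < 0.0:
        raise ValueError("sigma_t^2 must not exceed 1 - alpha_prev")
    coef_xt = math.sqrt(rest / (1.0 - alpha_t))
    coef_x0 = math.sqrt(alpha_prev) - math.sqrt(alpha_t * rest) / math.sqrt(1.0 - alpha_t)
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x_t]
    return [coef_xt * a + coef_x0 * b + sigma_t * z for a, b, z in zip(x_t, x0, noise)]

# Example with assumed values; sigma_t taken from Eq. (4).
alpha_prev, alpha_t = 0.5, 0.49
sigma_t = math.sqrt((1.0 - alpha_prev) / (1.0 - alpha_t) * (1.0 - alpha_t / alpha_prev))
x0 = [0.3, -1.2]
x_t = [math.sqrt(alpha_t) * v for v in x0]  # noise-free x_t, used for the check below
x_prev = sample_posterior(x_t, x0, alpha_t, alpha_prev, sigma_t, noise=[0.0, 0.0])
```

With zero noise and a noise-free $x_t=\sqrt{\alpha_t}\,x_0$, the mean reduces to $\sqrt{\alpha_{t-1}}\,x_0$ for any admissible $\sigma_t$. This makes it a convenient test for an implementation.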