Mathematical Foundations of Diffusion Models
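I. Fundamentals
1. The Markov assumption
A Markov process is a stochastic process in which the future state of the system depends only on the current state: given the present, it is independent of all earlier states.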
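As a minimal sketch of this property, the snippet below simulates a Gaussian random walk in which each new state is computed from the current state alone. The function names `markov_step` and `simulate_chain` and the noise scale `noise_std` are illustrative assumptions, not something defined in this article.

```python
import random

def markov_step(x_prev, rng, noise_std=0.1):
    """One transition: the next state is drawn using only the current state."""
    return x_prev + rng.gauss(0.0, noise_std)

def simulate_chain(x0, num_steps, seed=0):
    """Return [x_0, x_1, ..., x_num_steps] for a Gaussian random walk."""
    rng = random.Random(seed)
    states = [x0]
    for _ in range(num_steps):
        # Only states[-1] (the current state) is read; earlier history is never used.
        states.append(markov_step(states[-1], rng))
    return states

chain = simulate_chain(x0=1.0, num_steps=5)
print(chain)
```

The history is stored only for inspection; the transition itself never looks further back than one step, which is exactly what the Markov assumption requires.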
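2. KL divergence between Gaussian distributions
KL divergence (Kullback-Leibler divergence) measures how much one probability distribution differs from another. It takes values in $[0, +\infty]$ and equals 0 only when the two distributions coincide; the smaller the value, the more similar the two distributions are.
Definition of KL divergence
For two probability distributions $P$ and $Q$ with densities $p(x)$ and $q(x)$, the KL divergence is defined as:
$$D_{\text{KL}}(P \parallel Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx$$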
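To get a feel for the definition, here is a small sketch using the discrete analogue of the integral, $\sum_i p_i \log (p_i / q_i)$. The helper `kl_discrete` and the three example distributions are made up for illustration, and the code assumes $q_i > 0$ wherever $p_i > 0$.

```python
import math

def kl_discrete(p, q):
    """Sum of p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q_near = [0.45, 0.35, 0.2]   # close to p
q_far = [0.1, 0.1, 0.8]      # far from p

print(kl_discrete(p, p))       # identical distributions give 0
print(kl_discrete(p, q_near))  # small value
print(kl_discrete(p, q_far))   # larger value
print(kl_discrete(q_far, p))   # differs from the previous line: KL is not symmetric
```

The last two lines illustrate that $D_{\text{KL}}(P \parallel Q)$ and $D_{\text{KL}}(Q \parallel P)$ are generally different, so KL divergence is not a distance in the metric sense.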
Now let $P$ and $Q$ be Gaussian distributions:
- $P(x) = \mathcal{N}(x; \mu_P, \sigma_P^2)$
- $Q(x) = \mathcal{N}(x; \mu_Q, \sigma_Q^2)$
Gaussian probability density functions
$$p(x) = \frac{1}{\sqrt{2\pi \sigma_P^2}} \exp\left( - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right)$$
$$q(x) = \frac{1}{\sqrt{2\pi \sigma_Q^2}} \exp\left( - \frac{(x - \mu_Q)^2}{2 \sigma_Q^2} \right)$$
Derivation
- Substitute the Gaussian densities into the definition of KL divergence:
$$D_{\text{KL}}(P \parallel Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx$$
- First compute $\log \frac{p(x)}{q(x)}$:
$$\log \frac{p(x)}{q(x)} = \log \left( \frac{\frac{1}{\sqrt{2\pi \sigma_P^2}} \exp\left( - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right)}{\frac{1}{\sqrt{2\pi \sigma_Q^2}} \exp\left( - \frac{(x - \mu_Q)^2}{2 \sigma_Q^2} \right)} \right)$$
- Simplifying gives:
$$\log \frac{p(x)}{q(x)} = \log \frac{\sqrt{\sigma_Q^2}}{\sqrt{\sigma_P^2}} + \frac{(x - \mu_Q)^2}{2 \sigma_Q^2} - \frac{(x - \mu_P)^2}{2 \sigma_P^2}$$
- Substitute this into the KL integral and split it into two integrals:
$$\begin{aligned} D_{\text{KL}}(P \parallel Q) &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi \sigma_P^2}} \exp\left( - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right) \left[ \log \frac{\sqrt{\sigma_Q^2}}{\sqrt{\sigma_P^2}} + \frac{(x - \mu_Q)^2}{2 \sigma_Q^2} - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right] dx \\ &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi \sigma_P^2}} \exp\left( - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right) \log \frac{\sqrt{\sigma_Q^2}}{\sqrt{\sigma_P^2}} \, dx + \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi \sigma_P^2}} \exp\left( - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right) \left[ \frac{(x - \mu_Q)^2}{2 \sigma_Q^2} - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right] dx \end{aligned}$$
- Now evaluate the two integrals one at a time.
For the first term, the logarithm is a constant and the density integrates to 1:
$$\begin{aligned} \text{first term} &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi \sigma_P^2}} \exp\left( - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right) \log \frac{\sqrt{\sigma_Q^2}}{\sqrt{\sigma_P^2}} \, dx \\ &= \log \frac{\sqrt{\sigma_Q^2}}{\sqrt{\sigma_P^2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi \sigma_P^2}} \exp\left( - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right) dx \\ &= \log \frac{\sqrt{\sigma_Q^2}}{\sqrt{\sigma_P^2}} \\ &= \log \frac{\sigma_Q}{\sigma_P} \end{aligned}$$
For the second term:
- First expand the squares inside the brackets:
$$\begin{aligned} (x - \mu_Q)^2 &= x^2 - 2x\mu_Q + \mu_Q^2 \\ (x - \mu_P)^2 &= x^2 - 2x\mu_P + \mu_P^2 \end{aligned}$$
- Substitute into the integral and collect powers of $x$:
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma_P^2}} \exp\left( - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right) \left[ \left( \frac{1}{2 \sigma_Q^2} - \frac{1}{2 \sigma_P^2} \right) x^2 + \left( \frac{\mu_P}{\sigma_P^2} - \frac{\mu_Q}{\sigma_Q^2} \right) x + \left( \frac{\mu_Q^2}{2 \sigma_Q^2} - \frac{\mu_P^2}{2 \sigma_P^2} \right) \right] dx$$
- The $x^2$ term integrates to the second moment of the Gaussian $P$, the $x$ term to its first moment (the mean), and the constant term is multiplied by 1:
$$\text{second moment: } \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma_P^2}} \exp\left( - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right) x^2 \, dx = \sigma_P^2 + \mu_P^2$$
$$\text{first moment: } \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma_P^2}} \exp\left( - \frac{(x - \mu_P)^2}{2 \sigma_P^2} \right) x \, dx = \mu_P$$
This gives:
$$\text{second term} = \frac{1}{2}\left( \frac{1}{\sigma_Q^2} - \frac{1}{\sigma_P^2} \right) (\sigma_P^2 + \mu_P^2) + \frac{1}{2}\left( \frac{2\mu_P}{\sigma_P^2} - \frac{2\mu_Q}{\sigma_Q^2} \right) \mu_P + \frac{1}{2}\left( \frac{\mu_Q^2}{\sigma_Q^2} - \frac{\mu_P^2}{\sigma_P^2} \right)$$
- Simplifying (the $\mu_P^2 / \sigma_P^2$ terms cancel):
$$\begin{aligned} \text{second term} &= \frac{1}{2} \left[ \frac{\sigma_P^2 + \mu_P^2}{\sigma_Q^2} - 1 - \frac{\mu_P^2}{\sigma_P^2} \right] + \frac{1}{2} \left( \frac{2\mu_P^2}{\sigma_P^2} - \frac{2\mu_P \mu_Q}{\sigma_Q^2} \right) + \frac{1}{2} \left( \frac{\mu_Q^2}{\sigma_Q^2} - \frac{\mu_P^2}{\sigma_P^2} \right) \\ &= \frac{\sigma_P^2 + \mu_P^2 - 2\mu_P \mu_Q + \mu_Q^2}{2\sigma_Q^2} - \frac{1}{2} \\ &= \frac{\sigma_P^2 + (\mu_P - \mu_Q)^2}{2\sigma_Q^2} - \frac{1}{2} \end{aligned}$$
- Adding the first and second terms:
$$D_{\text{KL}}(P \parallel Q) = \log \frac{\sigma_Q}{\sigma_P} + \frac{\sigma_P^2 + (\mu_P - \mu_Q)^2}{2\sigma_Q^2} - \frac{1}{2}$$
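One way to sanity-check the closed form $\log \frac{\sigma_Q}{\sigma_P} + \frac{\sigma_P^2 + (\mu_P - \mu_Q)^2}{2\sigma_Q^2} - \frac{1}{2}$ is to compare it with a direct numerical integration of the definition. The sketch below does this with a trapezoid rule; the function names, the integration window of 12 standard deviations, the grid size, and the test parameters are all arbitrary choices made for illustration.

```python
import math

def gaussian_log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def kl_closed_form(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) for two univariate Gaussians."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def kl_numeric(mu_p, sigma_p, mu_q, sigma_q, half_width=12.0, num_points=20001):
    """Trapezoid-rule estimate of the integral of p(x) * log(p(x) / q(x))."""
    lo = mu_p - half_width * sigma_p
    hi = mu_p + half_width * sigma_p
    step = (hi - lo) / (num_points - 1)
    total = 0.0
    for i in range(num_points):
        x = lo + i * step
        log_p = gaussian_log_pdf(x, mu_p, sigma_p)
        log_q = gaussian_log_pdf(x, mu_q, sigma_q)
        # Endpoints get half weight in the trapezoid rule.
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        # Working with log densities avoids log(0) when q underflows in the tails.
        total += weight * math.exp(log_p) * (log_p - log_q)
    return total * step

closed = kl_closed_form(1.0, 2.0, -0.5, 1.5)
numeric = kl_numeric(1.0, 2.0, -0.5, 1.5)
print(closed, numeric)  # the two values should agree to many decimal places
```

Note that the mean enters only through the squared difference $(\mu_P - \mu_Q)^2$, so shifting both distributions by the same amount leaves the divergence unchanged.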
3. First and second moments of a Gaussian distribution
The variance satisfies $\mathrm{Var}(X) = E[X^2] - (E[X])^2$, where
- $E[X^2]$: the second moment
- $E[X]$: the mean (first moment)
Hence, for $X \sim \mathcal{N}(\mu, \sigma^2)$, the second moment is
$$E[X^2] = \mathrm{Var}(X) + (E[X])^2 = \sigma^2 + \mu^2$$
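The identity $E[X^2] = \sigma^2 + \mu^2$ can be checked empirically with a quick Monte Carlo estimate. This is only a sketch: the helper `empirical_moments`, the sample size, the seed, and the values of `mu` and `sigma` are arbitrary illustrative choices.

```python
import random

def empirical_moments(mu, sigma, num_samples=200_000, seed=0):
    """Estimate E[X] and E[X^2] for X ~ N(mu, sigma^2) from random samples."""
    rng = random.Random(seed)
    samples = [rng.gauss(mu, sigma) for _ in range(num_samples)]
    first = sum(samples) / num_samples
    second = sum(x * x for x in samples) / num_samples
    return first, second

mu, sigma = 1.5, 2.0
first, second = empirical_moments(mu, sigma)
print(first, "should be close to", mu)
print(second, "should be close to", sigma ** 2 + mu ** 2)
```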
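4. The reparameterization trick
In a diffusion model, $x_t$ is obtained by adding Gaussian noise to the previous step $x_{t-1}$ and then sampling from the resulting distribution. Sampling by itself is not a differentiable operation (drawing a sample is not an arithmetic expression of the distribution's parameters), so the reparameterization trick is used to make this step differentiable.
Consider a Gaussian distribution $P = \mathcal{N}(x; \mu_P, \sigma_P^2)$.
To draw a sample from $P$, first sample $z \sim \mathcal{N}(0, 1)$ from the standard Gaussian.
Then, using
- $a z \sim \mathcal{N}(0, a^2)$, where $a$ is a constant
- $c + z \sim \mathcal{N}(c, 1)$, where $c$ is a constant
we obtain $x = \sigma_P z + \mu_P$ with $x \sim \mathcal{N}(\mu_P, \sigma_P^2)$. All randomness now lives in $z$, and $x$ is an ordinary differentiable function of $\mu_P$ and $\sigma_P$.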
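The sketch below illustrates both halves of the argument: samples built as $\mu + \sigma z$ have the expected mean and variance, and with $z$ held fixed the sample is differentiable in $\mu$ and $\sigma$. The function name `reparameterized_sample`, the parameter values, and the finite-difference check are illustrative assumptions; a real training setup would normally rely on an automatic differentiation library instead.

```python
import random

def reparameterized_sample(mu, sigma, z):
    """Deterministic map from standard-normal noise z to a sample of N(mu, sigma^2)."""
    return mu + sigma * z

rng = random.Random(0)
mu, sigma = 2.0, 0.5
noise = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
samples = [reparameterized_sample(mu, sigma, z) for z in noise]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)  # should be close to mu and sigma ** 2

# With z held fixed, x is an ordinary function of (mu, sigma),
# so its derivatives can be estimated by central finite differences.
z0, eps = noise[0], 1e-6
dx_dmu = (reparameterized_sample(mu + eps, sigma, z0)
          - reparameterized_sample(mu - eps, sigma, z0)) / (2 * eps)
dx_dsigma = (reparameterized_sample(mu, sigma + eps, z0)
             - reparameterized_sample(mu, sigma - eps, z0)) / (2 * eps)
print(dx_dmu, dx_dsigma, z0)  # dx/dmu is 1 and dx/dsigma equals z0, up to rounding
```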
5. Conditional probabilities under the Markov assumption
By the chain rule of probability:
$$P(X_0, X_1, \dots, X_n) = P(X_n \mid X_{n-1}, \dots, X_1, X_0) \, P(X_{n-1}, \dots, X_1, X_0)$$
By the Markov assumption:
$$P(X_n \mid X_{n-1}, \dots, X_1, X_0) = P(X_n \mid X_{n-1})$$
Therefore:
$$P(X_0, X_1, \dots, X_n) = P(X_n \mid X_{n-1}) \, P(X_{n-1}, \dots, X_1, X_0)$$
Applying the same step repeatedly:
$$\begin{aligned} P(X_0, X_1, \dots, X_n) &= P(X_n \mid X_{n-1}) \, P(X_{n-1} \mid X_{n-2}) \cdots P(X_1 \mid X_0) \, P(X_0) \\ &= P(X_0) \prod_{t=1}^{n} P(X_t \mid X_{t-1}) \end{aligned}$$
For the conditional probability given $X_0$, divide the joint probability by $P(X_0)$:
$$\begin{aligned} P(X_1, X_2, \dots, X_n \mid X_0) &= \frac{P(X_n \mid X_{n-1}) \, P(X_{n-1} \mid X_{n-2}) \cdots P(X_1 \mid X_0) \, P(X_0)}{P(X_0)} \\ &= \prod_{t=0}^{n-1} P(X_{t+1} \mid X_t) \end{aligned}$$
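The two product formulas can be tried out on a small discrete Markov chain. In this sketch the two-state chain, its `initial` distribution, and its `transition` matrix are invented purely for illustration.

```python
from itertools import product

initial = [0.6, 0.4]            # initial[i] = P(X_0 = i)
transition = [[0.9, 0.1],       # transition[i][j] = P(X_{t+1} = j | X_t = i)
              [0.3, 0.7]]

def joint_probability(path):
    """P(X_0, ..., X_n) = P(X_0) * product over t of P(X_t | X_{t-1})."""
    prob = initial[path[0]]
    for prev, cur in zip(path, path[1:]):
        prob *= transition[prev][cur]
    return prob

def conditional_probability(path):
    """P(X_1, ..., X_n | X_0) = product over t of P(X_{t+1} | X_t)."""
    prob = 1.0
    for prev, cur in zip(path, path[1:]):
        prob *= transition[prev][cur]
    return prob

n = 3
total = sum(joint_probability(path) for path in product(range(2), repeat=n + 1))
print(total)  # joint probabilities over all paths sum to 1, up to rounding

path = (0, 0, 1, 1)
# The joint probability equals P(X_0) times the conditional probability.
print(joint_probability(path), initial[path[0]] * conditional_probability(path))
```

For the path `(0, 0, 1, 1)` the joint probability is $0.6 \times 0.9 \times 0.1 \times 0.7 = 0.0378$, which matches the factorization term by term.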
