
vae

https://kexue.fm/search/%E5%8F%98%E5%88%86%E8%87%AA%E7%BC%96%E7%A0%81%E5%99%A8/

Variational Autoencoders (Part 1): So that's what it is https://kexue.fm/archives/5253

A poetry-writing bot based on CNN and VAE: generating poems at random https://kexue.fm/archives/5332

Basics

  • KL divergence: measures the difference between two distributions; the smaller the KL divergence, the closer the two distributions are
  • Reparameterization trick: works around the non-differentiability introduced by the sampling step, so that gradients can flow from the decoder back into the encoder
  • ELBO (Evidence Lower BOund): the KL objective contains a posterior that cannot be evaluated directly; minimizing that KL is equivalent to maximizing the ELBO

VAE:https://zhuanlan.zhihu.com/p/34998569

The encoder outputs the parameters of a Gaussian distribution (mean and variance) as two vectors, a mean vector and a variance vector. A vector sampled from this Gaussian becomes the input of the decoder, and a KL divergence term is used to constrain the distribution.

  • AE(autoencoder)
  • VAE(variational autoencoder)
  • CVAE(conditional variational autoencoder)

  • GAN(generative adversarial network)

KL divergence

The KL divergence from P to Q:

D_{KL}(P||Q)=-\sum p(x)\log q(x) + \sum p(x)\log p(x) = H(P,Q) - H(P)

where H(P,Q) is the cross entropy of P and Q, and H(P) is the entropy of P.
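As a quick numerical sanity check of this identity (a minimal numpy sketch with made-up discrete distributions, not from the original text):

import numpy as np

# Two made-up discrete distributions over the same three outcomes.
p = np.array([0.1, 0.6, 0.3])
q = np.array([0.3, 0.3, 0.4])

kl = np.sum(p * np.log(p / q))           # D_KL(P || Q)
cross_entropy = -np.sum(p * np.log(q))   # H(P, Q)
entropy = -np.sum(p * np.log(p))         # H(P)

print(kl, cross_entropy - entropy)       # the two values are identical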

KL divergence || ELBO || marginal log-likelihood

Given a dataset X=\{ x^{(i)} \}^N_{i=1}, the training objective is defined as \sum \log p(x^{(i)}). For each data point i, two distributions over the latent variable z are involved:

  • p(z|x^{(i)}): the true posterior distribution of z given the sample x^{(i)}
  • q(z|x^{(i)}): the approximate posterior produced by the neural network, from which samples can be drawn for Monte Carlo estimation

The gap between the two is measured with a KL divergence:

\begin{aligned} KL \big(q(z|x^{(i)}) \big\Vert p(z|x^{(i)})\big) &= E_{q(z|x^{(i)})} \big[ \log \frac{q(z|x^{(i)})}{p(z \vert x^{(i)})} \big]\\ &= E_{q(z|x^{(i)})} \big[ \log \frac{q(z|x^{(i)})p(x^{(i)})} {p(z,x^{(i)})} \big] \\ &= E_{q(z|x^{(i)})} \big[ \log \frac{q(z|x^{(i)})}{p(z,x^{(i)})} \big] + \log p(x^{(i)}) \\ &= - \mathcal{L} + \log p(x^{(i)}) \end{aligned}
\log p(x^{(i)}) = KL \big(q(z|x^{(i)}) \big\Vert p(z|x^{(i)})\big) + \mathcal{L}

Here \mathcal{L} is the ELBO. Since the KL divergence is non-negative, \log p(x^{(i)}) \geq \mathcal{L}; the ELBO is a lower bound on the log-likelihood \log p(x^{(i)}).

\begin{aligned} ELBO &=\mathcal{L} \\ &= -E_{q(z|x^{(i)})} \big[ \log \frac{q(z|x^{(i)})}{p(z,x^{(i)})} \big] \\ &= -E_{q(z|x^{(i)})} \big[ \log \frac{q(z|x^{(i)})}{p(x^{(i)}|z)p(z)} \big] \\ &= -E_{q(z|x^{(i)})} \big[ \log \frac{q(z|x^{(i)})}{p(z)} \big]+ E_{q(z|x^{(i)})} \big[ \log p(x^{(i)} \vert z) \big] \\ &= -KL(q(z|x^{(i)}) \Vert p(z)) + E_{q(z|x^{(i)})} \big[ \log p(x^{(i)} \vert z) \big] \\ &= E_{q(z|x^{(i)})} \big[ \log p(z) \big] - E_{q(z|x^{(i)})} \big[ \log q(z |x^{(i)}) \big] + E_{q(z|x^{(i)})} \big[ \log p(x^{(i)} \vert z) \big] \end{aligned}

Solving the three ELBO terms. For E_{q(z|x^{(i)})} \big[ \log p(z) \big]: p(z) is the prior distribution of the latent variable z and is known to be Gaussian, so this term has a closed-form solution:

E_{q(z|x^{(i)})} \big[ \log p(z) \big] = \int q(z|x^{(i)}) \log p(z)\, dz = -\frac{1}{2}\big(\log 2\pi + \mu^2 + \sigma^2\big) \quad \text{(per latent dimension, with } q(z|x^{(i)})=N(\mu,\sigma^2),\ p(z)=N(0,1)\text{)}

\log p(x) is the marginal log-likelihood:

  • "marginal" refers to the marginal distribution: p(x) is the marginal of the joint distribution p(x, z)
  • "log-likelihood" is the logarithm of the likelihood

\log p(x^{(i)}) = \log \int p(x^{(i)}, z)\, dz = \log \int p(x^{(i)}|z)\, p(z)\, dz

Computing the KL divergence

Suppose a discrete random variable X follows the distribution P(X),

P(X)= \begin{cases} 0.2, & \text{X = 1} \\ 0.4, & \text{X = 2} \\ 0.4, & \text{X = 3} \end{cases}

and consider a second distribution Q(X) over the same values,

Q(X)= \begin{cases} 0.4, & \text{X = 1} \\ 0.2, & \text{X = 2} \\ 0.4, & \text{X = 3} \end{cases}

Then the KL divergence from P to Q is:

D(P||Q)=0.2 \times \log(\frac{0.2}{0.4}) + 0.4 \times \log(\frac{0.4}{0.2}) + 0.4 \times \log(\frac{0.4}{0.4}) \approx 0.139 (using the natural logarithm)

For discrete random variables, computing the KL divergence requires knowing the possible values and their probabilities.
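For instance, the worked example above can be reproduced directly (assuming scipy is available):

import numpy as np
from scipy.stats import entropy

# P and Q over the support {1, 2, 3}, as in the example above.
p = np.array([0.2, 0.4, 0.4])
q = np.array([0.4, 0.2, 0.4])

# With a second argument, scipy.stats.entropy returns the KL divergence D(P||Q)
# using the natural logarithm.
print(entropy(p, q))              # ≈ 0.139
print(np.sum(p * np.log(p / q)))  # same value, computed directly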

X = g(Z) generates the random variable X from a latent variable Z. The latent variable Z is an n-dimensional vector whose components each follow a standard normal distribution, and X is the data we want to generate, for example images. How do we find a transformation g such that the generated images X look like real images? Saying that generated images "look like" real images really means that the distribution of generated images matches the distribution of real images, which could be measured with a KL divergence; but we cannot compute that KL divergence directly, because we have no access to the density of the true image distribution.

Images are represented by a random variable X, with probability distribution P(X). The expression for P(X) is unknown (otherwise we could generate images from it directly), so we obtain P(X) by marginalizing over Z. The distribution P(Z) of the latent variable is a standard normal, and P(X|Z), the distribution of X given Z, corresponds to the generator (decoder) in the VAE.
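Written out, marginalizing over Z gives the following identity, which could in principle be approximated by drawing samples z_j from the prior:

P(X) = \int P(X|Z)\,P(Z)\,dZ \approx \frac{1}{m}\sum_{j=1}^{m} P(X|z_j), \qquad z_j \sim P(Z)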

How do we measure the gap between the fitted distribution q(z|x^{(i)}) and the true distribution p(z|x^{(i)})? As derived above, minimizing that gap amounts to maximizing the ELBO, whose KL term compares q(z|x^{(i)})=N(\mu,\sigma^2) with the prior p(z)=N(0,1) and can be computed in closed form:

\begin{aligned}&KL\Big(N(\mu,\sigma^2)\Big\Vert N(0,1)\Big)\\ =&\int \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/2\sigma^2} \left(\log \frac{e^{-(x-\mu)^2/2\sigma^2}/\sqrt{2\pi\sigma^2}}{e^{-x^2/2}/\sqrt{2\pi}}\right)dx\\ =&\int \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/2\sigma^2} \log \left\{\frac{1}{\sqrt{\sigma^2}}\exp\left\{\frac{1}{2}\big[x^2-(x-\mu)^2/\sigma^2\big]\right\} \right\}dx\\ =&\frac{1}{2}\int \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/2\sigma^2} \Big[-\log \sigma^2+x^2-(x-\mu)^2/\sigma^2 \Big] dx\end{aligned}

Simplifying gives:

KL\Big(N(\mu,\sigma^2)\Big\Vert N(0,1)\Big)=\frac{1}{2}\Big(-\log \sigma^2+\mu^2+\sigma^2-1\Big)

\sigma^2 -\log \sigma^2 - 1 can be viewed as a loss that pushes \sigma^2 toward 1; as a function of \sigma^2 it is check-mark shaped (non-negative, with its minimum at \sigma^2 = 1), as the following plot shows:

import numpy as np
import matplotlib.pyplot as plt

# Plot f(x) = x - log(x) - 1, which is non-negative and minimized at x = 1.
x = np.linspace(0.000001, 20, 100)
y = x - np.log(x) - 1
plt.plot(x, y)
plt.show()

\mu^2 can be viewed as a loss that pushes \mu toward 0.
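A quick numerical check (a minimal numpy sketch with arbitrary \mu and \sigma, not from the original text) that the closed form above agrees with a Monte Carlo estimate of KL(N(\mu,\sigma^2) \Vert N(0,1)):

import numpy as np

mu, sigma = 1.5, 0.7   # arbitrary example values

# Closed form: 0.5 * (-log σ² + μ² + σ² - 1)
kl_closed = 0.5 * (-np.log(sigma ** 2) + mu ** 2 + sigma ** 2 - 1)

# Monte Carlo estimate: E_{x ~ N(μ, σ²)}[ log N(x; μ, σ²) - log N(x; 0, 1) ]
x = np.random.normal(mu, sigma, size=1000000)
log_q = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
log_p = -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)
kl_mc = np.mean(log_q - log_p)

print(kl_closed, kl_mc)  # the two values agree up to Monte Carlo error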

If p(x) is a normal distribution,

p(x)=\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}

then \log p(x) is:

logp(x)=-\frac{1}{2} \big[ (\frac{x-\mu}{\sigma})^2+ \log \sigma^2+\log 2\pi\big]
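This is exactly the per-dimension quantity that log_normal_pdf computes in the code below. As a quick check (made-up values; scipy is assumed to be available):

import numpy as np
from scipy.stats import norm

x, mu, sigma = 0.3, 1.0, 2.0   # made-up example values
manual = -0.5 * (((x - mu) / sigma) ** 2 + np.log(sigma ** 2) + np.log(2 * np.pi))
print(manual, norm.logpdf(x, loc=mu, scale=sigma))  # the two values match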

Code experiments

cvae

  def reparameterize(self, mean, logvar):
    # Draw eps ~ N(0, I), then shift and scale: z = mean + exp(logvar / 2) * eps
    eps = tf.random_normal(shape=mean.shape)
    return eps * tf.exp(logvar * .5) + mean

eps is a sample from the standard normal, which is then transformed into a sample from the target normal using the mean and (log-)variance. A normal variable Z with mean \mu and standard deviation \sigma is standardized via X=\frac{Z - \mu}{\sigma}; conversely, a standard normal X is transformed into a normal variable Z with mean \mu and standard deviation \sigma via Z=X \cdot \sigma + \mu.
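A minimal numpy sketch of the same transformation (made-up mean and logvar, not tied to the model code), checking that the transformed samples have the intended statistics:

import numpy as np

mean, logvar = 2.0, np.log(0.25)        # target: mean 2.0, variance 0.25
eps = np.random.randn(1000000)          # eps ~ N(0, 1)
z = eps * np.exp(0.5 * logvar) + mean   # z = mean + sigma * eps

print(z.mean(), z.var())                # ≈ 2.0 and ≈ 0.25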

def log_normal_pdf(sample, mean, logvar, raxis=1):
  # Log density of a diagonal Gaussian N(mean, exp(logvar)) evaluated at sample,
  # summed over the latent dimensions (cf. the log p(x) formula above).
  log2pi = tf.log(2. * np.pi)
  return tf.reduce_sum(
      -.5 * ((sample - mean) ** 2. * tf.exp(-logvar) + logvar + log2pi),
      axis=raxis)

def compute_loss(model, x):
  mean, logvar = model.encode(x)
  z = model.reparameterize(mean, logvar)
  x_logit = model.decode(z)

  cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
  logpx_z = -tf.reduce_sum(cross_ent, axis=[1, 2, 3])
  logpz = log_normal_pdf(z, 0., 0.)
  logqz_x = log_normal_pdf(z, mean, logvar)
  return -tf.reduce_mean(logpx_z + logpz - logqz_x)
  1. model.encode maps the input image x of shape (batch_size, image_height, image_width, 1) to two vectors, mean of shape (batch_size, latent_dim) and logvar of shape (batch_size, latent_dim). Predicting logvar rather than var improves numerical stability; every image gets its own mean and logvar vectors.
  2. model.reparameterize uses the distribution parameters mean and logvar to sample a vector z of shape (batch_size, latent_dim); sampling is needed for the Monte Carlo estimate.
  3. model.decode maps the vector z to the reconstructed image x_logit (logits, before the sigmoid).
  4. In log_normal_pdf, "normal" refers to the normal distribution and "pdf" to the probability density function; the function returns the log of the Gaussian density evaluated at sample, summed over the latent dimensions.
  5. z is a sample drawn from the Gaussian with mean mean and log-variance logvar.
  6. logpx_z is a single-sample estimate of \log P(X|Z). Because X here is a binary image modeled with a Bernoulli distribution, \log P(X|Z) is the negative of the cross entropy cross_ent; for real-valued images a squared (MSE) loss would be used instead.
  7. logpz is \log P(Z) evaluated at the sample; P(Z) is the standard normal, hence mean=0 and logvar=0.
  8. logqz_x is \log Q(Z|X) evaluated at the sample; Q(Z|X) is a normal distribution.
  9. The expectation of logpz - logqz_x does not have to be estimated by sampling; it can also be computed in closed form (see the sketch after this list).
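As an illustration of point 9, here is a standalone numpy sketch (made-up mean and logvar for a single latent dimension, independent of the model code) showing that the sampled average of logpz - logqz_x matches the analytic value -KL(q(z|x) \Vert p(z)):

import numpy as np

mean, logvar = 0.8, np.log(0.5)   # made-up encoder outputs for one latent dimension
sigma = np.exp(0.5 * logvar)

z = np.random.normal(mean, sigma, size=1000000)                            # z ~ q(z|x)
logpz = -0.5 * (z ** 2 + np.log(2 * np.pi))                                # log p(z), with p(z) = N(0, 1)
logqz_x = -0.5 * (((z - mean) / sigma) ** 2 + logvar + np.log(2 * np.pi))  # log q(z|x)

mc_estimate = np.mean(logpz - logqz_x)
analytic = -0.5 * (-logvar + mean ** 2 + np.exp(logvar) - 1)               # -KL(N(mean, σ²) || N(0, 1))
print(mc_estimate, analytic)      # agree up to Monte Carlo error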

The cross-entropy loss is: \sum_{k=1}^D \Big[- x_{(k)} \ln \rho_{(k)}(z) - (1-x_{(k)}) \ln \Big(1 -\rho_{(k)}(z)\Big)\Big]

The log-likelihood is \ln p(x|z) = \sum_{k=1}^D \Big[ x_{(k)} \ln \rho_{(k)}(z) + (1-x_{(k)}) \ln \Big(1 -\rho_{(k)}(z)\Big)\Big], i.e. the negative of the cross entropy.
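A small numpy check of this relationship (made-up pixel values and decoder logits; \rho is the sigmoid of the logits):

import numpy as np

x = np.array([0., 1., 1., 0.])              # made-up binary pixel values
logits = np.array([-1.2, 0.7, 2.0, 0.3])    # made-up decoder outputs (pre-sigmoid)
rho = 1.0 / (1.0 + np.exp(-logits))         # rho(z) = sigmoid(logits)

log_likelihood = np.sum(x * np.log(rho) + (1 - x) * np.log(1 - rho))
cross_entropy = np.sum(-x * np.log(rho) - (1 - x) * np.log(1 - rho))
print(log_likelihood, -cross_entropy)       # identical: log p(x|z) = -(cross entropy)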

The version below computes the expectations of logpz and logqz_x directly from their closed-form solutions; the difference from the version above is that these two terms no longer depend on the sampled point z.

def compute_loss(model, x):
  mean, logvar = model.encode(x)
  z = model.reparameterize(mean, logvar)
  x_logit = model.decode(z)

  cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
  logpx_z = -tf.reduce_sum(cross_ent, axis=[1, 2, 3])
  # Closed-form E_q[log p(z)] with p(z) = N(0, I): -0.5 * (log 2π + mean² + σ²) per dimension
  logpz = tf.reduce_sum(-0.5 * (tf.log(2. * np.pi) + tf.square(mean) + tf.exp(logvar)), axis=1)
  # Closed-form E_q[log q(z|x)] (negative entropy of N(mean, σ²)): -0.5 * (log 2π + logvar + 1) per dimension
  logqz_x = tf.reduce_sum(-0.5 * (tf.log(2. * np.pi) + logvar + 1.), axis=1)
  return -tf.reduce_mean(logpx_z + logpz - logqz_x)
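Per latent dimension, these two closed-form terms combine into exactly the Gaussian KL derived earlier, so in expectation this loss equals the sampled version above:

\begin{aligned} E_{q(z|x^{(i)})}\big[\log p(z)\big] - E_{q(z|x^{(i)})}\big[\log q(z|x^{(i)})\big] &= -\frac{1}{2}\big(\log 2\pi+\mu^2+\sigma^2\big) + \frac{1}{2}\big(\log 2\pi+\log\sigma^2+1\big) \\ &= -\frac{1}{2}\big(-\log\sigma^2+\mu^2+\sigma^2-1\big) \\ &= -KL\big(q(z|x^{(i)})\Vert p(z)\big) \end{aligned}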
For continuous distributions, the KL divergence is defined as an integral, i.e. an expectation under p:

KL\Big(p(x)\Big\Vert q(x)\Big) = \int p(x)\ln \frac{p(x)}{q(x)} dx=\mathbb{E}_{x\sim p(x)}\left[\ln \frac{p(x)}{q(x)}\right]

The KL divergence between the joint distributions:

p(x,z)=\tilde{p}(x)p(z|x)

\begin{aligned} KL\Big(p(x,z)\Big\Vert q(x,z)\Big) =& \iint p(x,z)\ln \frac{p(x,z)}{q(x,z)} dxdz \\ =& \int \tilde{p}(x) \left[\int p(z|x)\ln \frac{\tilde{p}(x)p(z|x)}{q(x,z)} dz\right]dx\\ =& \mathbb{E}_{x\sim \tilde{p}(x)} \left[\int p(z|x)\ln \frac{\tilde{p}(x)p(z|x)}{q(x,z)} dz\right] \\ =& \mathbb{E}_{x\sim \tilde{p}(x)} \left[\ln \tilde{p}(x)\int p(z|x)dz + \int p(z|x) \ln p(z|x)dz - \int p(z|x) \ln q(x,z)dz \right]\\ \end{aligned}
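Since \int p(z|x)\,dz = 1, the first term inside the brackets is simply \ln \tilde{p}(x), and the remaining two terms combine into a single integral:

KL\Big(p(x,z)\Big\Vert q(x,z)\Big) = \mathbb{E}_{x\sim \tilde{p}(x)}\big[\ln \tilde{p}(x)\big] + \mathbb{E}_{x\sim \tilde{p}(x)}\left[\int p(z|x)\ln \frac{p(z|x)}{q(x,z)} dz\right]

The first expectation depends only on the data distribution \tilde{p}(x), so it is a constant with respect to the model and can be ignored during optimization.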