\setlength{\abovedisplayshortskip}{2ex}
\setlength{\belowdisplayshortskip}{2ex}
\vspace*{-3ex}
\begin{center}
{\Large The Evidence Lower Bound}

\vspace*{2ex}

Fran\c cois Fleuret
%% \vspace*{2ex}

\today
%% \vspace*{-1ex}
\end{center}
Given i.i.d.\ training samples $x_1, \dots, x_N$, we want to fit a
model $p_\theta(x,z)$ to them, maximizing
%
\[
\sum_n \log \, p_\theta(x_n).
\]
%
For any distribution $q(z)$ over the latent variable, we have
%
\[
\log \, p_\theta(x_n)
=
E_{Z \sim q(z)} \left[ \log \frac{p_\theta(x_n, Z)}{q(Z)} \right]
+
\mathrm{KL}\big( q(z) \, \| \, p_\theta(z \mid x_n) \big),
\]
%
where the first term is the Evidence Lower Bound (ELBO). Since the KL
divergence is non-negative, the ELBO is a lower bound of
$\log \, p_\theta(x_n)$, with equality when $q(z) = p_\theta(z \mid x_n)$.

\medskip

Maximizing the ELBO with respect to $\theta$ does not maximize
$\log \, p_\theta(x_n)$ alone: it jointly maximizes
$\log \, p_\theta(x_n)$ and minimizes the KL divergence between
$p_\theta(z \mid x_n)$ and $q(z)$, and we may get a worse
$p_\theta(x_n)$ to bring $p_\theta(z \mid x_n)$ closer to $q(z)$.
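\medskip

For completeness, the identity above follows in one line from the
factorization $p_\theta(x_n, z) = p_\theta(z \mid x_n) \, p_\theta(x_n)$:
%
\[
E_{Z \sim q(z)} \left[ \log \frac{p_\theta(x_n, Z)}{q(Z)} \right]
=
\log \, p_\theta(x_n)
+
E_{Z \sim q(z)} \left[ \log \frac{p_\theta(Z \mid x_n)}{q(Z)} \right]
=
\log \, p_\theta(x_n)
-
\mathrm{KL}\big( q(z) \, \| \, p_\theta(z \mid x_n) \big),
\]
%
since $\log \, p_\theta(x_n)$ does not depend on $Z$.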
\medskip

However, all this analysis is still valid if $q$ is a parameterized
function $q_\alpha(z \mid x_n)$ of $x_n$. In that case, if we optimize
$\theta$ and $\alpha$ to maximize
%
\[
\sum_n E_{Z \sim q_\alpha(z \mid x_n)} \left[ \log \frac{p_\theta(x_n, Z)}{q_\alpha(Z \mid x_n)} \right],
\]
%
it maximizes $\log \, p_\theta(x_n)$ and brings $q_\alpha(z \mid
x_n)$ close to $p_\theta(z \mid x_n)$.
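\medskip

If the model additionally factorizes as $p_\theta(x, z) = p_\theta(z)
\, p_\theta(x \mid z)$ with a prior $p_\theta(z)$ over the latent (a
standard choice, though nothing above requires it), each term of the
sum above rewrites as
%
\[
E_{Z \sim q_\alpha(z \mid x_n)} \big[ \log \, p_\theta(x_n \mid Z) \big]
-
\mathrm{KL}\big( q_\alpha(z \mid x_n) \, \| \, p_\theta(z) \big),
\]
%
that is, a reconstruction term and a term pulling $q_\alpha(z \mid
x_n)$ toward the prior.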
\end{document}