\begin{document}
-\vspace*{0ex}
+\setlength{\abovedisplayskip}{2ex}
+\setlength{\belowdisplayskip}{2ex}
+\setlength{\abovedisplayshortskip}{2ex}
+\setlength{\belowdisplayshortskip}{2ex}
+
+\vspace*{-4ex}
\begin{center}
{\Large The Evidence Lower Bound}
+\vspace*{1ex}
+
Fran\c cois Fleuret
\today
-\vspace*{1ex}
+\vspace*{-1ex}
\end{center}
& = \expect_{Z \sim q(z)} \left[\frac{p_\theta(x_n,Z)}{q(Z)}\right].
\end{align*}
%
-So if we wanted to maximize $p_\theta(x_n)$ alone, we could sample a
+So if we sample a
$Z$ with $q$ and maximize
%
\begin{equation*}
-\frac{p_\theta(x_n,Z)}{q(Z)}.\label{eq:estimator}
+\frac{p_\theta(x_n,Z)}{q(Z)},
\end{equation*}
+%
+we do maximize $p_\theta(x_n)$ on average.
But we want to maximize $\sum_n \log \, p_\theta(x_n)$. If we use the
$\log$ of the previous expression, we can decompose its average value