\section{Conditional Entropy}
Conditional entropy is the average of the entropy of the conditional distribution:
%
\begin{align*}
H(X \mid Y) &= \sum_{y} p(y)\, H(X \mid Y = y)\\
&= -\sum_{y} p(y) \sum_{x} p(x \mid y) \log p(x \mid y).
\end{align*}
Intuitively, it is the minimum average number of bits required to describe $X$ given that $Y$ is known.
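For a concrete feel, here is a tiny worked example (the joint
distribution is an arbitrary illustrative choice, not from any
source): let $X, Y \in \{0, 1\}$ with $p(0,0) = \tfrac{1}{2}$,
$p(0,1) = p(1,1) = \tfrac{1}{4}$, and $p(1,0) = 0$. Then $Y$ is
uniform, $X$ is determined once we see $Y = 0$, and uniform once we
see $Y = 1$, so
%
\[
H(X \mid Y) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 1
            = \tfrac{1}{2} \text{ bit},
\]
%
while $H(X) \approx 0.811$ bits, since $p_X = (\tfrac{3}{4}, \tfrac{1}{4})$.
Knowing $Y$ saves bits here precisely because $X$ and $Y$ are dependent.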
So in particular, if $X$ and $Y$ are independent, getting the value of $Y$
does not help at all, so you still have to send all the bits for $X$;
hence
%
\[
H(X \mid Y) = H(X).
\]
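This follows in one line from the definition above: independence means
$p(x \mid y) = p(x)$ for every $y$, so each inner sum is already
$H(X)$, and the average over $y$ changes nothing:
%
\begin{align*}
H(X \mid Y)
&= -\sum_{y} p(y) \sum_{x} p(x \mid y) \log p(x \mid y)\\
&= -\sum_{y} p(y) \sum_{x} p(x) \log p(x)
 = \sum_{y} p(y)\, H(X)
 = H(X).
\end{align*}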