\section{Conditional Entropy}
Conditional entropy is the average of the entropy of the conditional distribution:
%
\begin{align*}
H(X \mid Y) &= \sum_y p(y)\, H(X \mid Y = y)\\
&= -\sum_{x,y} p(x, y) \log p(x \mid y).
\end{align*}
Intuitively it is the minimum average number of bits required to describe $X$ given that $Y$ is known.
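As a quick sanity check of the definition, conditional entropy can be computed directly from a joint pmf. The following sketch uses a hypothetical joint distribution (a fair coin $Y$, with $X$ equal to $Y$ ninety percent of the time); the function name and example values are illustrative, not from the text:

```python
import math

def cond_entropy(joint):
    """H(X|Y) in bits, for a joint pmf given as a dict {(x, y): p}."""
    # marginal distribution of Y
    py = {}
    for (x, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    # H(X|Y) = sum over (x,y) of p(x,y) * log2( p(y) / p(x,y) )
    #        = -sum over (x,y) of p(x,y) * log2 p(x|y)
    return sum(p * math.log2(py[y] / p) for (x, y), p in joint.items() if p > 0)

# hypothetical example: Y is a fair coin, X agrees with Y with probability 0.9
joint = {(0, 0): 0.45, (1, 0): 0.05, (0, 1): 0.05, (1, 1): 0.45}
print(cond_entropy(joint))  # ~0.469 bits: the binary entropy of 0.1
```

Knowing $Y$ leaves only the residual uncertainty of whether $X$ flipped, which is why the result is the binary entropy of the flip probability.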
So in particular, if $X$ and $Y$ are independent, getting the value of $Y$
does not help at all, so you still have to send all the bits for $X$,
hence
%
\[
H(X \mid Y) = H(X).
\]
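The independence case can be verified numerically: build a joint pmf that factors as $p(x)p(y)$ and check that $H(X \mid Y)$ equals $H(X)$. The marginals below are arbitrary illustrative choices:

```python
import math

def entropy(pmf):
    """H(X) in bits for a pmf given as a dict {x: p}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# independent X and Y: the joint factors as p(x) * p(y)
px = {0: 0.25, 1: 0.75}
py = {0: 0.5, 1: 0.5}
joint = {(x, y): px[x] * py[y] for x in px for y in py}

# H(X|Y) computed directly from the joint pmf
h_cond = sum(p * math.log2(py[y] / p) for (x, y), p in joint.items() if p > 0)
print(h_cond, entropy(px))  # equal: independence gives H(X|Y) = H(X)
```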
And since if you send the bits for $Y$ and then the bits to describe $X$ given that $Y$ is known, you have sent $(X, Y)$, we have the chain rule:
%
\[
H(X, Y) = H(Y) + H(X \mid Y).