%% -*- mode: latex; mode: reftex; mode: flyspell; coding: utf-8; tex-command: "pdflatex.sh" -*-

%% Any copyright is dedicated to the Public Domain.
%% https://creativecommons.org/publicdomain/zero/1.0/
%% Written by Francois Fleuret <francois@fleuret.org>

\documentclass[11pt,a4paper,oneside]{article}
\usepackage[paperheight=15cm,paperwidth=8cm,top=2mm,bottom=15mm,right=2mm,left=2mm]{geometry}
%\usepackage[a4paper,top=2.5cm,bottom=2cm,left=2.5cm,right=2.5cm]{geometry}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb,dsfont}
\usepackage[pdftex]{graphicx}
\usepackage[colorlinks=true,linkcolor=blue,urlcolor=blue,citecolor=blue]{hyperref}
\usepackage{tikz}
\usetikzlibrary{arrows,arrows.meta,calc}
\usetikzlibrary{patterns,backgrounds}
\usetikzlibrary{positioning,fit}
\usetikzlibrary{shapes.geometric,shapes.multipart}
\usetikzlibrary{patterns.meta,decorations.pathreplacing,calligraphy}
\usetikzlibrary{tikzmark}
\usetikzlibrary{decorations.pathmorphing}
\usepackage[round]{natbib}
\usepackage[osf]{libertine}
\usepackage{microtype}

\usepackage{mleftright}

\newcommand{\setmuskip}[2]{#1=#2\relax}
\setmuskip{\thinmuskip}{1.5mu} % by default it is equal to 3 mu
\setmuskip{\medmuskip}{2mu} % by default it is equal to 4 mu
\setmuskip{\thickmuskip}{3.5mu} % by default it is equal to 5 mu

\setlength{\parindent}{0cm}
\setlength{\parskip}{1ex}
%\renewcommand{\baselinestretch}{1.3}
%\setlength{\tabcolsep}{0pt}
%\renewcommand{\arraystretch}{1.0}

\def\argmax{\operatornamewithlimits{argmax}}
\def\argmin{\operatornamewithlimits{argmin}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\def\given{\,\middle\vert\,}
\def\proba{\operatorname{P}}
\newcommand{\seq}{{S}}
\newcommand{\expect}{\mathds{E}}
\newcommand{\variance}{\mathds{V}}
\newcommand{\empexpect}{\hat{\mathds{E}}}
\newcommand{\mutinf}{\mathds{I}}
\newcommand{\empmutinf}{\hat{\mathds{I}}}
\newcommand{\entropy}{\mathds{H}}
\newcommand{\empentropy}{\hat{\mathds{H}}}
\newcommand{\ganG}{\mathbf{G}}
\newcommand{\ganD}{\mathbf{D}}
\newcommand{\ganF}{\mathbf{F}}

\newcommand{\dkl}{\mathds{D}_{\mathsf{KL}}}
\newcommand{\djs}{\mathds{D}_{\mathsf{JS}}}

\newcommand*{\vertbar}{\rule[-1ex]{0.5pt}{2.5ex}}
\newcommand*{\horzbar}{\rule[.5ex]{2.5ex}{0.5pt}}

\def\positionalencoding{\operatorname{pos-enc}}
\def\concat{\operatorname{concat}}
\def\crossentropy{\LL_{\operatorname{ce}}}

\begin{document}

\vspace*{0ex}

\begin{center}
{\Large On Random Variables}

Fran\c cois Fleuret

\today

\vspace*{1ex}

\end{center}

\underline{Random variables} (RVs) are central to any model of a
random phenomenon, but their mathematical definition is unclear to
most. This is an attempt at giving an intuitive understanding of their
definition and utility.

\section{Modeling randomness}

To formalize something ``random'', the natural strategy is to define a
distribution, that is, in the finite case, a list of value /
probability pairs. For instance, the head / tail result of a coin flip
would be
%
\[
\{(H, 0.5), (T, 0.5)\}.
\]

This is perfectly fine, until you have several such objects. Modeling
two coins $A$ and $B$ this way still seems intuitively okay: they have
nothing to do with each other, they are ``independent'', so defining
how they behave individually is sufficient.

\section{Non-independent variables}

The process that generates two random values can be such that they are
related. Consider for instance that $A$ is the result of flipping a
coin, and $B$ is \emph{the inverse value of $A$}.

Both $A$ and $B$ are legitimate RVs, and both have the same
distribution $\{(H, 0.5), (T, 0.5)\}$. So where is the information
that they are related?

With models of the respective distributions of $A$ and $B$ alone, it
is nowhere. This can be fixed in some way by specifying the
distribution of the pair $(A, B)$. That would be here
%
\[
\{(H/H, 0.0), (H/T, 0.5), (T/H, 0.5), (T/T, 0.0)\}.
\]

The distributions of $A$ and $B$ individually are called the
\underline{marginal} distributions, and the distribution of the pair
is the \underline{joint} distribution.

Note that the joint is a far richer object than the two marginals, and
in general many different joints are consistent with given marginals.
Here for instance, the marginals are the same as if $A$ and $B$ were
two independent coins, even though they are not.
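
As an illustration, assuming two fair coins flipped independently, the
joint would be
%
\begin{align*}
\{ & (H/H, 0.25), (H/T, 0.25), \\
   & (T/H, 0.25), (T/T, 0.25)\},
\end{align*}
%
and summing a joint over the values of $B$ gives the same marginal for
$A$ in both cases, first for the independent coins and then for the
inverse one:
%
\begin{align*}
\proba(A=H) & = 0.25 + 0.25 = 0.5, \\
\proba(A=H) & = 0.0 + 0.5 = 0.5.
\end{align*}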

Even though this could somehow work, the notion of an RV here is very
unclear: it is not simply a distribution, and every time a new one is
defined, it requires the specification of the joint with all the
variables already defined.

\section{Random Variables}

The actual definition of an RV is a bit technical. Intuitively, it
consists of first defining ``the source of all randomness'', and then
every RV is a deterministic function of it.

Formally, it relies first on the definition of a set $\Omega$ whose
subsets can be measured by a function $\mu$ with all the desirable
properties, such as $\mu(\Omega)=1$, $\mu(\emptyset)=0$, and $E \cap F
= \emptyset \Rightarrow \mu(E \cup F) = \mu(E) + \mu(F)$.
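
As a small worked example of what these properties imply (a sketch,
where $G$ is any subset of $\Omega$ that can be measured and $G^c$ its
complement, notation introduced here only for illustration): since $G
\cap G^c = \emptyset$ and $G \cup G^c = \Omega$,
%
\[
1 = \mu(\Omega) = \mu(G) + \mu(G^c),
\]
%
hence $\mu(G^c) = 1 - \mu(G)$.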

There is a technical point: for some $\Omega$ it may be impossible to
define such a measure on all its subsets due to tricky
infinity-related pathologies. So the set $\Sigma$ of
\underline{measurable} subsets is explicitly specified and called a
$\sigma$-algebra. In any practical situation this technicality does
not matter, since $\Sigma$ contains anything needed.

The triplet $(\Omega, \Sigma, \mu)$ is a \underline{measure space},
here more precisely a probability space since $\mu(\Omega)=1$.

Given such a space, a \underline{random variable} $X$ is a
mapping from $\Omega$ into another set, and the
\underline{probability} that $X$ takes the value $x$ is the measure of
the subset of $\Omega$ where $X$ takes the value $x$:
%
\[
\proba(X=x) = \mu\left(X^{-1}(\{x\})\right).
\]

You can imagine $\Omega$ as the square $[0,1]^2$ in $\mathbb{R}^2$
with the usual geometrical area for $\mu$.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

For instance, if the two coins $A$ and $B$ are flipped independently, we
could picture possible random variables with the proper distribution
as follows:

\nopagebreak

\begin{tikzpicture}[scale=0.8]
\draw[pattern=north east lines] (0,0) rectangle ++(0.5,0.5);
\draw (0,0) rectangle ++(1,0.5);
\node at (2.5,0.2) {$A=\text{head}/\text{tail}$};

\draw[fill=red!50] (4.5, 0) rectangle ++(0.5,0.5);
\draw (4.5,0) rectangle ++(1,0.5);
\node at (7.0,0.2) {$B=\text{head}/\text{tail}$};
\end{tikzpicture}
%

\nopagebreak

\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50,draw=none] (0, 0) rectangle (2, 4);
\draw[draw=none,pattern=north east lines] (0, 0) rectangle (4,2);
\draw (0,0) rectangle (4,4);

%% \draw[draw=green,thick] (0,0) rectangle ++(2,2);
%% \draw[draw=green,thick] (0.1,2.1) rectangle ++(1.8257,1.8257);
%% \draw[draw=green,thick] (2.1,0.1) rectangle ++(0.8165,0.8165);

\end{tikzpicture}
%
\hspace*{\stretch{1}}
%
\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50,draw=none] (0, 0) rectangle ++(1, 4);
\draw[fill=red!50,draw=none] (1.5, 0) rectangle ++(1, 4);
\draw[draw=none,pattern=north east lines] (0, 0.25) rectangle ++(4,0.5);
\draw[draw=none,pattern=north east lines] (0, 1.25) rectangle ++(4,0.5);
\draw[draw=none,pattern=north east lines] (0, 2.) rectangle ++(4,0.5);
\draw[draw=none,pattern=north east lines] (0, 2.5) rectangle ++(4,0.5);
\draw (0,0) rectangle (4,4);
\end{tikzpicture}
%
\hspace*{\stretch{1}}
%
\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50,draw=none] (0, 0) rectangle (2, 2);
\draw[fill=red!50,draw=none] (0, 4)--(2,4)--(4,2)--(2,2)--cycle;
\draw[draw=none,pattern=north east lines] (0.5, 4)--(1.5,4)--(3.5,2)--(2.5,2)--cycle;
\draw[draw=none,pattern=north east lines] (3, 3) rectangle (4,4);
\draw[draw=none,pattern=north east lines] (0,4)--(1,3)--(0,2)--cycle;
\draw[draw=none,pattern=north east lines] (2.25,0) rectangle (3.25,2);
\draw[draw=none,pattern=north east lines] (0, 0) rectangle (2,1);
\draw (0,0) rectangle (4,4);
\end{tikzpicture}
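
As a sketch of how the first of these pictures can be written down,
assume that the hatched area stands for $A = H$ and the red area for
$B = H$ (an assumed reading of the legend), and write $\omega =
(\omega_1, \omega_2) \in [0,1]^2$. Then
%
\begin{align*}
A(\omega) = H & \iff \omega_2 \leq 0.5, \\
B(\omega) = H & \iff \omega_1 \leq 0.5,
\end{align*}
%
and
%
\begin{align*}
\proba(A=H, B=H) & = \mu\left([0, 0.5]^2\right) = 0.25 \\
& = \proba(A=H) \, \proba(B=H).
\end{align*}

The same holds for the three other pairs of values, which is what
independence means here. As far as the drawings can be read, the two
other pictures are different mappings that give the same four areas.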

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

And if $A$ is flipped and $B$ is the inverse of $A$, possible RVs would
be

\nopagebreak

\begin{tikzpicture}[scale=0.8]
%% \node at (3.2, 1) {Flip A and B = inverse(A)};

\draw[pattern=north east lines] (0,0) rectangle ++(0.5,0.5);
\draw (0,0) rectangle ++(1,0.5);
\node at (2.5,0.2) {$A=\text{head}/\text{tail}$};

\draw[fill=red!50] (4.5, 0) rectangle ++(0.5,0.5);
\draw (4.5,0) rectangle ++(1,0.5);
\node at (7.0,0.2) {$B=\text{head}/\text{tail}$};
\end{tikzpicture}

\nopagebreak

\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50] (0,0) rectangle (4,4);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 0) rectangle (2,4);
\draw (0,0) rectangle (4,4);
\end{tikzpicture}
%
\hspace*{\stretch{1}}
%
\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50] (0,0) rectangle (4,4);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 0) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (1, 0) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (3, 0) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 1) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (2, 1) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 2) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (1, 3) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (2, 3) rectangle ++(1,1);
\draw (0,0) rectangle (4,4);
\end{tikzpicture}
%
\hspace*{\stretch{1}}
%
\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50] (0,0) rectangle (4,4);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 0)--(1,1)--(3,1)--(3,4)--(0,1)--cycle;
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 3) rectangle ++(2,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (3,0) rectangle ++(1,1);
%% \draw (0,0) grid (4,4);
\draw (0,0) rectangle (4,4);
\end{tikzpicture}
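
To make the first picture of this second case concrete, assume that
the hatched area stands for $A = H$ and the red area for $B = H$ (an
assumed reading of the legend), and write $\omega = (\omega_1,
\omega_2) \in [0,1]^2$. A possible sketch is then
%
\begin{align*}
A(\omega) = H & \iff \omega_1 \leq 0.5, \\
B(\omega) = H & \iff \omega_1 > 0.5,
\end{align*}
%
so that
%
\begin{align*}
\proba(A=H, B=T) & = \mu\left([0, 0.5] \times [0,1]\right) \\
& = 0.5, \\
\proba(A=H, B=H) & = \mu(\emptyset) = 0,
\end{align*}
%
which agrees with the joint distribution of the pair $(A, B)$ given in
the section on non-independent variables.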

%% Thanks to this definition, additional random variables can be defined
%% with dependency structures. For instance, if $A$ and $B$ are two
%% separate coin flipping, and then a third variable $C$ is defined by
%% rolling a dice and taking the value of $A$ if it gives $1$ and the
%% value of $B$ otherwise.

\end{document}