%% -*- mode: latex; mode: reftex; mode: flyspell; coding: utf-8; tex-command: "pdflatex.sh" -*-

%% Any copyright is dedicated to the Public Domain.
%% https://creativecommons.org/publicdomain/zero/1.0/
%% Written by Francois Fleuret <francois@fleuret.org>

\documentclass[11pt,a4paper,oneside]{article}
\usepackage[paperheight=15cm,paperwidth=8cm,top=2mm,bottom=15mm,right=2mm,left=2mm]{geometry}
%\usepackage[a4paper,top=2.5cm,bottom=2cm,left=2.5cm,right=2.5cm]{geometry}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb,dsfont}
\usepackage[pdftex]{graphicx}
\usepackage[colorlinks=true,linkcolor=blue,urlcolor=blue,citecolor=blue]{hyperref}
\usepackage{tikz} % required by the \usetikzlibrary calls below
\usetikzlibrary{arrows,arrows.meta,calc}
\usetikzlibrary{patterns,backgrounds}
\usetikzlibrary{positioning,fit}
\usetikzlibrary{shapes.geometric,shapes.multipart}
\usetikzlibrary{patterns.meta,decorations.pathreplacing,calligraphy}
\usetikzlibrary{tikzmark}
\usetikzlibrary{decorations.pathmorphing}
\usepackage[round]{natbib}
\usepackage[osf]{libertine}
\usepackage{microtype}

\usepackage{mleftright}

\newcommand{\setmuskip}[2]{#1=#2\relax}
\setmuskip{\thinmuskip}{1.5mu} % by default it is equal to 3 mu
\setmuskip{\medmuskip}{2mu} % by default it is equal to 4 mu
\setmuskip{\thickmuskip}{3.5mu} % by default it is equal to 5 mu

\setlength{\parindent}{0cm}
\setlength{\parskip}{1ex}
%\renewcommand{\baselinestretch}{1.3}
%\setlength{\tabcolsep}{0pt}
%\renewcommand{\arraystretch}{1.0}

\def\argmax{\operatornamewithlimits{argmax}}
\def\argmin{\operatornamewithlimits{argmin}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\def\given{\,\middle\vert\,}
\def\proba{\operatorname{P}}
\newcommand{\seq}{{S}}
\newcommand{\expect}{\mathds{E}}
\newcommand{\variance}{\mathds{V}}
\newcommand{\empexpect}{\hat{\mathds{E}}}
\newcommand{\mutinf}{\mathds{I}}
\newcommand{\empmutinf}{\hat{\mathds{I}}}
\newcommand{\entropy}{\mathds{H}}
\newcommand{\empentropy}{\hat{\mathds{H}}}
\newcommand{\ganG}{\mathbf{G}}
\newcommand{\ganD}{\mathbf{D}}
\newcommand{\ganF}{\mathbf{F}}

\newcommand{\dkl}{\mathds{D}_{\mathsf{KL}}}
\newcommand{\djs}{\mathds{D}_{\mathsf{JS}}}

\newcommand*{\vertbar}{\rule[-1ex]{0.5pt}{2.5ex}}
\newcommand*{\horzbar}{\rule[.5ex]{2.5ex}{0.5pt}}

\def\positionalencoding{\operatorname{pos-enc}}
\def\concat{\operatorname{concat}}
\def\crossentropy{\LL_{\operatorname{ce}}}
{\Large On Random Variables}

\underline{Random variables} (RVs) are central to any model of a
random phenomenon, but their mathematical definition is unclear to
most. This is an attempt at giving an intuitive understanding of their
definition and utility.
\section{Modeling randomness}

To formalize something ``random'', the natural strategy is to define a
distribution, that is, in the finite case, a list of value /
probability pairs. For instance, the head / tail result of a coin flip
can be modeled as
\[
\{(H, 0.5), (T, 0.5)\}.
\]

This is perfectly fine until you have several such objects. Modeling
two coins $A$ and $B$ still seems intuitively okay: they have nothing
to do with each other, they are ``independent'', so defining how they
behave individually is sufficient.
\section{Non-independent variables}

The process that generates two random values can be such that they are
related. Consider for instance that $A$ is the result of flipping a
coin, and $B$ is \emph{the inverse value of $A$}.

Both $A$ and $B$ are legitimate RVs, and both have the same
distribution $\{(H, 0.5), (T, 0.5)\}$. So where is the information
that they are related?

With models of the respective distributions of $A$ and $B$ alone, this
information is nowhere. This can be fixed in some way by specifying
the distribution of the pair $(A, B)$. Here, that would be
\[
\{(H/H, 0.0), (H/T, 0.5), (T/H, 0.5), (T/T, 0.0)\}.
\]

The distributions of $A$ and $B$ individually are called the
\underline{marginal} distributions, and this is the \underline{joint}
distribution.

Note that the joint is a far richer object than the two marginals, and
in general many different joints are consistent with given marginals.
Here for instance, the marginals are the same as if $A$ and $B$ were
two independent coins, even though they are not.

Even though this could somehow work, the notion of an RV here is very
unclear: it is not simply a distribution, and every time a new one is
defined, it requires the specification of its joint with all the
variables already defined.
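
As a small worked comparison, and assuming for illustration that two
independent fair coins have the uniform joint
\[
\{(H/H, 0.25), (H/T, 0.25), (T/H, 0.25), (T/T, 0.25)\},
\]
the marginal of $A$ is obtained by summing the joint over the values
of $B$:
\[
P(A=H) = P(A=H, B=H) + P(A=H, B=T).
\]
For the two joints this gives
\begin{align*}
\text{independent:}\quad P(A=H) &= 0.25 + 0.25 = 0.5, \\
\text{inverse:}\quad P(A=H) &= 0.0 + 0.5 = 0.5.
\end{align*}
The same computation gives $0.5$ for $T$ and for both values of $B$,
so the marginals alone cannot tell the two situations apart.
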
\section{Random Variables}

The actual definition of an RV is a bit technical. Intuitively, it
consists of first defining ``the source of all randomness'', and then
making every RV a deterministic function of it.

Formally, it relies first on the definition of a set $\Omega$ such
that its subsets can be measured, with all the desirable properties,
such as $\mu(\Omega)=1$, $\mu(\emptyset)=0$, and $A \cap B = \emptyset
\Rightarrow \mu(A \cup B) = \mu(A) + \mu(B)$.

There is a technical point: for some $\Omega$ it may be impossible to
define such a measure on all its subsets due to tricky
infinity-related pathologies. So the set $\Sigma$ of
\underline{measurable} subsets is explicitly specified and called a
$\sigma$-algebra. In any practical situation this technicality does
not matter, since $\Sigma$ contains anything needed.

The triplet $(\Omega, \Sigma, \mu)$ is a \underline{measured set},
more commonly called a probability space.

Given such a measured set, a \underline{random variable} $X$ is a
mapping from $\Omega$ into another set, and the
\underline{probability} that $X$ takes the value $x$ is the measure of
the subset of $\Omega$ where $X$ takes the value $x$:
\[
P(X=x) = \mu(X^{-1}(x)).
\]

You can imagine $\Omega$ as the square $[0,1]^2$ in $\mathbb{R}^2$
with the usual geometrical area for $\mu$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

For instance, if the two coins $A$ and $B$ are flipped independently, we
could picture possible random variables with the proper distributions
as follows:

\begin{tikzpicture}[scale=0.8]
\draw[pattern=north east lines] (0,0) rectangle ++(0.5,0.5);
\draw (0,0) rectangle ++(1,0.5);
\node at (2.5,0.2) {$A=\text{head}/\text{tail}$};
\draw[fill=red!50] (4.5, 0) rectangle ++(0.5,0.5);
\draw (4.5,0) rectangle ++(1,0.5);
\node at (7.0,0.2) {$B=\text{head}/\text{tail}$};
\end{tikzpicture}

\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50,draw=none] (0, 0) rectangle (2, 4);
\draw[draw=none,pattern=north east lines] (0, 0) rectangle (4,2);
\draw (0,0) rectangle (4,4);
%% \draw[draw=green,thick] (0,0) rectangle ++(2,2);
%% \draw[draw=green,thick] (0.1,2.1) rectangle ++(1.8257,1.8257);
%% \draw[draw=green,thick] (2.1,0.1) rectangle ++(0.8165,0.8165);
\end{tikzpicture}
\hspace*{\stretch{1}}
\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50,draw=none] (0, 0) rectangle ++(1, 4);
\draw[fill=red!50,draw=none] (1.5, 0) rectangle ++(1, 4);
\draw[draw=none,pattern=north east lines] (0, 0.25) rectangle ++(4,0.5);
\draw[draw=none,pattern=north east lines] (0, 1.25) rectangle ++(4,0.5);
\draw[draw=none,pattern=north east lines] (0, 2.) rectangle ++(4,0.5);
\draw[draw=none,pattern=north east lines] (0, 2.5) rectangle ++(4,0.5);
\draw (0,0) rectangle (4,4);
\end{tikzpicture}
\hspace*{\stretch{1}}
\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50,draw=none] (0, 0) rectangle (2, 2);
\draw[fill=red!50,draw=none] (0, 4)--(2,4)--(4,2)--(2,2)--cycle;
\draw[draw=none,pattern=north east lines] (0.5, 4)--(1.5,4)--(3.5,2)--(2.5,2)--cycle;
\draw[draw=none,pattern=north east lines] (3, 3) rectangle (4,4);
\draw[draw=none,pattern=north east lines] (0,4)--(1,3)--(0,2)--cycle;
\draw[draw=none,pattern=north east lines] (2.25,0) rectangle (3.25,2);
\draw[draw=none,pattern=north east lines] (0, 0) rectangle (2,1);
\draw (0,0) rectangle (4,4);
\end{tikzpicture}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
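
As a hedged worked instance of $P(X=x) = \mu(X^{-1}(x))$, here is one
possible construction for two independent coins. It takes $\Omega =
[0,1]^2$ with $\mu$ the area and writes $\omega = (\omega_1,
\omega_2)$; the thresholds at $1/2$ are an assumption chosen for
illustration, reading a hatched region as $A=\text{head}$ and a red
region as $B=\text{head}$. Let $A(\omega)=H$ if $\omega_2 < 1/2$ and
$T$ otherwise, and $B(\omega)=H$ if $\omega_1 < 1/2$ and $T$
otherwise. Then
\begin{align*}
P(A=H) &= \mu([0,1] \times [0,1/2)) = 1/2, \\
P(B=H) &= \mu([0,1/2) \times [0,1]) = 1/2, \\
P(A=H, B=H) &= \mu([0,1/2) \times [0,1/2)) = 1/4,
\end{align*}
so that $P(A=H, B=H) = P(A=H) \, P(B=H)$, as expected for independent
coins. Any other pair of regions of area $1/2$ whose intersection has
area $1/4$ would do equally well.
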
And if $A$ is flipped and $B$ is the inverse of $A$, possible RVs would
be:

\begin{tikzpicture}[scale=0.8]
%% \node at (3.2, 1) {Flip A and B = inverse(A)};
\draw[pattern=north east lines] (0,0) rectangle ++(0.5,0.5);
\draw (0,0) rectangle ++(1,0.5);
\node at (2.5,0.2) {$A=\text{head}/\text{tail}$};
\draw[fill=red!50] (4.5, 0) rectangle ++(0.5,0.5);
\draw (4.5,0) rectangle ++(1,0.5);
\node at (7.0,0.2) {$B=\text{head}/\text{tail}$};
\end{tikzpicture}

\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50] (0,0) rectangle (4,4);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 0) rectangle (2,4);
\draw (0,0) rectangle (4,4);
\end{tikzpicture}
\hspace*{\stretch{1}}
\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50] (0,0) rectangle (4,4);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 0) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (1, 0) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (3, 0) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 1) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (2, 1) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 2) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (1, 3) rectangle ++(1,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (2, 3) rectangle ++(1,1);
\draw (0,0) rectangle (4,4);
\end{tikzpicture}
\hspace*{\stretch{1}}
\begin{tikzpicture}[scale=0.600]
\draw[fill=red!50] (0,0) rectangle (4,4);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 0)--(1,1)--(3,1)--(3,4)--(0,1)--cycle;
\draw[preaction={fill=white},draw=none,pattern=north east lines] (0, 3) rectangle ++(2,1);
\draw[preaction={fill=white},draw=none,pattern=north east lines] (3,0) rectangle ++(1,1);
%% \draw (0,0) grid (4,4);
\draw (0,0) rectangle (4,4);
\end{tikzpicture}

%% Thanks to this definition, additional random variables can be defined
%% with dependency structures. For instance, if $A$ and $B$ are two
%% separate coin flipping, and then a third variable $C$ is defined by
%% rolling a dice and taking the value of $A$ if it gives $1$ and the
%% value of $B$ otherwise.
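
The dependent case can be worked out in the same way. As a sketch
under illustrative assumptions ($\Omega = [0,1]^2$, $\mu$ the area,
$\omega = (\omega_1, \omega_2)$, and a threshold at $1/2$ that is only
one choice among many), let $A(\omega)=H$ if $\omega_1 < 1/2$ and $T$
otherwise, and let $B(\omega)$ be the inverse of $A(\omega)$. Then
\begin{align*}
P(A=H, B=H) &= \mu(\emptyset) = 0, \\
P(A=H, B=T) &= \mu([0,1/2) \times [0,1]) = 1/2, \\
P(A=T, B=H) &= \mu([1/2,1] \times [0,1]) = 1/2, \\
P(A=T, B=T) &= \mu(\emptyset) = 0,
\end{align*}
which recovers the joint distribution of the pair $(A, B)$ given
earlier, while each marginal is still $\{(H, 0.5), (T, 0.5)\}$. No
joint has to be specified by hand: it follows from $A$ and $B$ being
functions of the same $\omega$.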