report/culture.tex

%% -*- mode: latex; mode: reftex; mode: flyspell; coding: utf-8; tex-command: "pdflatex.sh" -*-

%% Any copyright is dedicated to the Public Domain.
%% https://creativecommons.org/publicdomain/zero/1.0/
%% Written by Francois Fleuret <francois@fleuret.org>

\documentclass[11pt,a4paper,oneside]{article}
\usepackage[paperheight=15cm,paperwidth=8cm,top=2mm,bottom=15mm,right=2mm,left=2mm]{geometry}
%\usepackage[a4paper,top=2.5cm,bottom=2cm,left=2.5cm,right=2.5cm]{geometry}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb,dsfont}
\usepackage[pdftex]{graphicx}
\usepackage[colorlinks=true,linkcolor=blue,urlcolor=blue,citecolor=blue]{hyperref}
\urlstyle{same}
\usepackage{tikz}
\usetikzlibrary{arrows,arrows.meta,calc}
\usetikzlibrary{patterns,backgrounds}
\usetikzlibrary{positioning,fit}
\usetikzlibrary{shapes.geometric,shapes.multipart}
\usetikzlibrary{patterns.meta,decorations.pathreplacing,calligraphy}
\usetikzlibrary{tikzmark}
\usetikzlibrary{decorations.pathmorphing}
\usepackage[round]{natbib}
\usepackage[osf]{libertine}
\usepackage{microtype}

\usepackage{mleftright}

\usepackage{enumitem}
\setlist[itemize]{leftmargin=0pt,itemindent=1em,itemsep=2ex}
\setlist{nosep} % or \setlist{noitemsep} to leave space around whole list

\newcommand{\setmuskip}[2]{#1=#2\relax}
\setmuskip{\thinmuskip}{1.5mu} % by default it is equal to 3 mu
\setmuskip{\medmuskip}{2mu} % by default it is equal to 4 mu
\setmuskip{\thickmuskip}{3.5mu} % by default it is equal to 5 mu

\setlength{\parindent}{0cm}
\setlength{\parskip}{1ex}
%\renewcommand{\baselinestretch}{1.3}
%\setlength{\tabcolsep}{0pt}
%\renewcommand{\arraystretch}{1.0}

\def\argmax{\operatornamewithlimits{argmax}}
\def\argmin{\operatornamewithlimits{argmin}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\def\given{\,\middle\vert\,}
\def\proba{\operatorname{P}}
\newcommand{\seq}{{S}}
\newcommand{\expect}{\mathds{E}}
\newcommand{\variance}{\mathds{V}}
\newcommand{\empexpect}{\hat{\mathds{E}}}
\newcommand{\mutinf}{\mathds{I}}
\newcommand{\empmutinf}{\hat{\mathds{I}}}
\newcommand{\entropy}{\mathds{H}}
\newcommand{\empentropy}{\hat{\mathds{H}}}
\newcommand{\ganG}{\mathbf{G}}
\newcommand{\ganD}{\mathbf{D}}
\newcommand{\ganF}{\mathbf{F}}

\newcommand{\dkl}{\mathds{D}_{\mathsf{KL}}}
\newcommand{\djs}{\mathds{D}_{\mathsf{JS}}}

\newcommand*{\vertbar}{\rule[-1ex]{0.5pt}{2.5ex}}
\newcommand*{\horzbar}{\rule[.5ex]{2.5ex}{0.5pt}}

\def\positionalencoding{\operatorname{pos-enc}}
\def\concat{\operatorname{concat}}
\def\crossentropy{\LL_{\operatorname{ce}}}

\begin{document}

\vspace*{-3ex}

\begin{center}
{\Large Self-Generated Culture}

Fran\c cois Fleuret

\today

\vspace*{2ex}

\centerline{\color{red}(work in progress, to be updated)}\\[3ex]

\centerline{\url{https://fleuret.org/public/culture/culture.pdf}}

\end{center}

\section{Introduction}

The hypothesis behind this experiment is that high-level abstract
thinking is fueled by social competition. A group of communicating
agents that try to demonstrate their cognitive superiority would end
up developing a rich and consistent culture.

The experiment is designed with a group of GPTs that alternately
learn to solve quizzes and generate new ones.

A ``quiz'' is a triplet of the form $(A, d, B)$ where $A$ and $B$ are
two sequences and $d$ is a token indicating if the direction is
forward or backward. Given $(A, d)$, the challenge is to generate $B$.

The experiment starts with a set of quizzes that is
progressively enriched.

\section{Bird World}

The initial set of quizzes consists of predicting the dynamics of a
very simple world: a $6 \times 8$ grid with three colored ``birds'' moving in
a straight line, possibly bouncing off the grid's borders. There are
ten different colors.
%
\begin{center}
\includegraphics[scale=0.35]{pics/examples_train.png}
\end{center}
%

\vspace*{-2ex}

In each of these quizzes, $A$ is the left image serialized in
raster-scan order as a sequence of $6 \times 8 = 48$ tokens, $d$ is
either the token ``forward'' or the token ``backward'', and $B$ is the
right image, also serialized. The direction of prediction is chosen at
random.

\section{Generating Quizzes}

Given a set of $N$ GPTs, we can generate new quizzes as follows:
Select one of the models, and use it to generate the $97$ tokens of a
triplet $(A, d, B)$.
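
The count of $97$ is consistent with the serialization used for the
initial quizzes, assuming that generated triplets follow the same
layout, with two grids of $6 \times 8$ tokens and a single direction
token:
\[
\underbrace{6 \times 8}_{A} + \underbrace{1}_{d} + \underbrace{6 \times 8}_{B} = 48 + 1 + 48 = 97.
\]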

Then with each one of the $N-1$ other models, predict $B$ from $(A,
d)$, and $A$ from $(B, d')$ where $d'$ is the direction token opposite
of $d$.

A quiz is validated if \textbf{all the other GPTs but one predict it
  deterministically correctly in both directions.}
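
One way to write this criterion down, as a sketch only: the
notation $g$ and $c_n$ is introduced here for illustration, and
``all but one'' is read as exactly one failure, which is an
interpretation of the sentence above. Let $g$ be the index of the
generating model and, for $n \neq g$, let $c_n = 1$ if model $n$
recovers both $B$ from $(A, d)$ and $A$ from $(B, d')$, and $c_n = 0$
otherwise. The quiz is then kept if
\[
\sum_{n \neq g} c_n = N - 2,
\]
that is, if $N-2$ of the $N-1$ other models succeed in both
directions and one of them fails.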

This criterion ensures that the new quizzes are both solvable and
sophisticated, and that they incrementally increase the complexity of
the culture. Requiring both directions prevents the generation of
quizzes that are non-trivial only because the prompt has been randomly
degraded.

\section{Overall Process}

The overall process consists of training the GPTs from scratch by
iterating the following steps:
%
\begin{itemize}

\item select the GPT with the lowest recorded test accuracy and train it for one epoch,

\item if its test accuracy gets above $97.5\%$, generate $1'000$ new
  quizzes, add them to the training set, and re-compute the accuracy of
  all the models.

\end{itemize}

\section{Results}

This procedure results in the discovery of patterns that are not
present in the original quizzes:

\textbf{More birds}

\begin{center}
\includegraphics[scale=0.35]{pics/4_birds_1.png}
\includegraphics[scale=0.35]{pics/5_birds_1.png}

\includegraphics[scale=0.35]{pics/6_birds_1.png}
\end{center}

\textbf{New bird shapes}

\begin{center}

\includegraphics[scale=0.35]{pics/other_shapes_2.png}
\includegraphics[scale=0.35]{pics/other_shapes_3.png}
\end{center}

\textbf{Occlusions}

\begin{center}
\includegraphics[scale=0.35]{pics/other_shapes_1.png}
\includegraphics[scale=0.35]{pics/occlusions_1.png}
\end{center}

\section*{Appendix}

The code is available at\\[-2ex]

\centerline{\url{https://fleuret.org/git/culture}}

The experiments are run on an RTX 4090.

The GPT used has 37M parameters and the following structure:

\begin{center}
\begin{tabular}{lc}
    \texttt{dim\_model}  & 512  \\
    \texttt{dim\_keys}   & 64   \\
    \texttt{dim\_hidden} & 2048 \\
    \texttt{nb\_heads}   & 8    \\
    \texttt{nb\_blocks}  & 12
\end{tabular}
\end{center}
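
As a rough consistency check of the 37M figure, the weight matrices of
one block can be counted under assumptions that are not stated in the
table: values are taken to have the same dimension as keys ($64$), each
block has four attention projections (queries, keys, values, output)
and a two-layer MLP, and biases, layer normalizations, and embeddings
are ignored.
\begin{align*}
\text{attention} &: 4 \times 512 \times (8 \times 64) = 1'048'576 \\
\text{MLP} &: 2 \times 512 \times 2048 = 2'097'152 \\
\text{12 blocks} &: 12 \times 3'145'728 = 37'748'736
\end{align*}
Under these assumptions the blocks alone account for about 37.7M
weights, which matches the stated order of magnitude.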

Training uses Adam with $\eta = 10^{-4}$ and no scheduling.

There are $N_{\text{train}}=250'000$ original quizzes for training and
$N_{\text{test}} = 10'000$ for test.

At each epoch, for both train and test samples, we mix the original
quizzes with the generated ones.

For training, for instance, if there are fewer than $N_{\text{train}}/2$
new quizzes, we take all of them; otherwise we sample
$N_{\text{train}}/2$ of them without replacement, and then we sample
without replacement enough original quizzes to get $N_{\text{train}}$
samples in total.
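
Writing $M$ for the number of generated quizzes available (a symbol
introduced here for illustration), the rule above amounts to
\[
n_{\text{new}} = \min\left(M, \frac{N_{\text{train}}}{2}\right),
\quad
n_{\text{orig}} = N_{\text{train}} - n_{\text{new}}.
\]
As a hypothetical example, $M = 30'000$ gives $30'000$ generated and
$220'000$ original quizzes, while $M = 400'000$ gives $125'000$ of
each.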

We proceed similarly to get $N_{\text{test}}$ samples for test.

\end{document}