report/culture.tex

   1 %% -*- mode: latex; mode: reftex; mode: flyspell; coding: utf-8; tex-command: "pdflatex.sh" -*-
   2
   3 %% Any copyright is dedicated to the Public Domain.
   4 %% https://creativecommons.org/publicdomain/zero/1.0/
   5 %% Written by Francois Fleuret <francois@fleuret.org>
   6
   7 \documentclass[11pt,a4paper,oneside]{article}
   8 \usepackage[paperheight=15cm,paperwidth=8cm,top=2mm,bottom=15mm,right=5mm,left=5mm]{geometry}
   9 %\usepackage[a4paper,top=2.5cm,bottom=2cm,left=2.5cm,right=2.5cm]{geometry}
  10 \usepackage[utf8]{inputenc}
  11 \usepackage{amsmath,amssymb,dsfont}
  12 \usepackage[pdftex]{graphicx}
  13 \usepackage[colorlinks=true,linkcolor=blue,urlcolor=blue,citecolor=blue]{hyperref}
  14 \urlstyle{same}
  15 \usepackage{tikz}
  16 \usetikzlibrary{arrows,arrows.meta,calc}
  17 \usetikzlibrary{patterns,backgrounds}
  18 \usetikzlibrary{positioning,fit}
  19 \usetikzlibrary{shapes.geometric,shapes.multipart}
  20 \usetikzlibrary{patterns.meta,decorations.pathreplacing,calligraphy}
  21 \usetikzlibrary{tikzmark}
  22 \usetikzlibrary{decorations.pathmorphing}
  23 \usepackage[round]{natbib}
  24 \usepackage[osf]{libertine}
  25 \usepackage{microtype}
  26
  27 \usepackage{mleftright}
  28
  29 \usepackage{enumitem}
  30 \setlist[itemize]{leftmargin=0pt,itemindent=1em,itemsep=2ex}
  31 \setlist{nosep} % or \setlist{noitemsep} to leave space around whole list
  32
  33 \newcommand{\setmuskip}[2]{#1=#2\relax}
  34 \setmuskip{\thinmuskip}{1.5mu} % by default it is equal to 3 mu
  35 \setmuskip{\medmuskip}{2mu} % by default it is equal to 4 mu
  36 \setmuskip{\thickmuskip}{3.5mu} % by default it is equal to 5 mu
  37
  38 \setlength{\parindent}{0cm}
  39 \setlength{\parskip}{1ex}
  40 %\renewcommand{\baselinestretch}{1.3}
  41 %\setlength{\tabcolsep}{0pt}
  42 %\renewcommand{\arraystretch}{1.0}
  43
  44 \def\argmax{\operatornamewithlimits{argmax}}
  45 \def\argmin{\operatornamewithlimits{argmin}}
  46
  47 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  48
  49 \def\given{\,\middle\vert\,}
  50 \def\proba{\operatorname{P}}
  51 \newcommand{\seq}{{S}}
  52 \newcommand{\expect}{\mathds{E}}
  53 \newcommand{\variance}{\mathds{V}}
  54 \newcommand{\empexpect}{\hat{\mathds{E}}}
  55 \newcommand{\mutinf}{\mathds{I}}
  56 \newcommand{\empmutinf}{\hat{\mathds{I}}}
  57 \newcommand{\entropy}{\mathds{H}}
  58 \newcommand{\empentropy}{\hat{\mathds{H}}}
  59 \newcommand{\ganG}{\mathbf{G}}
  60 \newcommand{\ganD}{\mathbf{D}}
  61 \newcommand{\ganF}{\mathbf{F}}
  62
  63 \newcommand{\dkl}{\mathds{D}_{\mathsf{KL}}}
  64 \newcommand{\djs}{\mathds{D}_{\mathsf{JS}}}
  65
  66 \newcommand*{\vertbar}{\rule[-1ex]{0.5pt}{2.5ex}}
  67 \newcommand*{\horzbar}{\rule[.5ex]{2.5ex}{0.5pt}}
  68
  69 \def\positionalencoding{\operatorname{pos-enc}}
  70 \def\concat{\operatorname{concat}}
  71 \def\crossentropy{\LL_{\operatorname{ce}}}
  72
  73 \newcommand{\separator}{\begin{center}
  74 *
  75 \end{center}}
  76
  77 \newcommand{\pic}[2]{%
  78 \hspace*{\stretch{1}}
  79 %
  80 \includegraphics[scale=0.25]{#1}
  81 %
  82 \hspace*{\stretch{1}}%
  83 }
  84
  85 \newcommand{\birdpic}[2]{%
  86 \hspace*{\stretch{1}}
  87 %
  88 \includegraphics[scale=0.35]{#1}
  89 %
  90 \hspace*{\stretch{1}}%
  91 }
  92
  93 \newenvironment{example}{%
  94
  95 \vspace*{2ex}
  96
  97 \begin{minipage}{\textwidth}
  98
  99 \setlength{\parindent}{0cm}
 100 \setlength{\parskip}{1ex}
 101 }{%
 102 \end{minipage}
 103 }
 104
 105 \begin{document}
 106
 107 \vspace*{-3ex}
 108
 109 \begin{center}
 110
 111 {\Large Self-Generated Culture}
 112
 113 Fran\c cois Fleuret
 114
 115 \today
 116
 117 \vspace*{2ex}
 118
 119 \centerline{\color{red}(work in progress, to be updated)}
 120
 121 \medskip
 122
 123 \centerline{\url{https://fleuret.org/public/culture/culture.pdf}}
 124
 125 \end{center}
 126
 127 \section{Introduction}
 128
 129 The hypothesis behind this experiment is that high-level abstract
 130 thinking is fueled by social competition.
 131
 132 A group of communicating agents that try to demonstrate their
 133 cognitive superiority would end up developing a rich and consistent
 134 culture.
 135
 136 \subsection{Setup}
 137
 138 The experiment is designed with a group of GPTs that alternately
 139 learn to solve quizzes and generate new ones.
 140
 141 A ``quiz'' is a pair composed of a prompt and a solution, both being
 142 sequences of tokens.
 143
 144 We differentiate \textbf{world quizzes} that follow pre-defined and
 145 fixed regularities, and mimic the world's physical and environmental
 146 patterns that an organism has to grasp to survive, and \textbf{culture
 147   quizzes} that are generated by the GPTs, and mimic the knowledge one
 148 has to master to perform socially.
 149
 150
 151 We train five GPTs on a very large set of ``world quizzes''
 152 generated randomly. These models are trained to generate both the
 153 solution given the prompt, and the prompt given the solution.
 154
 155 This is achieved by training on both ``forward sequences'',
 156 composed of a token \texttt{[fwd]}, followed by the prompt's tokens,
 157 followed by another token \texttt{[fwd]}, followed by the solution's
 158 tokens, and ``backward sequences'', composed of a token \texttt{[bck]},
 159 followed by the solution's tokens, followed by another token
 160 \texttt{[bck]}, followed by the prompt's tokens.
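
As an illustration, writing $p_1,\dots,p_n$ for the prompt's tokens and
$s_1,\dots,s_m$ for the solution's tokens (a notation introduced here
only for this sketch), the two kinds of training sequences described
above read:
%
\begin{align*}
\text{forward:}  & \quad \texttt{[fwd]}\; p_1 \dots p_n \;\texttt{[fwd]}\; s_1 \dots s_m \\
\text{backward:} & \quad \texttt{[bck]}\; s_1 \dots s_m \;\texttt{[bck]}\; p_1 \dots p_n
\end{align*}
%
Since a GPT predicts each token from the preceding ones, the tokens
that follow the second marker are generated conditionally on the other
half of the quiz, in either direction.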
 161
 162 \subsection{Generating Culture Quizzes}
 163
 164 When the models' accuracy gets above $95\%$, we generate new quizzes as follows:
 165 %
 166 \begin{enumerate}
 167
 168 \item generate a solution (without conditioning) at temperature $T=2$,
 169   then generate a prompt for that solution at temperature $T=1/2$, and
 170   then generate a solution for that prompt at temperature $T=1/2$.
 171
 172 \item generate one solution for that prompt with each of the $5$ GPTs
 173   at temperature $T=1$; if $4$ of them generate the correct solution,
 174   validate that quiz and include it in the training data.
 175
 176 \end{enumerate}
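
For reference, a common way of sampling at temperature $T$ is to
rescale the logits before the softmax. We sketch it here under the
assumption that this is what the temperatures above refer to. With
$\ell_1,\dots,\ell_V$ the logits over a vocabulary of $V$ tokens
(symbols introduced for this illustration), the next token $x$ is
sampled according to
%
\[
\proba(x = k) = \frac{\exp(\ell_k / T)}{\sum_{j=1}^{V} \exp(\ell_j / T)}.
\]
%
With $T=2$ the distribution is flattened, which favors diverse
solutions, while $T=1/2$ sharpens it toward the most likely tokens.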
 177
 178 This criterion ensures that the new quizzes are both solvable and
 179 sophisticated, and incrementally complexify the culture. Imposing both
 180 directions prevents the generation of quizzes which are not trivial
 181 only because the prompt has been randomly degraded.
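
The validation rule of the second step can be written compactly. This
is only a sketch: it reads ``$4$ of them'' as exactly four of the $N=5$
models, which is an assumption, and it introduces $\hat{s}_i$ for the
solution generated by the $i$-th GPT and $s$ for the solution of the
candidate quiz. Under that reading, the quiz is validated when
%
\[
\sum_{i=1}^{N} \mathds{1}_{\{\hat{s}_i = s\}} = N - 1.
\]
%
A quiz solved by all five models would then be rejected, as would one
solved by three models or fewer.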
 182
 183 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 184 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 185 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 186 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 187
 188 \pagebreak
 189
 190 \section{Grid Quizzes}
 191
 192 \subsection{World Quizzes}
 193
 194 We define several types of quizzes and implement algorithmic
 195 procedures to randomly generate as many examples of each as we
 196 need.
 197
 198 In these quizzes, the prompt is made of three grids $A, f(A), B$ and
 199 the solution is a single grid $f(B)$.
 200
 201 \subsubsection{Half Fill}
 202
 203 \pic{pics/task_color_grow.png}{``half fill''}
 204
 205 The first grid contains three rectangles, each with a vertical or a
 206 horizontal line of another color in its middle. The second grid is
 207 identical, with one of the rectangles having one half filled. The third
 208 grid contains three rectangles of the same colors as the first grid,
 209 of different sizes and locations. The solution is obtained by similarly
 210 filling one half of a rectangle of the third grid.
 211
 212 \subsubsection{Detect}
 213
 214 \pic{pics/task_detect.png}{``detect''}
 215
 216 The first grid contains three rectangles, the second has two pixels of
 217 the same colors, located at the top-left corners of two of them. The
 218 solution is obtained by marking in the fourth grid the top-left
 219 corners of the rectangles of the same colors in the third.
 220
 221 \subsubsection{Frame}
 222
 223 \pic{pics/task_frame.png}{``frame''}
 224
 225 The first grid contains three rectangles, and the second is identical
 226 except that one rectangle has been replaced by its frame. The same
 227 should be done to the similarly colored rectangles of the third grid
 228 to obtain the solution.
 229
 230 \subsubsection{Grow}
 231
 232 \pic{pics/task_grow.png}{``grow''}
 233
 234 The first grid contains three rectangles, one of them getting one
 235 pixel thicker or thinner in the second. The same should be done to the
 236 similarly colored rectangles of the third grid to get the solution.
 237
 238 \subsubsection{Replace Color}
 239
 240 \pic{pics/task_replace_color.png}{``replace color''}
 241
 242 The first grid contains three rectangles, the second is obtained by
 243 changing one of the colors. The same should be done to the third grid
 244 to obtain the solution.
 245
 246 \subsubsection{Translate}
 247
 248 \pic{pics/task_translate.png}{``translate''}
 249
 250 The first grid contains three rectangles. The second is obtained by
 251 displacing one of them by one pixel in both directions. The solution is
 252 obtained by applying the same motion to the similarly colored
 253 rectangle in the third grid.
 254
 255 %% \subsubsection{Bounce}
 256
 257 %% \pic{pics/task_bounce.png}{``bounce''}
 258
 259 %% The solution should join the two pixels of same color, with a path of
 260 %% another color, starting in the direction indicated by a pixel of that
 261 %% color, and changing direction only when colliding with a pixel of a
 262 %% third color or one of the lattice border.
 263
 264 %% \subsubsection{count}
 265
 266 %% \pic{pics/task_count.png}{``count''}
 267
 268 %% \subsubsection{scale}
 269
 270 %% \pic{pics/task_scale.png}{``scale''}
 271
 272 %% \subsubsection{trajectory}
 273
 274 %% \pic{pics/task_trajectory.png}{``trajectory''}
 275
 276 \subsection{Culture Quizzes}
 277
 278 We list here some generated quizzes that exhibit features that were not present in the ``world quizzes'' used for training.
 279
 280 \bigskip
 281
 282 \begin{example}
 283
 284 \pic{pics/culture_c_quiz_0110_N4_validated/quiz_63.png}{0110/63}
 285
 286 \pic{pics/culture_c_quiz_0115_N4_validated/quiz_37.png}{0115/37}
 287
 288 The quizzes ``frame'' and ``half fill'' have been combined in a single
 289 quiz.
 290
 291 \end{example}
 292
 293 \separator
 294
 295 \begin{example}
 296
 297 \pic{pics/culture_c_quiz_0120_N4_validated/quiz_05.png}{0110/05}
 298
 299 The ``frame'' quiz has been generalized to non-rectangular shapes.
 300
 301 \end{example}
 302
 303 \separator
 304
 305 \begin{example}
 306
 307 \pic{pics/culture_c_quiz_0078_N4_validated/quiz_01.png}{0078/01}
 308
 309 \pic{pics/culture_c_quiz_0078_N4_validated/quiz_02.png}{0078/02}
 310
 311 More rectangles were added as distractors.
 312
 313 \end{example}
 314
 315 \separator
 316
 317 \begin{example}
 318
 319 \pic{pics/culture_c_quiz_0087_N4_validated/quiz_62.png}{0087/62}
 320
 321 \pic{pics/culture_c_quiz_0102_N4_validated/quiz_04.png}{0102/04}
 322
 323 \pic{pics/culture_c_quiz_0102_N4_validated/quiz_11.png}{0102/11}
 324
 325 \pic{pics/culture_c_quiz_0108_N4_validated/quiz_31.png}{0108/31}
 326
 327 Variations of ``Detect'' with location markers colored according to the
 328 color of the rectangle they mark.
 329
 330 \end{example}
 331
 332 \separator
 333
 334 \begin{example}
 335
 336 \pic{pics/culture_c_quiz_0078_N4_validated/quiz_16.png}{0078/16}
 337
 338 \pic{pics/culture_c_quiz_0084_N4_validated/quiz_21.png}{0084/21}
 339
 340 \pic{pics/culture_c_quiz_0078_N4_validated/quiz_42.png}{0078/42}
 341
 342 \pic{pics/culture_c_quiz_0089_N4_validated/quiz_28.png}{0089/28}
 343
 344 \pic{pics/culture_c_quiz_0084_N4_validated/quiz_00.png}{0084/00}
 345
 346 Variations of ``Half Fill'', ``Detect'', ``Translate'', ``Grow'', and
 347 ``Frame'' with a number of rectangles not equal to three.
 348
 349 \end{example}
 350
 351 \separator
 352
 353 \begin{example}
 354
 355 \pic{pics/culture_c_quiz_0078_N4_validated/quiz_27.png}{0078/27}
 356
 357 \pic{pics/culture_c_quiz_0078_N4_validated/quiz_18.png}{0078/18}
 358
 359 \pic{pics/culture_c_quiz_0086_N4_validated/quiz_45.png}{0086/45}
 360
 361 \pic{pics/culture_c_quiz_0078_N4_validated/quiz_37.png}{0078/37}
 362
 363 Variations of ``Half Fill'' where the shapes to change have more
 364 complex coloring.
 365
 366 \end{example}
 367
 368 \separator
 369
 370 \begin{example}
 371
 372 \pic{pics/culture_c_quiz_0078_N4_validated/quiz_30.png}{0078/30}
 373
 374 Variation of ``Translate'' where the moving part is occluded, which
 375 was never the case in the world quizzes.
 376
 377 \end{example}
 378
 379 \separator
 380
 381 \begin{example}
 382
 383 \pic{pics/culture_c_quiz_0078_N4_validated/quiz_31.png}{0078/31}
 384
 385 \pic{pics/culture_c_quiz_0084_N4_validated/quiz_10.png}{0084/10}
 386
 387 \pic{pics/culture_c_quiz_0084_N4_validated/quiz_12.png}{0084/12}
 388
 389 \pic{pics/culture_c_quiz_0086_N4_validated/quiz_23.png}{0086/23}
 390
 391 \pic{pics/culture_c_quiz_0086_N4_validated/quiz_28.png}{0086/28}
 392
 393 Variations of ``Half Fill'' with non-rectangular shapes.
 394
 395 \end{example}
 396
 397 \separator
 398
 399 \begin{example}
 400
 401 \pic{pics/culture_c_quiz_0078_N4_validated/quiz_60.png}{0078/60}
 402
 403 \pic{pics/culture_c_quiz_0084_N4_validated/quiz_41.png}{0084/41}
 404
 405 \pic{pics/culture_c_quiz_0084_N4_validated/quiz_49.png}{0084/49}
 406
 407 \pic{pics/culture_c_quiz_0086_N4_validated/quiz_04.png}{0086/04}
 408
 409 Variations of ``Half Fill'' where two colors or two rectangles have to
 410 be modified.
 411
 412 \end{example}
 413
 414 \separator
 415
 416 \begin{example}
 417
 418 \pic{pics/culture_c_quiz_0111_N4_validated/quiz_23.png}{0111/23}
 419
 420 Variation of ``Frame'' with no rectangle of adequate size to be
 421 modified.
 422
 423 \end{example}
 424
 425 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 426 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 427 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 428 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 429
 430 \pagebreak
 431
 432 \section{Bird World}
 433
 434 These results were obtained with a slightly different procedure. In
 435 particular the quizzes were validated if the models could predict both
 436 the solution from the prompt and the prompt from the solution. We
 437 report them since they exhibit the same patterns of generalization
 438 although they are quite different.
 439
 440 \subsection{World Quizzes}
 441
 442 The initial set of quizzes consists of predicting the dynamics of a
 443 very simple world: a $6 \times 8$ grid with three colored ``birds'' moving in
 444 a straight line, possibly bouncing off the grid's borders. There are
 445 ten different colors.
 446 %
 447 \birdpic{pics/examples_train.png}{}
 448 %
 449
 450 In each of these quizzes, $A$ is the left image serialized in
 451 raster-scan order as a sequence of $6 \times 8 = 48$ tokens, $d$ is
 452 either the token ``forward'' or the token ``backward'', and $B$ is the
 453 right image, also serialized. The direction of prediction is chosen at
 454 random.
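
As a worked example of the serialization, assume that the grid has
$H=6$ rows and $W=8$ columns (the text does not say which dimension is
which) and that rows and columns are indexed from $0$. In raster-scan
order, the cell at row $i$ and column $j$ is then the token of index
%
\[
k = i \, W + j, \quad 0 \leq k < H W = 48,
\]
%
so that, for instance, the cell at row $2$ and column $3$ is the token
of index $2 \times 8 + 3 = 19$. If a quiz is the sequence $A, d, B$
with no additional token, which is also an assumption, its length is
$48 + 1 + 48 = 97$ tokens.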
 455
 456 \subsection{Culture Quizzes}
 457
 458 This procedure results in the discovery of patterns which are not
 459 present in the original quizzes:
 460
 461 \begin{example}
 462
 463 \birdpic{pics/4_birds_1.png}{}
 464
 465 \birdpic{pics/5_birds_1.png}{}
 466
 467 \birdpic{pics/6_birds_1.png}{}
 468
 469 More birds.
 470
 471 \end{example}
 472
 473 \separator
 474
 475 \begin{example}
 476
 477 \birdpic{pics/other_shapes_2.png}{}
 478
 479 \birdpic{pics/other_shapes_3.png}{}
 480
 481 New bird shapes.
 482
 483 \end{example}
 484
 485 \separator
 486
 487 \begin{example}
 488
 489 \birdpic{pics/other_shapes_1.png}{}
 490
 491 \birdpic{pics/occlusions_1.png}{}
 492
 493 Occlusions.
 494
 495 \end{example}
 496
 497 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 498 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 499 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 500 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 501
 502 \pagebreak
 503
 504 \section{Various thoughts}
 505
 506 \begin{itemize}
 507
 508 \item The whole process can be envisioned as natural selection of
 509   quizzes in the representation landscape of GPTs. There probably is a
 510   subtle relation between the temperature (mutation rate) and the
 511   number of models used to validate with the ``all but one'' criterion
 512   (survival criterion).
 513
 514 \item The ``all but one'' could be ``all but $K$'', and there may be
 515   some information-theoretic interpretation, where the goal is to maximize
 516   mutual information, with $K=N$ being total randomness, so high
 517   entropy but no structure, and $K=0$ being total determinism, so no
 518   information to share.
 519
 520 \item The setup does not push toward any specific invariance or
 521   property in the generated quizzes; their consistency is entirely due
 522   to the statistics of the ``world quizzes'' that remain in the
 523   training set, and to the GPTs' inductive biases.
 524
 525 \item The GPTs obviously get a sense of objectness and 2d topology
 526   early on, since they rapidly increase the number of birds and
 527   ``discover'' occlusion even though it was never present in the world
 528   quizzes.
 529
 530 \item There may not be so many problems that can be cast as pairs of
 531   patterns that are each a deterministic function of the other, which
 532   is probably critical here.
 533
 534 \item This overall process probably fights the ``simplicity bias'': if
 535   a model is lacking a ``cue'' that the others have, there will
 536   rapidly be quizzes that require this cue, they will be added to the
 537   training data, and that model will catch up.
 538
 539 \item The randomness of the process probably allows it to go even beyond
 540   just synchronizing the abilities of the models. There may be some
 541   additional complexification of quizzes that get accepted by chance.
 542
 543 \item It can be parallelized by dispatching the GPTs across multiple
 544   nodes, and avoiding a quadratic cost by limiting the validation of
 545   the quizzes to a subset of them.
 546
 547 \item The current process to generate new quizzes, which simply
 548   samples them at random, is very rudimentary and probably not
 549   sufficient in a real-data setup. It can probably be supplemented
 550   with an MCTS-type search.
 551
 552 \item There may already be in the generated quizzes some structure
 553   that \emph{we} do not pick up (e.g.\ certain color or motion
 554   patterns).
 555
 556 \end{itemize}
 557
 558 \section*{Appendix}
 559
 560 The code is available at
 561
 562 \medskip
 563
 564 \centerline{\url{https://fleuret.org/git/culture}}
 565
 566 The experiments are done with an RTX 4090.
 567
 568 The GPT used has 37M parameters and the following structure:
 569
 570 \begin{center}
 571 \begin{tabular}{lc}
 572     \texttt{dim\_model}  & 512  \\
 573     \texttt{dim\_keys}   & 64   \\
 574     \texttt{dim\_hidden} & 2048 \\
 575     \texttt{nb\_heads}   & 8    \\
 576     \texttt{nb\_blocks}  & 12
 577 \end{tabular}
 578 \end{center}
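
The figure of 37M can be recovered approximately from this table. The
following back-of-the-envelope count assumes a standard transformer
block with four $512 \times 512$ attention projections (queries, keys,
values, and output, the $8$ heads of dimension $64$ giving
$8 \times 64 = 512$) and a two-layer MLP, and it ignores biases, layer
normalizations, and embeddings:
%
\begin{align*}
\text{attention per block} & : 4 \times 512^2 \simeq 1.05\text{M} \\
\text{MLP per block}       & : 2 \times 512 \times 2048 \simeq 2.10\text{M} \\
\text{12 blocks}           & : 12 \times 3.15\text{M} \simeq 37.7\text{M}
\end{align*}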
 579
 580 The optimizer is Adam, with learning rate $\eta = 10^{-4}$ and no scheduling.
 581
 582 There are $N_{\text{train}}=250\,000$ original quizzes for training and
 583 $N_{\text{test}} = 10\,000$ for test.
 584
 585 At each epoch, for both train and test samples, we mix original
 586 quizzes and the generated ones.
 587
 588 For training, for instance, if there are fewer than $N_{\text{train}}/2$
 589 new quizzes, we take all of them, otherwise we sample
 590 $N_{\text{train}}/2$ of them without replacement, and then we sample
 591 without replacement enough original quizzes to get $N_{\text{train}}$
 592 samples in total.
 593
 594 We proceed similarly to get $N_{\text{test}}$ samples for test.
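
The mixing rule for training can be summarized as follows, with $M$ the
number of generated quizzes available at a given epoch, and
$n_{\text{new}}$ and $n_{\text{orig}}$ the numbers of generated and
original quizzes used in that epoch (all three symbols are introduced
here for illustration):
%
\begin{align*}
n_{\text{new}}  & = \min\left(M, N_{\text{train}}/2\right) \\
n_{\text{orig}} & = N_{\text{train}} - n_{\text{new}}
\end{align*}
%
For instance, with a hypothetical $M = 40\,000$, an epoch uses
$40\,000$ generated quizzes and $210\,000$ original ones, while for any
$M \geq 125\,000$ it uses $125\,000$ of each.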
 595
 596 \end{document}