The motivation behind this experiment is to check if and how statistical models can generate patterns beyond those appearing in their training set, while remaining consistent with it.

We define a "quiz" as a series of four images representing a functional transformation f. The four images are of the form A, f(A), B, f(B), where A, B, and f are all random, albeit very structured. "Solving" a quiz means predicting correctly any one of the four images given the three others, that is:

  f(A), B, f(B) -> A
  A, B, f(B)    -> f(A)
  A, f(A), f(B) -> B
  A, f(A), B    -> f(B)

The image culture_original.png shows examples of the initial quizzes, which were generated with hand-designed algorithmic procedures. As can be seen in these examples, they are very structured: there are always three rectangles, non-overlapping in A and B (and generally non-overlapping in f(A) and f(B)), and the rectangle colors are the same in the four images.

From these samples, we train 32 models which are both masked predictors and generative models. They are 133M-parameter attention-based deep models, and they easily reach 95%+ accuracy according to the definition of "solving" above (the first sketch below spells out these four prediction tasks).

Once these 32 models get above 90% accuracy, we use them to generate new quizzes. For every new quiz they generate, we take 12 of the 32 models at random and count how many of them can solve it. If 4 models or more solve it perfectly and 4 or more fail severely (that is, are more than 5 pixels off), we record this quiz (second sketch below).

Once we have recorded 50k+ such new quizzes, we delete the models and retrain them from scratch with batches composed of 50% samples from the original distribution and 50% samples from the generated "culture" (third sketch below). We then repeat the whole procedure.

The images culture_valid_N.png show examples of the recorded generated quizzes.
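Below is a minimal sketch of the "solving" criterion. It assumes a quiz is a tuple of four integer image arrays and that a model exposes a predict-the-masked-image callable; both interfaces are hypothetical and only meant to make the four prediction tasks concrete.

  import numpy as np

  def prediction_errors(model, quiz):
      """Pixel-error count for each of the four masked-prediction tasks.

      quiz  -- tuple (A, fA, B, fB) of integer image arrays
      model -- callable(visible_images, masked_index) -> predicted image
      """
      errors = []
      for masked_index in range(4):
          # Hide one image, show the three others, ask for the missing one
          visible = [img for i, img in enumerate(quiz) if i != masked_index]
          prediction = model(visible, masked_index)
          errors.append(int(np.sum(prediction != quiz[masked_index])))
      return errors

  def solves(model, quiz):
      """A quiz counts as solved when all four predictions are exact."""
      return max(prediction_errors(model, quiz)) == 0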
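The recording rule for generated quizzes can then be sketched as follows. The thresholds (12 graders drawn from the 32 models, 4 or more perfect, 4 or more off by more than 5 pixels) come from the text above; reading "fail severely" as "worst of the four tasks is more than 5 pixels off" is an assumption.

  import random

  def keep_quiz(models, quiz, prediction_errors,
                n_graders=12, n_solve=4, n_fail=4, fail_margin=5):
      """Record a generated quiz if enough graders solve it perfectly and
      enough fail it severely; prediction_errors is the function sketched
      above, passed in to keep this snippet self-contained."""
      graders = random.sample(models, n_graders)
      worst = [max(prediction_errors(m, quiz)) for m in graders]
      n_perfect = sum(e == 0 for e in worst)
      n_severe = sum(e > fail_margin for e in worst)
      return n_perfect >= n_solve and n_severe >= n_fail

Requiring both strong solvers and strong failers presumably selects quizzes that are non-trivial yet still consistent, in line with the stated motivation of generating patterns beyond the training set without drifting into noise.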
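Finally, a sketch of the 50/50 batch composition used when retraining from scratch; sample_original and sample_culture are hypothetical stand-ins for the two data sources named above, each returning a batch of quizzes as an array.

  import numpy as np

  def mixed_batch(sample_original, sample_culture, batch_size):
      """Half a batch from the hand-designed generator, half from the
      recorded culture, shuffled together."""
      half = batch_size // 2
      batch = np.concatenate([sample_original(half),
                              sample_culture(batch_size - half)])
      np.random.shuffle(batch)  # in-place shuffle along the sample axis
      return batch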