20240713-07:41:35 argv ./main.py --accuracy_to_make_c_quizzes=0.9 --proba_understands=0.125 --proba_not_understands=1e-5 --result_dir=./results_grids_v4b
20240713-07:41:35 args.log_filename train.log
20240713-07:41:35 args.result_dir ./results_grids_v4b
20240713-07:41:35 args.seed 0
20240713-07:41:35 args.resume False
20240713-07:41:35 args.max_percents_of_test_in_train -1
20240713-07:41:35 args.nb_epochs 10000
20240713-07:41:35 args.batch_size 25
20240713-07:41:35 args.physical_batch_size None
20240713-07:41:35 args.nb_train_samples 100000
20240713-07:41:35 args.nb_test_samples 10000
20240713-07:41:35 args.learning_rate 0.0005
20240713-07:41:35 args.model 37M
20240713-07:41:35 args.dim_model 512
20240713-07:41:35 args.dim_keys 64
20240713-07:41:35 args.dim_hidden 2048
20240713-07:41:35 args.nb_heads 8
20240713-07:41:35 args.nb_blocks 12
20240713-07:41:35 args.dropout 0.1
20240713-07:41:35 args.deterministic_synthesis False
20240713-07:41:35 args.problem grids
20240713-07:41:35 args.nb_threads 1
20240713-07:41:35 args.gpus all
20240713-07:41:35 args.nb_gpts 5
20240713-07:41:35 args.accuracy_to_make_c_quizzes 0.9
20240713-07:41:35 args.proba_understands 0.125
20240713-07:41:35 args.proba_not_understands 1e-05
20240713-07:41:35 args.generation_temperature 2.0
20240713-07:41:35 args.dirty_debug False
20240713-07:41:35 args.grids_tasks None
20240713-07:41:35 args.sky_height 6
20240713-07:41:35 args.sky_width 8
20240713-07:41:35 args.sky_nb_birds 3
20240713-07:41:35 args.sky_nb_iterations 2
20240713-07:41:35 args.sky_speed 3
20240713-07:41:48 main_device cuda:0 gpus ['cuda:0', 'cuda:1']
20240713-07:41:48 vocabulary_size 13
20240713-07:41:48 creating model 0 and its w_quizzes
20240713-07:44:38 creating model 1 and its w_quizzes
20240713-07:47:35 creating model 2 and its w_quizzes
20240713-07:50:30 creating model 3 and its w_quizzes
20240713-07:53:23 creating model 4 and its w_quizzes
20240713-07:56:17 nb_parameters 37817357 (37M)
20240713-07:56:18 nb_new_c_quizzes_for_train 2000 nb_new_c_quizzes_for_test 200
20240713-07:56:18 --- epoch 0 ----------------------------------------
20240713-07:56:18 current_test_accuracies 0.0000 0.0000 0.0000 0.0000 0.0000
20240713-07:56:18 training model 0
20240713-07:56:18 training model 1
20240713-08:05:21 train_perplexity 0 model 0 1.6462528675279366
20240713-08:05:34 train_perplexity 0 model 1 1.6388221384828212
20240713-08:05:45 test_perplexity 0 model 0 1.2430069102197505
20240713-08:05:59 test_perplexity 0 model 1 1.2255297077914549
20240713-08:08:51 test_accuracy 0 model 0 forward 6 / 504 backward 3 / 496
20240713-08:08:51 main_test_accuracy 0 0.009000000543892384
20240713-08:08:54 test_accuracy 0 model 1 forward 19 / 495 backward 10 / 505
20240713-08:08:54 main_test_accuracy 0 0.029000001028180122
20240713-08:08:55 wrote gpt_000.pth
20240713-08:08:55 wrote gpt_001.pth
20240713-08:08:55 cache_w_quizzes contains 200000 quizzes
20240713-08:09:04 --- epoch 1 ----------------------------------------
20240713-08:09:04 current_test_accuracies 0.0090 0.0290 0.0000 0.0000 0.0000
20240713-08:09:04 training model 2
20240713-08:09:04 training model 3
20240713-08:18:06 train_perplexity 1 model 2 1.677983760650814
20240713-08:18:19 train_perplexity 1 model 3 1.5914548877755341
20240713-08:18:29 test_perplexity 1 model 2 1.2422929127783748
20240713-08:18:44 test_perplexity 1 model 3 1.237214548532431
20240713-08:21:38 test_accuracy 1 model 2 forward 7 / 508 backward 0 / 492
20240713-08:21:38 main_test_accuracy 1 0.007000000216066837
20240713-08:21:41 test_accuracy 1 model 3 forward 8 / 506 backward 1 / 494
20240713-08:21:41 main_test_accuracy 1 0.009000000543892384
20240713-08:21:42 wrote gpt_002.pth
20240713-08:21:42 wrote gpt_003.pth
20240713-08:21:42 cache_w_quizzes contains 200000 quizzes
20240713-08:21:51 --- epoch 2 ----------------------------------------
20240713-08:21:51 current_test_accuracies 0.0090 0.0290 0.0070 0.0090 0.0000
20240713-08:21:51 training model 4
20240713-08:21:51 training model 2
20240713-08:30:54 train_perplexity 2 model 4 1.645475242711973
20240713-08:31:06 train_perplexity 2 model 2 1.2235242527628463
20240713-08:31:19 test_perplexity 2 model 4 1.242327848096534
20240713-08:31:32 test_perplexity 2 model 2 1.180515794054294
20240713-08:34:27 test_accuracy 2 model 4 forward 8 / 478 backward 1 / 522
20240713-08:34:27 main_test_accuracy 2 0.009000000543892384
20240713-08:34:29 test_accuracy 2 model 2 forward 81 / 508 backward 42 / 492
20240713-08:34:29 main_test_accuracy 2 0.12300000339746475
20240713-08:34:30 wrote gpt_004.pth
20240713-08:34:31 wrote gpt_002.pth
20240713-08:34:31 cache_w_quizzes contains 200000 quizzes
20240713-08:34:40 --- epoch 3 ----------------------------------------
20240713-08:34:40 current_test_accuracies 0.0090 0.0290 0.1230 0.0090 0.0090
20240713-08:34:40 training model 0
20240713-08:34:40 training model 3
20240713-08:43:44 train_perplexity 3 model 0 1.2242092824851019
20240713-08:43:56 train_perplexity 3 model 3 1.2172129831226097
20240713-08:44:08 test_perplexity 3 model 0 1.1850612712397848
20240713-08:44:21 test_perplexity 3 model 3 1.1781453829917292
20240713-08:47:17 test_accuracy 3 model 0 forward 99 / 504 backward 37 / 496
20240713-08:47:17 main_test_accuracy 3 0.13600000739097595
20240713-08:47:20 test_accuracy 3 model 3 forward 83 / 506 backward 32 / 494
20240713-08:47:20 main_test_accuracy 3 0.11500000208616257
20240713-08:47:20 wrote gpt_000.pth
20240713-08:47:21 wrote gpt_003.pth
20240713-08:47:21 cache_w_quizzes contains 200000 quizzes
20240713-08:47:29 --- epoch 4 ----------------------------------------
20240713-08:47:29 current_test_accuracies 0.1360 0.0290 0.1230 0.1150 0.0090
20240713-08:47:29 training model 4
20240713-08:47:29 training model 1
20240713-08:56:34 train_perplexity 4 model 4 1.2221995077872088
20240713-08:56:44 train_perplexity 4 model 1 1.209817004144225
20240713-08:57:00 test_perplexity 4 model 4 1.1852317593324047
20240713-08:57:11 test_perplexity 4 model 1 1.1758447974956365
20240713-09:00:07 test_accuracy 4 model 4 forward 86 / 478 backward 22 / 522
20240713-09:00:07 main_test_accuracy 4 0.1080000028014183
20240713-09:00:09 test_accuracy 4 model 1 forward 158 / 495 backward 66 / 505
20240713-09:00:09 main_test_accuracy 4 0.2240000069141388
20240713-09:00:10 wrote gpt_004.pth
20240713-09:00:10 wrote gpt_001.pth
20240713-09:00:10 cache_w_quizzes contains 200000 quizzes
20240713-09:00:19 --- epoch 5 ----------------------------------------
20240713-09:00:19 current_test_accuracies 0.1360 0.2240 0.1230 0.1150 0.1080
20240713-09:00:19 training model 4
20240713-09:00:19 training model 3
20240713-09:09:24 train_perplexity 5 model 4 1.1781220503374592
20240713-09:09:34 train_perplexity 5 model 3 1.1759931469816827
20240713-09:09:51 test_perplexity 5 model 4 1.1602348141813499
20240713-09:10:02 test_perplexity 5 model 3 1.160968781360916
20240713-09:12:57 test_accuracy 5 model 4 forward 185 / 478 backward 118 / 522
20240713-09:12:57 main_test_accuracy 5 0.30300000309944153
20240713-09:12:58 test_accuracy 5 model 3 forward 227 / 506 backward 132 / 494
20240713-09:12:58 main_test_accuracy 5 0.359000027179718
20240713-09:12:59 wrote gpt_004.pth
20240713-09:13:00 wrote gpt_003.pth
20240713-09:13:00 cache_w_quizzes contains 200000 quizzes
20240713-09:13:08 --- epoch 6 ----------------------------------------
20240713-09:13:08 current_test_accuracies 0.1360 0.2240 0.1230 0.3590 0.3030
20240713-09:13:08 training model 2
20240713-09:13:08 training model 0
20240713-09:22:13 train_perplexity 6 model 2 1.1778832526540477
20240713-09:22:23 train_perplexity 6 model 0 1.1796792849400026
20240713-09:22:39 test_perplexity 6 model 2 1.160438004654595
20240713-09:22:51 test_perplexity 6 model 0 1.1625448844497552
20240713-09:25:43 test_accuracy 6 model 2 forward 177 / 508 backward 137 / 492
20240713-09:25:43 main_test_accuracy 6 0.3140000104904175
20240713-09:25:46 test_accuracy 6 model 0 forward 223 / 504 backward 111 / 496
20240713-09:25:46 main_test_accuracy 6 0.33400002121925354
20240713-09:25:47 wrote gpt_002.pth
20240713-09:25:47 wrote gpt_000.pth
20240713-09:25:47 cache_w_quizzes contains 200000 quizzes
20240713-09:25:56 --- epoch 7 ----------------------------------------
20240713-09:25:56 current_test_accuracies 0.3340 0.2240 0.3140 0.3590 0.3030
20240713-09:25:56 training model 1
20240713-09:25:56 training model 4
20240713-09:35:02 train_perplexity 7 model 1 1.1729552141617223
20240713-09:35:11 train_perplexity 7 model 4 1.1626286846550684
20240713-09:35:29 test_perplexity 7 model 1 1.1587398800235678
20240713-09:35:39 test_perplexity 7 model 4 1.1533109934625214
20240713-09:38:34 test_accuracy 7 model 1 forward 216 / 495 backward 128 / 505
20240713-09:38:34 main_test_accuracy 7 0.3440000116825104
20240713-09:38:36 test_accuracy 7 model 4 forward 261 / 478 backward 160 / 522
20240713-09:38:36 main_test_accuracy 7 0.42100003361701965
20240713-09:38:37 wrote gpt_001.pth
20240713-09:38:38 wrote gpt_004.pth
20240713-09:38:38 cache_w_quizzes contains 200000 quizzes
20240713-09:38:46 --- epoch 8 ----------------------------------------
20240713-09:38:46 current_test_accuracies 0.3340 0.3440 0.3140 0.3590 0.4210
20240713-09:38:46 training model 2
20240713-09:38:46 training model 0
20240713-09:47:52 train_perplexity 8 model 2 1.1627301095671894
20240713-09:48:00 train_perplexity 8 model 0 1.1637620052905524
20240713-09:48:21 test_perplexity 8 model 2 1.1526246974699004
20240713-09:48:30 test_perplexity 8 model 0 1.1545078345185862
20240713-09:51:24 test_accuracy 8 model 2 forward 261 / 508 backward 203 / 492
20240713-09:51:24 main_test_accuracy 8 0.46400001645088196
20240713-09:51:26 test_accuracy 8 model 0 forward 288 / 504 backward 149 / 496
20240713-09:51:26 main_test_accuracy 8 0.43700000643730164
20240713-09:51:27 wrote gpt_002.pth
20240713-09:51:27 wrote gpt_000.pth
20240713-09:51:27 cache_w_quizzes contains 200000 quizzes
20240713-09:51:36 --- epoch 9 ----------------------------------------
20240713-09:51:36 current_test_accuracies 0.4370 0.3440 0.4640 0.3590 0.4210
20240713-09:51:36 training model 1
20240713-09:51:36 training model 3
20240713-10:00:41 train_perplexity 9 model 1 1.1599629917228107
20240713-10:00:51 train_perplexity 9 model 3 1.161289412492487
20240713-10:01:07 test_perplexity 9 model 1 1.151406834062672
20240713-10:01:19 test_perplexity 9 model 3 1.1541102373453578
20240713-10:04:14 test_accuracy 9 model 1 forward 284 / 495 backward 204 / 505
20240713-10:04:14 main_test_accuracy 9 0.4880000352859497
20240713-10:04:15 test_accuracy 9 model 3 forward 287 / 506 backward 176 / 494
20240713-10:04:15 main_test_accuracy 9 0.46300002932548523
20240713-10:04:16 wrote gpt_001.pth
20240713-10:04:16 wrote gpt_003.pth
20240713-10:04:16 cache_w_quizzes contains 200000 quizzes
20240713-10:04:24 --- epoch 10 ----------------------------------------
20240713-10:04:24 current_test_accuracies 0.4370 0.4880 0.4640 0.4630 0.4210
20240713-10:04:24 training model 4
20240713-10:04:24 training model 0
20240713-10:13:30 train_perplexity 10 model 4 1.1559692398886947
20240713-10:13:39 train_perplexity 10 model 0 1.1557805451782193
20240713-10:13:57 test_perplexity 10 model 4 1.1486205782674703
20240713-10:14:07 test_perplexity 10 model 0 1.1497577896883866
20240713-10:17:02 test_accuracy 10 model 4 forward 311 / 478 backward 236 / 522
20240713-10:17:02 main_test_accuracy 10 0.5470000505447388
20240713-10:17:03 test_accuracy 10 model 0 forward 330 / 504 backward 194 / 496
20240713-10:17:03 main_test_accuracy 10 0.5240000486373901
20240713-10:17:04 wrote gpt_004.pth
20240713-10:17:04 wrote gpt_000.pth
20240713-10:17:04 cache_w_quizzes contains 200000 quizzes
20240713-10:17:13 --- epoch 11 ----------------------------------------
20240713-10:17:13 current_test_accuracies 0.5240 0.4880 0.4640 0.4630 0.5470
20240713-10:17:13 training model 3
20240713-10:17:13 training model 2
20240713-10:26:19 train_perplexity 11 model 3 1.1540397978717611
20240713-10:26:28 train_perplexity 11 model 2 1.15545895157283
20240713-10:26:45 test_perplexity 11 model 3 1.1496645023312058
20240713-10:26:56 test_perplexity 11 model 2 1.1481401544734084
20240713-10:29:50 test_accuracy 11 model 3 forward 364 / 506 backward 220 / 494
20240713-10:29:50 main_test_accuracy 11 0.5840000510215759
20240713-10:29:52 test_accuracy 11 model 2 forward 333 / 508 backward 220 / 492
20240713-10:29:52 main_test_accuracy 11 0.5530000329017639
20240713-10:29:53 wrote gpt_003.pth
20240713-10:29:53 wrote gpt_002.pth
20240713-10:29:53 cache_w_quizzes contains 200000 quizzes
20240713-10:30:02 --- epoch 12 ----------------------------------------
20240713-10:30:02 current_test_accuracies 0.5240 0.4880 0.5530 0.5840 0.5470
20240713-10:30:02 training model 1
20240713-10:30:02 training model 0
20240713-10:39:08 train_perplexity 12 model 1 1.1537661804837571
20240713-10:39:17 train_perplexity 12 model 0 1.1515902690656994
20240713-10:39:35 test_perplexity 12 model 1 1.148241883785294
20240713-10:39:45 test_perplexity 12 model 0 1.1467041145816195
20240713-10:42:42 test_accuracy 12 model 1 forward 331 / 495 backward 203 / 505
20240713-10:42:42 main_test_accuracy 12 0.534000039100647
20240713-10:42:43 test_accuracy 12 model 0 forward 376 / 504 backward 220 / 496
20240713-10:42:43 main_test_accuracy 12 0.5960000157356262
20240713-10:42:44 wrote gpt_001.pth
20240713-10:42:44 wrote gpt_000.pth
20240713-10:42:44 cache_w_quizzes contains 200000 quizzes
20240713-10:42:53 --- epoch 13 ----------------------------------------
20240713-10:42:53 current_test_accuracies 0.5960 0.5340 0.5530 0.5840 0.5470
20240713-10:42:53 training model 1
20240713-10:42:53 training model 4
20240713-10:51:58 train_perplexity 13 model 1 1.1505375517224188
20240713-10:52:07 train_perplexity 13 model 4 1.1514265232437064
20240713-10:52:25 test_perplexity 13 model 1 1.1460662204417604
20240713-10:52:35 test_perplexity 13 model 4 1.1462550635308022
20240713-10:55:31 test_accuracy 13 model 1 forward 344 / 495 backward 249 / 505
20240713-10:55:31 main_test_accuracy 13 0.593000054359436
20240713-10:55:33 test_accuracy 13 model 4 forward 342 / 478 backward 279 / 522
20240713-10:55:33 main_test_accuracy 13 0.6210000514984131
20240713-10:55:34 wrote gpt_001.pth
20240713-10:55:34 wrote gpt_004.pth
20240713-10:55:34 cache_w_quizzes contains 200000 quizzes
20240713-10:55:43 --- epoch 14 ----------------------------------------
20240713-10:55:43 current_test_accuracies 0.5960 0.5930 0.5530 0.5840 0.6210
20240713-10:55:43 training model 2
20240713-10:55:43 training model 3
20240713-11:04:49 train_perplexity 14 model 2 1.151089076901544
20240713-11:04:57 train_perplexity 14 model 3 1.150233280361876
20240713-11:05:16 test_perplexity 14 model 2 1.1461363906829134
20240713-11:05:26 test_perplexity 14 model 3 1.1480469038908432
20240713-11:08:20 test_accuracy 14 model 2 forward 362 / 508 backward 246 / 492
20240713-11:08:20 main_test_accuracy 14 0.6080000400543213
20240713-11:08:22 test_accuracy 14 model 3 forward 359 / 506 backward 253 / 494
20240713-11:08:22 main_test_accuracy 14 0.612000048160553
20240713-11:08:23 wrote gpt_002.pth
20240713-11:08:23 wrote gpt_003.pth
20240713-11:08:23 cache_w_quizzes contains 200000 quizzes
20240713-11:08:31 --- epoch 15 ----------------------------------------
20240713-11:08:31 current_test_accuracies 0.5960 0.5930 0.6080 0.6120 0.6210
20240713-11:08:31 training model 1
20240713-11:08:31 training model 0
20240713-11:17:37 train_perplexity 15 model 1 1.148211525449541
20240713-11:17:46 train_perplexity 15 model 0 1.1486799906310314
20240713-11:18:04 test_perplexity 15 model 1 1.1438140883205277
20240713-11:18:14 test_perplexity 15 model 0 1.1447945073217365
20240713-11:21:09 test_accuracy 15 model 1 forward 368 / 495 backward 286 / 505
20240713-11:21:09 main_test_accuracy 15 0.6540000438690186
20240713-11:21:11 test_accuracy 15 model 0 forward 390 / 504 backward 257 / 496
20240713-11:21:11 main_test_accuracy 15 0.6470000147819519
20240713-11:21:12 wrote gpt_001.pth
20240713-11:21:12 wrote gpt_000.pth
20240713-11:21:12 cache_w_quizzes contains 200000 quizzes
20240713-11:21:20 --- epoch 16 ----------------------------------------
20240713-11:21:20 current_test_accuracies 0.6470 0.6540 0.6080 0.6120 0.6210
20240713-11:21:20 training model 2
20240713-11:21:20 training model 3
20240713-11:30:26 train_perplexity 16 model 2 1.1488183438993973
20240713-11:30:35 train_perplexity 16 model 3 1.147948151347076
20240713-11:30:52 test_perplexity 16 model 2 1.1434722043400205
20240713-11:31:03 test_perplexity 16 model 3 1.144157029193478
20240713-11:33:56 test_accuracy 16 model 2 forward 376 / 508 backward 295 / 492
20240713-11:33:56 main_test_accuracy 16 0.6710000038146973
20240713-11:33:59 test_accuracy 16 model 3 forward 393 / 506 backward 299 / 494
20240713-11:33:59 main_test_accuracy 16 0.6920000314712524
20240713-11:34:00 wrote gpt_002.pth
20240713-11:34:00 wrote gpt_003.pth
20240713-11:34:00 cache_w_quizzes contains 200000 quizzes
20240713-11:34:08 --- epoch 17 ----------------------------------------
20240713-11:34:08 current_test_accuracies 0.6470 0.6540 0.6710 0.6920 0.6210
20240713-11:34:08 training model 4
20240713-11:34:08 training model 0
20240713-11:43:14 train_perplexity 17 model 4 1.1487983149661598
20240713-11:43:22 train_perplexity 17 model 0 1.1466836940711789
20240713-11:43:42 test_perplexity 17 model 4 1.1449757702860959
20240713-11:43:51 test_perplexity 17 model 0 1.1430345219517348
20240713-11:46:48 test_accuracy 17 model 4 forward 324 / 478 backward 246 / 522
20240713-11:46:48 main_test_accuracy 17 0.5700000524520874
20240713-11:46:49 test_accuracy 17 model 0 forward 395 / 504 backward 283 / 496
20240713-11:46:49 main_test_accuracy 17 0.6780000329017639
20240713-11:46:50 wrote gpt_004.pth
20240713-11:46:50 wrote gpt_000.pth
20240713-11:46:50 cache_w_quizzes contains 200000 quizzes
20240713-11:46:59 --- epoch 18 ----------------------------------------
20240713-11:46:59 current_test_accuracies 0.6780 0.6540 0.6710 0.6920 0.5700
20240713-11:46:59 training model 4
20240713-11:46:59 training model 1
20240713-11:56:04 train_perplexity 18 model 4 1.1469112108222261
20240713-11:56:13 train_perplexity 18 model 1 1.1459497986001965
20240713-11:56:32 test_perplexity 18 model 4 1.142370430551431
20240713-11:56:41 test_perplexity 18 model 1 1.1423108637595063
20240713-11:59:36 test_accuracy 18 model 4 forward 375 / 478 backward 300 / 522
20240713-11:59:36 main_test_accuracy 18 0.675000011920929
20240713-11:59:38 test_accuracy 18 model 1 forward 369 / 495 backward 323 / 505
20240713-11:59:38 main_test_accuracy 18 0.6920000314712524
20240713-11:59:39 wrote gpt_004.pth
20240713-11:59:39 wrote gpt_001.pth
20240713-11:59:39 cache_w_quizzes contains 200000 quizzes
20240713-11:59:47 --- epoch 19 ----------------------------------------
20240713-11:59:47 current_test_accuracies 0.6780 0.6920 0.6710 0.6920 0.6750
20240713-11:59:47 training model 2
20240713-11:59:47 training model 4
20240713-12:08:52 train_perplexity 19 model 2 1.1468346202724198
20240713-12:09:01 train_perplexity 19 model 4 1.1455301828393665
20240713-12:09:19 test_perplexity 19 model 2 1.143453749883792
20240713-12:09:29 test_perplexity 19 model 4 1.1409510378503795
20240713-12:12:26 test_accuracy 19 model 2 forward 367 / 508 backward 300 / 492
20240713-12:12:26 main_test_accuracy 19 0.6670000553131104
20240713-12:12:30 test_accuracy 19 model 4 forward 394 / 478 backward 313 / 522
20240713-12:12:30 main_test_accuracy 19 0.7070000171661377
20240713-12:12:31 wrote gpt_002.pth
20240713-12:12:31 wrote gpt_004.pth
20240713-12:12:31 cache_w_quizzes contains 200000 quizzes
20240713-12:12:40 --- epoch 20 ----------------------------------------
20240713-12:12:40 current_test_accuracies 0.6780 0.6920 0.6670 0.6920 0.7070
20240713-12:12:40 training model 2
20240713-12:12:40 training model 0
20240713-12:21:46 train_perplexity 20 model 2 1.1452435815912703
20240713-12:21:55 train_perplexity 20 model 0 1.144708018311324
20240713-12:22:14 test_perplexity 20 model 2 1.1421634188478202
20240713-12:22:24 test_perplexity 20 model 0 1.1419094010868334
20240713-12:25:18 test_accuracy 20 model 2 forward 391 / 508 backward 290 / 492
20240713-12:25:18 main_test_accuracy 20 0.6810000538825989
20240713-12:25:20 test_accuracy 20 model 0 forward 427 / 504 backward 295 / 496
20240713-12:25:20 main_test_accuracy 20 0.7220000624656677
20240713-12:25:21 wrote gpt_002.pth
20240713-12:25:21 wrote gpt_000.pth
20240713-12:25:21 cache_w_quizzes contains 200000 quizzes
20240713-12:25:30 --- epoch 21 ----------------------------------------
20240713-12:25:30 current_test_accuracies 0.7220 0.6920 0.6810 0.6920 0.7070
20240713-12:25:30 training model 2
20240713-12:25:30 training model 1
20240713-12:34:35 train_perplexity 21 model 2 1.1439390619324277
20240713-12:34:44 train_perplexity 21 model 1 1.1453056638627894
20240713-12:35:03 test_perplexity 21 model 2 1.1409673113514738
20240713-12:35:13 test_perplexity 21 model 1 1.1413383967192048
20240713-12:38:07 test_accuracy 21 model 2 forward 387 / 508 backward 299 / 492
20240713-12:38:07 main_test_accuracy 21 0.6860000491142273
20240713-12:38:10 test_accuracy 21 model 1 forward 383 / 495 backward 324 / 505
20240713-12:38:10 main_test_accuracy 21 0.7070000171661377
20240713-12:38:11 wrote gpt_002.pth
20240713-12:38:11 wrote gpt_001.pth
20240713-12:38:11 cache_w_quizzes contains 200000 quizzes
20240713-12:38:20 --- epoch 22 ----------------------------------------
20240713-12:38:20 current_test_accuracies 0.7220 0.7070 0.6860 0.6920 0.7070
20240713-12:38:20 training model 2
20240713-12:38:20 training model 3
20240713-12:47:25 train_perplexity 22 model 2 1.1435687596932858
20240713-12:47:34 train_perplexity 22 model 3 1.146345028634026
20240713-12:47:53 test_perplexity 22 model 2 1.14082093444022
20240713-12:48:03 test_perplexity 22 model 3 1.1431175636332869
20240713-12:50:57 test_accuracy 22 model 2 forward 407 / 508 backward 329 / 492
20240713-12:50:57 main_test_accuracy 22 0.7360000610351562
20240713-12:50:59 test_accuracy 22 model 3 forward 408 / 506 backward 294 / 494
20240713-12:50:59 main_test_accuracy 22 0.7020000219345093
20240713-12:51:00 wrote gpt_002.pth
20240713-12:51:00 wrote gpt_003.pth
20240713-12:51:00 cache_w_quizzes contains 200000 quizzes
20240713-12:51:08 --- epoch 23 ----------------------------------------
20240713-12:51:08 current_test_accuracies 0.7220 0.7070 0.7360 0.7020 0.7070
20240713-12:51:08 training model 3
20240713-12:51:08 training model 1
20240713-13:00:13 train_perplexity 23 model 3 1.144755292742151
20240713-13:00:21 train_perplexity 23 model 1 1.1437304324350344
20240713-13:00:42 test_perplexity 23 model 3 1.1417935009262759
20240713-13:00:51 test_perplexity 23 model 1 1.1403744323529301
20240713-13:03:45 test_accuracy 23 model 3 forward 414 / 506 backward 312 / 494
20240713-13:03:45 main_test_accuracy 23 0.7260000109672546
20240713-13:03:48 test_accuracy 23 model 1 forward 392 / 495 backward 332 / 505
20240713-13:03:48 main_test_accuracy 23 0.7240000367164612
20240713-13:03:48 wrote gpt_003.pth
20240713-13:03:49 wrote gpt_001.pth
20240713-13:03:49 cache_w_quizzes contains 200000 quizzes
20240713-13:03:58 --- epoch 24 ----------------------------------------
20240713-13:03:58 current_test_accuracies 0.7220 0.7240 0.7360 0.7260 0.7070
20240713-13:03:58 training model 4
20240713-13:03:58 training model 0
20240713-13:13:03 train_perplexity 24 model 4 1.1441645604192658
20240713-13:13:12 train_perplexity 24 model 0 1.143846081856472
20240713-13:13:31 test_perplexity 24 model 4 1.1403489196633296
20240713-13:13:41 test_perplexity 24 model 0 1.1409185215300424
20240713-13:16:37 test_accuracy 24 model 4 forward 381 / 478 backward 345 / 522
20240713-13:16:37 main_test_accuracy 24 0.7260000109672546
20240713-13:16:38 test_accuracy 24 model 0 forward 421 / 504 backward 304 / 496
20240713-13:16:38 main_test_accuracy 24 0.7250000238418579
20240713-13:16:39 wrote gpt_004.pth
20240713-13:16:40 wrote gpt_000.pth
20240713-13:16:40 cache_w_quizzes contains 200000 quizzes
20240713-13:16:48 --- epoch 25 ----------------------------------------
20240713-13:16:48 current_test_accuracies 0.7250 0.7240 0.7360 0.7260 0.7260
20240713-13:16:48 training model 1
20240713-13:16:48 training model 0
20240713-13:25:53 train_perplexity 25 model 1 1.1431325371513585
20240713-13:26:03 train_perplexity 25 model 0 1.1428592312270802
20240713-13:26:19 test_perplexity 25 model 1 1.140111005172835
20240713-13:26:30 test_perplexity 25 model 0 1.1394718445673697
20240713-13:29:26 test_accuracy 25 model 1 forward 386 / 495 backward 354 / 505
20240713-13:29:26 main_test_accuracy 25 0.7400000095367432
20240713-13:29:28 test_accuracy 25 model 0 forward 437 / 504 backward 334 / 496
20240713-13:29:28 main_test_accuracy 25 0.7710000276565552
20240713-13:29:28 wrote gpt_001.pth
20240713-13:29:29 wrote gpt_000.pth
20240713-13:29:29 cache_w_quizzes contains 200000 quizzes
20240713-13:29:37 --- epoch 26 ----------------------------------------
20240713-13:29:37 current_test_accuracies 0.7710 0.7400 0.7360 0.7260 0.7260
20240713-13:29:37 training model 3
20240713-13:29:37 training model 4
20240713-13:38:42 train_perplexity 26 model 3 1.1431203326179584
20240713-13:38:51 train_perplexity 26 model 4 1.1424326064788504
20240713-13:39:10 test_perplexity 26 model 3 1.1423341546135195
20240713-13:39:19 test_perplexity 26 model 4 1.1386041673388587
20240713-13:42:15 test_accuracy 26 model 3 forward 397 / 506 backward 325 / 494
20240713-13:42:15 main_test_accuracy 26 0.7220000624656677
20240713-13:42:18 test_accuracy 26 model 4 forward 408 / 478 backward 357 / 522
20240713-13:42:18 main_test_accuracy 26 0.76500004529953
20240713-13:42:19 wrote gpt_003.pth
20240713-13:42:19 wrote gpt_004.pth
20240713-13:42:19 cache_w_quizzes contains 200000 quizzes
20240713-13:42:28 --- epoch 27 ----------------------------------------
20240713-13:42:28 current_test_accuracies 0.7710 0.7400 0.7360 0.7220 0.7650
20240713-13:42:28 training model 3
20240713-13:42:28 training model 2
20240713-13:51:32 train_perplexity 27 model 3 1.1419000995953705
20240713-13:51:41 train_perplexity 27 model 2 1.142211876962027
20240713-13:51:59 test_perplexity 27 model 3 1.139746019443434
20240713-13:52:10 test_perplexity 27 model 2 1.1400269240456868
20240713-13:55:04 test_accuracy 27 model 3 forward 433 / 506 backward 337 / 494
20240713-13:55:04 main_test_accuracy 27 0.7700000405311584
20240713-13:55:07 test_accuracy 27 model 2 forward 392 / 508 backward 318 / 492
20240713-13:55:07 main_test_accuracy 27 0.7100000381469727
20240713-13:55:07 wrote gpt_003.pth
20240713-13:55:08 wrote gpt_002.pth
20240713-13:55:08 cache_w_quizzes contains 200000 quizzes
20240713-13:55:17 --- epoch 28 ----------------------------------------
20240713-13:55:17 current_test_accuracies 0.7710 0.7400 0.7100 0.7700 0.7650
20240713-13:55:17 training model 2
20240713-13:55:17 training model 1
20240713-14:04:23 train_perplexity 28 model 2 1.1416374686630293
20240713-14:04:31 train_perplexity 28 model 1 1.1426496270776199
20240713-14:04:51 test_perplexity 28 model 2 1.1384027901947185
20240713-14:05:00 test_perplexity 28 model 1 1.1386598636921552
20240713-14:07:54 test_accuracy 28 model 2 forward 399 / 508 backward 335 / 492
20240713-14:07:54 main_test_accuracy 28 0.734000027179718
20240713-14:07:57 test_accuracy 28 model 1 forward 400 / 495 backward 320 / 505
20240713-14:07:57 main_test_accuracy 28 0.7200000286102295
20240713-14:07:58 wrote gpt_002.pth
20240713-14:07:58 wrote gpt_001.pth
20240713-14:07:58 cache_w_quizzes contains 200000 quizzes
20240713-14:08:07 --- epoch 29 ----------------------------------------
20240713-14:08:07 current_test_accuracies 0.7710 0.7200 0.7340 0.7700 0.7650
20240713-14:08:07 training model 1
20240713-14:08:07 training model 2
20240713-14:17:12 train_perplexity 29 model 1 1.1410328530013734
20240713-14:17:21 train_perplexity 29 model 2 1.1404303866659504
20240713-14:17:39 test_perplexity 29 model 1 1.1383861010301337
20240713-14:17:49 test_perplexity 29 model 2 1.1384275043022716
20240713-14:20:46 test_accuracy 29 model 1 forward 404 / 495 backward 348 / 505
20240713-14:20:46 main_test_accuracy 29 0.7520000338554382
20240713-14:20:48 test_accuracy 29 model 2 forward 407 / 508 backward 338 / 492
20240713-14:20:48 main_test_accuracy 29 0.7450000643730164
20240713-14:20:49 wrote gpt_001.pth
20240713-14:20:49 wrote gpt_002.pth
20240713-14:20:49 cache_w_quizzes contains 200000 quizzes
20240713-14:20:58 --- epoch 30 ----------------------------------------
20240713-14:20:58 current_test_accuracies 0.7710 0.7520 0.7450 0.7700 0.7650
20240713-14:20:58 training model 2
20240713-14:20:58 training model 1
20240713-14:30:03 train_perplexity 30 model 2 1.1401659362935403
20240713-14:30:12 train_perplexity 30 model 1 1.1404621807405875
20240713-14:30:30 test_perplexity 30 model 2 1.137022860129421
20240713-14:30:40 test_perplexity 30 model 1 1.1378163096392804
20240713-14:33:35 test_accuracy 30 model 2 forward 426 / 508 backward 350 / 492
20240713-14:33:35 main_test_accuracy 30 0.7760000228881836
20240713-14:33:38 test_accuracy 30 model 1 forward 413 / 495 backward 366 / 505
20240713-14:33:38 main_test_accuracy 30 0.7790000438690186
20240713-14:33:39 wrote gpt_002.pth
20240713-14:33:39 wrote gpt_001.pth
20240713-14:33:39 cache_w_quizzes contains 200000 quizzes
20240713-14:33:48 --- epoch 31 ----------------------------------------
20240713-14:33:48 current_test_accuracies 0.7710 0.7790 0.7760 0.7700 0.7650
20240713-14:33:48 training model 4
20240713-14:33:48 training model 3
20240713-14:42:54 train_perplexity 31 model 4 1.1420490429182772
20240713-14:43:02 train_perplexity 31 model 3 1.141312365518217
20240713-14:43:22 test_perplexity 31 model 4 1.1382707915886585
20240713-14:43:31 test_perplexity 31 model 3 1.1400651062744662
20240713-14:46:26 test_accuracy 31 model 4 forward 410 / 478 backward 350 / 522
20240713-14:46:26 main_test_accuracy 31 0.7600000500679016
20240713-14:46:27 test_accuracy 31 model 3 forward 421 / 506 backward 344 / 494
20240713-14:46:27 main_test_accuracy 31 0.76500004529953
20240713-14:46:28 wrote gpt_004.pth
20240713-14:46:28 wrote gpt_003.pth
20240713-14:46:28 cache_w_quizzes contains 200000 quizzes
20240713-14:46:37 --- epoch 32 ----------------------------------------
20240713-14:46:37 current_test_accuracies 0.7710 0.7790 0.7760 0.7650 0.7600
20240713-14:46:37 training model 4
20240713-14:46:37 training model 3
20240713-14:55:43 train_perplexity 32 model 4 1.141017141778864
20240713-14:55:52 train_perplexity 32 model 3 1.140297039737234
20240713-14:56:10 test_perplexity 32 model 4 1.1382333039433123
20240713-14:56:20 test_perplexity 32 model 3 1.1388979713591008
20240713-14:59:16 test_accuracy 32 model 4 forward 399 / 478 backward 329 / 522
20240713-14:59:16 main_test_accuracy 32 0.7280000448226929
20240713-14:59:17 test_accuracy 32 model 3 forward 423 / 506 backward 350 / 494
20240713-14:59:17 main_test_accuracy 32 0.7730000615119934
20240713-14:59:18 wrote gpt_004.pth
20240713-14:59:18 wrote gpt_003.pth
20240713-14:59:18 cache_w_quizzes contains 200000 quizzes
20240713-14:59:26 --- epoch 33 ----------------------------------------
20240713-14:59:26 current_test_accuracies 0.7710 0.7790 0.7760 0.7730 0.7280
20240713-14:59:26 training model 4
20240713-14:59:26 training model 0
20240713-15:08:32 train_perplexity 33 model 4 1.1403992768449698
20240713-15:08:40 train_perplexity 33 model 0 1.1412539966366468
20240713-15:08:59 test_perplexity 33 model 4 1.136322977883031
20240713-15:09:09 test_perplexity 33 model 0 1.139046204822804
20240713-15:12:04 test_accuracy 33 model 4 forward 411 / 478 backward 366 / 522
20240713-15:12:04 main_test_accuracy 33 0.7770000100135803
20240713-15:12:06 test_accuracy 33 model 0 forward 425 / 504 backward 340 / 496
20240713-15:12:06 main_test_accuracy 33 0.76500004529953
20240713-15:12:06 wrote gpt_004.pth
20240713-15:12:07 wrote gpt_000.pth
20240713-15:12:07 cache_w_quizzes contains 200000 quizzes
20240713-15:12:16 --- epoch 34 ----------------------------------------
20240713-15:12:16 current_test_accuracies 0.7650 0.7790 0.7760 0.7730 0.7770
20240713-15:12:16 training model 0
20240713-15:12:16 training model 3
20240713-15:21:21 train_perplexity 34 model 0 1.1405375142952072
20240713-15:21:29 train_perplexity 34 model 3 1.1400076949971822
20240713-15:21:49 test_perplexity 34 model 0 1.1377686152201552
20240713-15:21:58 test_perplexity 34 model 3 1.1379609791105103
20240713-15:24:52 test_accuracy 34 model 0 forward 437 / 504 backward 346 / 496
20240713-15:24:52 main_test_accuracy 34 0.7830000519752502
20240713-15:24:54 test_accuracy 34 model 3 forward 445 / 506 backward 359 / 494
20240713-15:24:54 main_test_accuracy 34 0.8040000200271606
20240713-15:24:55 wrote gpt_000.pth
20240713-15:24:55 wrote gpt_003.pth
20240713-15:24:55 cache_w_quizzes contains 200000 quizzes
20240713-15:25:04 --- epoch 35 ----------------------------------------
20240713-15:25:04 current_test_accuracies 0.7830 0.7790 0.7760 0.8040 0.7770
20240713-15:25:04 training model 2
20240713-15:25:04 training model 4
20240713-15:34:10 train_perplexity 35 model 2 1.1399031355603106
20240713-15:34:19 train_perplexity 35 model 4 1.1394473678972477
20240713-15:34:37 test_perplexity 35 model 2 1.1367143619652087
20240713-15:34:47 test_perplexity 35 model 4 1.1372978789531285
20240713-15:37:41 test_accuracy 35 model 2 forward 416 / 508 backward 343 / 492
20240713-15:37:41 main_test_accuracy 35 0.7590000629425049
20240713-15:37:44 test_accuracy 35 model 4 forward 425 / 478 backward 381 / 522
20240713-15:37:44 main_test_accuracy 35 0.8060000538825989
20240713-15:37:45 wrote gpt_002.pth
20240713-15:37:46 wrote gpt_004.pth
20240713-15:37:46 cache_w_quizzes contains 200000 quizzes
20240713-15:37:54 --- epoch 36 ----------------------------------------
20240713-15:37:54 current_test_accuracies 0.7830 0.7790 0.7590 0.8040 0.8060
20240713-15:37:54 training model 2
20240713-15:37:54 training model 1
20240713-15:47:00 train_perplexity 36 model 2 1.1387748945068412
20240713-15:47:07 train_perplexity 36 model 1 1.1402834662304027
20240713-15:47:29 test_perplexity 36 model 2 1.1362495182352947
20240713-15:47:37 test_perplexity 36 model 1 1.1375598024348954
20240713-15:50:31 test_accuracy 36 model 2 forward 431 / 508 backward 368 / 492
20240713-15:50:31 main_test_accuracy 36 0.7990000247955322
20240713-15:50:33 test_accuracy 36 model 1 forward 408 / 495 backward 375 / 505
20240713-15:50:33 main_test_accuracy 36 0.7830000519752502
20240713-15:50:34 wrote gpt_002.pth
20240713-15:50:35 wrote gpt_001.pth
20240713-15:50:35 cache_w_quizzes contains 200000 quizzes
20240713-15:50:43 --- epoch 37 ----------------------------------------
20240713-15:50:43 current_test_accuracies 0.7830 0.7830 0.7990 0.8040 0.8060
20240713-15:50:43 training model 0
20240713-15:50:43 training model 1
20240713-15:59:49 train_perplexity 37 model 0 1.1400670671521018
20240713-15:59:57 train_perplexity 37 model 1 1.1390900072732568
20240713-16:00:17 test_perplexity 37 model 0 1.1372358969432859
20240713-16:00:26 test_perplexity 37 model 1 1.1373159578804428
20240713-16:03:20 test_accuracy 37 model 0 forward 433 / 504 backward 347 / 496
20240713-16:03:20 main_test_accuracy 37 0.7800000309944153
20240713-16:03:23 test_accuracy 37 model 1 forward 409 / 495 backward 376 / 505
20240713-16:03:23 main_test_accuracy 37 0.7850000262260437
20240713-16:03:24 wrote gpt_000.pth
20240713-16:03:24 wrote gpt_001.pth
20240713-16:03:24 cache_w_quizzes contains 200000 quizzes
20240713-16:03:33 --- epoch 38 ----------------------------------------
20240713-16:03:33 current_test_accuracies 0.7800 0.7850 0.7990 0.8040 0.8060
20240713-16:03:33 training model 0
20240713-16:03:33 training model 1
20240713-16:12:38 train_perplexity 38 model 0 1.1395516572700317
20240713-16:12:47 train_perplexity 38 model 1 1.1391988795235626
20240713-16:13:06 test_perplexity 38 model 0 1.1365340922339842
20240713-16:13:15 test_perplexity 38 model 1 1.1360287550126522
20240713-16:16:11 test_accuracy 38 model 0 forward 449 / 504 backward 365 / 496
20240713-16:16:11 main_test_accuracy 38 0.8140000104904175
20240713-16:16:13 test_accuracy 38 model 1 forward 412 / 495 backward 374 / 505
20240713-16:16:13 main_test_accuracy 38 0.7860000133514404
20240713-16:16:14 wrote gpt_000.pth
20240713-16:16:15 wrote gpt_001.pth
20240713-16:16:15 cache_w_quizzes contains 200000 quizzes
20240713-16:16:23 --- epoch 39 ----------------------------------------
20240713-16:16:23 current_test_accuracies 0.8140 0.7860 0.7990 0.8040 0.8060
20240713-16:16:23 training model 1
20240713-16:16:23 training model 2
20240713-16:25:28 train_perplexity 39 model 1 1.1380765827565877
20240713-16:25:37 train_perplexity 39 model 2 1.1380351079952464
20240713-16:25:55 test_perplexity 39 model 1 1.1354242598397308
20240713-16:26:06 test_perplexity 39 model 2 1.1363258313992146
20240713-16:29:01 test_accuracy 39 model 1 forward 421 / 495 backward 395 / 505
20240713-16:29:01 main_test_accuracy 39 0.8160000443458557
20240713-16:29:03 test_accuracy 39 model 2 forward 417 / 508 backward 345 / 492
20240713-16:29:03 main_test_accuracy 39 0.7620000243186951
20240713-16:29:03 wrote gpt_001.pth
20240713-16:29:04 wrote gpt_002.pth
20240713-16:29:04 cache_w_quizzes contains 200000 quizzes
20240713-16:29:12 --- epoch 40 ----------------------------------------
20240713-16:29:12 current_test_accuracies 0.8140 0.8160 0.7620 0.8040 0.8060
20240713-16:29:12 training model 2
20240713-16:29:12 training model 3
20240713-16:38:18 train_perplexity 40 model 2 1.1380159367651164
20240713-16:38:26 train_perplexity 40 model 3 1.1393001272519052
20240713-16:38:46 test_perplexity 40 model 2 1.1357281884602197
20240713-16:38:55 test_perplexity 40 model 3 1.1380794706254698
20240713-16:41:50 test_accuracy 40 model 2 forward 427 / 508 backward 361 / 492 20240713-16:41:50 main_test_accuracy 40 0.7880000472068787 20240713-16:41:52 test_accuracy 40 model 3 forward 449 / 506 backward 361 / 494 20240713-16:41:52 main_test_accuracy 40 0.8100000619888306 20240713-16:41:53 wrote gpt_002.pth 20240713-16:41:53 wrote gpt_003.pth 20240713-16:41:53 cache_w_quizzes contains 200000 quizzes 20240713-16:42:02 --- epoch 41 ---------------------------------------- 20240713-16:42:02 current_test_accuracies 0.8140 0.8160 0.7880 0.8100 0.8060 20240713-16:42:02 training model 2 20240713-16:42:02 training model 4 20240713-16:51:07 train_perplexity 41 model 2 1.1377035421928328 20240713-16:51:16 train_perplexity 41 model 4 1.1391631068206547 20240713-16:51:33 test_perplexity 41 model 2 1.1350658179161734 20240713-16:51:44 test_perplexity 41 model 4 1.1359266030382924 20240713-16:54:37 test_accuracy 41 model 2 forward 428 / 508 backward 348 / 492 20240713-16:54:37 main_test_accuracy 41 0.7760000228881836 20240713-16:54:40 test_accuracy 41 model 4 forward 434 / 478 backward 370 / 522 20240713-16:54:40 main_test_accuracy 41 0.8040000200271606 20240713-16:54:41 wrote gpt_002.pth 20240713-16:54:42 wrote gpt_004.pth 20240713-16:54:42 cache_w_quizzes contains 200000 quizzes 20240713-16:54:50 --- epoch 42 ---------------------------------------- 20240713-16:54:50 current_test_accuracies 0.8140 0.8160 0.7760 0.8100 0.8040 20240713-16:54:50 training model 2 20240713-16:54:50 training model 4 20240713-17:03:56 train_perplexity 42 model 2 1.1374743680951986 20240713-17:04:05 train_perplexity 42 model 4 1.138546482952817 20240713-17:04:23 test_perplexity 42 model 2 1.1351506615893003 20240713-17:04:33 test_perplexity 42 model 4 1.1354565557648006 20240713-17:07:27 test_accuracy 42 model 2 forward 445 / 508 backward 379 / 492 20240713-17:07:27 main_test_accuracy 42 0.8240000605583191 20240713-17:07:30 test_accuracy 42 model 4 forward 429 / 478 backward 389 / 522 
20240713-17:07:30 main_test_accuracy 42 0.8180000185966492 20240713-17:07:31 wrote gpt_002.pth 20240713-17:07:31 wrote gpt_004.pth 20240713-17:07:31 cache_w_quizzes contains 200000 quizzes 20240713-17:07:40 --- epoch 43 ---------------------------------------- 20240713-17:07:40 current_test_accuracies 0.8140 0.8160 0.8240 0.8100 0.8180 20240713-17:07:40 training model 3 20240713-17:07:40 training model 0 20240713-17:16:45 train_perplexity 43 model 3 1.1387466057648072 20240713-17:16:54 train_perplexity 43 model 0 1.139081094318976 20240713-17:17:12 test_perplexity 43 model 3 1.1368151453577555 20240713-17:17:23 test_perplexity 43 model 0 1.1362333604679904 20240713-17:20:17 test_accuracy 43 model 3 forward 452 / 506 backward 363 / 494 20240713-17:20:17 main_test_accuracy 43 0.815000057220459 20240713-17:20:19 test_accuracy 43 model 0 forward 444 / 504 backward 360 / 496 20240713-17:20:19 main_test_accuracy 43 0.8040000200271606 20240713-17:20:20 wrote gpt_003.pth 20240713-17:20:21 wrote gpt_000.pth 20240713-17:20:21 cache_w_quizzes contains 200000 quizzes 20240713-17:20:30 --- epoch 44 ---------------------------------------- 20240713-17:20:30 current_test_accuracies 0.8040 0.8160 0.8240 0.8150 0.8180 20240713-17:20:30 training model 0 20240713-17:20:30 training model 3 20240713-17:29:35 train_perplexity 44 model 0 1.1384473656155 20240713-17:29:44 train_perplexity 44 model 3 1.1382066226345624 20240713-17:30:02 test_perplexity 44 model 0 1.1360898733001596 20240713-17:30:12 test_perplexity 44 model 3 1.138336030127633 20240713-17:33:06 test_accuracy 44 model 0 forward 452 / 504 backward 365 / 496 20240713-17:33:06 main_test_accuracy 44 0.8170000314712524 20240713-17:33:09 test_accuracy 44 model 3 forward 439 / 506 backward 351 / 494 20240713-17:33:09 main_test_accuracy 44 0.7900000214576721 20240713-17:33:10 wrote gpt_000.pth 20240713-17:33:10 wrote gpt_003.pth 20240713-17:33:10 cache_w_quizzes contains 200000 quizzes 20240713-17:33:18 --- epoch 45 
---------------------------------------- 20240713-17:33:18 current_test_accuracies 0.8170 0.8160 0.8240 0.7900 0.8180 20240713-17:33:18 training model 3 20240713-17:33:18 training model 1 20240713-17:42:23 train_perplexity 45 model 3 1.1380907576348935 20240713-17:42:32 train_perplexity 45 model 1 1.1378670086721365 20240713-17:42:51 test_perplexity 45 model 3 1.1366110618007472 20240713-17:43:00 test_perplexity 45 model 1 1.1353832497483916 20240713-17:45:54 test_accuracy 45 model 3 forward 462 / 506 backward 369 / 494 20240713-17:45:54 main_test_accuracy 45 0.831000030040741 20240713-17:45:57 test_accuracy 45 model 1 forward 432 / 495 backward 397 / 505 20240713-17:45:57 main_test_accuracy 45 0.8290000557899475 20240713-17:45:58 wrote gpt_003.pth 20240713-17:45:58 wrote gpt_001.pth 20240713-17:45:58 cache_w_quizzes contains 200000 quizzes 20240713-17:46:07 --- epoch 46 ---------------------------------------- 20240713-17:46:07 current_test_accuracies 0.8170 0.8290 0.8240 0.8310 0.8180 20240713-17:46:07 training model 0 20240713-17:46:07 training model 4 20240713-17:55:13 train_perplexity 46 model 0 1.1381628454362787 20240713-17:55:21 train_perplexity 46 model 4 1.1383035603416036 20240713-17:55:42 test_perplexity 46 model 0 1.1357564183156585 20240713-17:55:51 test_perplexity 46 model 4 1.1350894047284006 20240713-17:58:44 test_accuracy 46 model 0 forward 451 / 504 backward 367 / 496 20240713-17:58:44 main_test_accuracy 46 0.8180000185966492 20240713-17:58:48 test_accuracy 46 model 4 forward 426 / 478 backward 386 / 522 20240713-17:58:48 main_test_accuracy 46 0.812000036239624 20240713-17:58:49 wrote gpt_000.pth 20240713-17:58:49 wrote gpt_004.pth 20240713-17:58:49 cache_w_quizzes contains 200000 quizzes 20240713-17:58:57 --- epoch 47 ---------------------------------------- 20240713-17:58:57 current_test_accuracies 0.8180 0.8290 0.8240 0.8310 0.8120 20240713-17:58:57 training model 4 20240713-17:58:57 training model 0 20240713-18:08:04 train_perplexity 47 model 
4 1.1378861237091347 20240713-18:08:11 train_perplexity 47 model 0 1.1373080386820098 20240713-18:08:32 test_perplexity 47 model 4 1.1346244754344093 20240713-18:08:41 test_perplexity 47 model 0 1.1350774252178881 20240713-18:11:39 test_accuracy 47 model 4 forward 443 / 478 backward 369 / 522 20240713-18:11:39 main_test_accuracy 47 0.812000036239624 20240713-18:11:40 test_accuracy 47 model 0 forward 456 / 504 backward 363 / 496 20240713-18:11:40 main_test_accuracy 47 0.8190000653266907 20240713-18:11:41 wrote gpt_004.pth 20240713-18:11:42 wrote gpt_000.pth 20240713-18:11:42 cache_w_quizzes contains 200000 quizzes 20240713-18:11:49 --- epoch 48 ---------------------------------------- 20240713-18:11:49 current_test_accuracies 0.8190 0.8290 0.8240 0.8310 0.8120 20240713-18:11:49 training model 4 20240713-18:11:49 training model 0 20240713-18:20:55 train_perplexity 48 model 4 1.1375193610443866 20240713-18:21:03 train_perplexity 48 model 0 1.1373145133037965 20240713-18:21:23 test_perplexity 48 model 4 1.1342834474109427 20240713-18:21:32 test_perplexity 48 model 0 1.1348012222317458 20240713-18:24:29 test_accuracy 48 model 4 forward 443 / 478 backward 372 / 522 20240713-18:24:29 main_test_accuracy 48 0.815000057220459 20240713-18:24:31 test_accuracy 48 model 0 forward 468 / 504 backward 376 / 496 20240713-18:24:31 main_test_accuracy 48 0.8440000414848328 20240713-18:24:32 wrote gpt_004.pth 20240713-18:24:32 wrote gpt_000.pth 20240713-18:24:32 cache_w_quizzes contains 200000 quizzes 20240713-18:24:40 --- epoch 49 ---------------------------------------- 20240713-18:24:40 current_test_accuracies 0.8440 0.8290 0.8240 0.8310 0.8150 20240713-18:24:40 training model 4 20240713-18:24:40 training model 2 20240713-18:33:46 train_perplexity 49 model 4 1.137014059199615 20240713-18:33:54 train_perplexity 49 model 2 1.1372211408781028 20240713-18:34:14 test_perplexity 49 model 4 1.1341257916524288 20240713-18:34:23 test_perplexity 49 model 2 1.134325362557035 20240713-18:37:19 
test_accuracy 49 model 4 forward 435 / 478 backward 399 / 522 20240713-18:37:19 main_test_accuracy 49 0.8340000510215759 20240713-18:37:20 test_accuracy 49 model 2 forward 449 / 508 backward 380 / 492 20240713-18:37:20 main_test_accuracy 49 0.8290000557899475 20240713-18:37:21 wrote gpt_004.pth 20240713-18:37:21 wrote gpt_002.pth 20240713-18:37:21 cache_w_quizzes contains 200000 quizzes 20240713-18:37:30 --- epoch 50 ---------------------------------------- 20240713-18:37:30 current_test_accuracies 0.8440 0.8290 0.8290 0.8310 0.8340 20240713-18:37:30 training model 1 20240713-18:37:30 training model 2 20240713-18:46:35 train_perplexity 50 model 1 1.1375536447264547 20240713-18:46:44 train_perplexity 50 model 2 1.136869369420536 20240713-18:47:02 test_perplexity 50 model 1 1.1348885170133471 20240713-18:47:12 test_perplexity 50 model 2 1.1344144789139505 20240713-18:50:09 test_accuracy 50 model 1 forward 427 / 495 backward 383 / 505 20240713-18:50:09 main_test_accuracy 50 0.8100000619888306 20240713-18:50:11 test_accuracy 50 model 2 forward 452 / 508 backward 370 / 492 20240713-18:50:11 main_test_accuracy 50 0.8220000267028809 20240713-18:50:12 wrote gpt_001.pth 20240713-18:50:12 wrote gpt_002.pth 20240713-18:50:12 cache_w_quizzes contains 200000 quizzes 20240713-18:50:21 --- epoch 51 ---------------------------------------- 20240713-18:50:21 current_test_accuracies 0.8440 0.8100 0.8220 0.8310 0.8340 20240713-18:50:21 training model 1 20240713-18:50:21 training model 2 20240713-18:59:26 train_perplexity 51 model 1 1.1372181995708743 20240713-18:59:35 train_perplexity 51 model 2 1.1367128570645568 20240713-18:59:54 test_perplexity 51 model 1 1.134554616203293 20240713-19:00:04 test_perplexity 51 model 2 1.13405585220628 20240713-19:03:00 test_accuracy 51 model 1 forward 420 / 495 backward 391 / 505 20240713-19:03:00 main_test_accuracy 51 0.8110000491142273 20240713-19:03:01 test_accuracy 51 model 2 forward 460 / 508 backward 367 / 492 20240713-19:03:01 
main_test_accuracy 51 0.8270000219345093 20240713-19:03:02 wrote gpt_001.pth 20240713-19:03:02 wrote gpt_002.pth 20240713-19:03:02 cache_w_quizzes contains 200000 quizzes 20240713-19:03:11 --- epoch 52 ---------------------------------------- 20240713-19:03:11 current_test_accuracies 0.8440 0.8110 0.8270 0.8310 0.8340 20240713-19:03:11 training model 1 20240713-19:03:11 training model 2 20240713-19:12:16 train_perplexity 52 model 1 1.1368258001279703 20240713-19:12:25 train_perplexity 52 model 2 1.1362768514339643 20240713-19:12:44 test_perplexity 52 model 1 1.1347294296299255 20240713-19:12:54 test_perplexity 52 model 2 1.1338620504877324 20240713-19:15:48 test_accuracy 52 model 1 forward 432 / 495 backward 397 / 505 20240713-19:15:48 main_test_accuracy 52 0.8290000557899475 20240713-19:15:49 test_accuracy 52 model 2 forward 455 / 508 backward 382 / 492 20240713-19:15:49 main_test_accuracy 52 0.8370000123977661 20240713-19:15:50 wrote gpt_001.pth 20240713-19:15:51 wrote gpt_002.pth 20240713-19:15:51 cache_w_quizzes contains 200000 quizzes 20240713-19:15:59 --- epoch 53 ---------------------------------------- 20240713-19:15:59 current_test_accuracies 0.8440 0.8290 0.8370 0.8310 0.8340 20240713-19:15:59 training model 1 20240713-19:15:59 training model 3 20240713-19:25:04 train_perplexity 53 model 1 1.1367839196830034 20240713-19:25:14 train_perplexity 53 model 3 1.137552003650326 20240713-19:25:31 test_perplexity 53 model 1 1.1341662060896347 20240713-19:25:42 test_perplexity 53 model 3 1.136246787875008 20240713-19:28:37 test_accuracy 53 model 1 forward 437 / 495 backward 388 / 505 20240713-19:28:37 main_test_accuracy 53 0.8250000476837158 20240713-19:28:38 test_accuracy 53 model 3 forward 462 / 506 backward 369 / 494 20240713-19:28:38 main_test_accuracy 53 0.831000030040741 20240713-19:28:39 wrote gpt_001.pth 20240713-19:28:40 wrote gpt_003.pth 20240713-19:28:40 cache_w_quizzes contains 200000 quizzes 20240713-19:28:48 --- epoch 54 
---------------------------------------- 20240713-19:28:48 current_test_accuracies 0.8440 0.8250 0.8370 0.8310 0.8340 20240713-19:28:48 training model 1 20240713-19:28:48 training model 3 20240713-19:37:54 train_perplexity 54 model 1 1.1364982245609034 20240713-19:38:02 train_perplexity 54 model 3 1.1373357925858818 20240713-19:38:21 test_perplexity 54 model 1 1.1342480517752667 20240713-19:38:31 test_perplexity 54 model 3 1.1355050635147934 20240713-19:41:26 test_accuracy 54 model 1 forward 430 / 495 backward 385 / 505 20240713-19:41:26 main_test_accuracy 54 0.815000057220459 20240713-19:41:27 test_accuracy 54 model 3 forward 473 / 506 backward 379 / 494 20240713-19:41:27 main_test_accuracy 54 0.8520000576972961 20240713-19:41:28 wrote gpt_001.pth 20240713-19:41:28 wrote gpt_003.pth 20240713-19:41:28 cache_w_quizzes contains 200000 quizzes 20240713-19:41:37 --- epoch 55 ---------------------------------------- 20240713-19:41:37 current_test_accuracies 0.8440 0.8150 0.8370 0.8520 0.8340 20240713-19:41:37 training model 1 20240713-19:41:37 training model 4 20240713-19:50:42 train_perplexity 55 model 1 1.136380803199218 20240713-19:50:50 train_perplexity 55 model 4 1.1369618197318725 20240713-19:51:10 test_perplexity 55 model 1 1.1333587386611235 20240713-19:51:19 test_perplexity 55 model 4 1.1347892085402278 20240713-19:54:14 test_accuracy 55 model 1 forward 451 / 495 backward 400 / 505 20240713-19:54:14 main_test_accuracy 55 0.8510000109672546 20240713-19:54:17 test_accuracy 55 model 4 forward 436 / 478 backward 387 / 522 20240713-19:54:17 main_test_accuracy 55 0.8230000138282776 20240713-19:54:17 wrote gpt_001.pth 20240713-19:54:18 wrote gpt_004.pth 20240713-19:54:18 cache_w_quizzes contains 200000 quizzes 20240713-19:54:27 --- epoch 56 ---------------------------------------- 20240713-19:54:27 current_test_accuracies 0.8440 0.8510 0.8370 0.8520 0.8230 20240713-19:54:27 training model 4 20240713-19:54:27 training model 2 20240713-20:03:33 train_perplexity 56 model 
4 1.1365555449970068 20240713-20:03:40 train_perplexity 56 model 2 1.1362761339049496 20240713-20:04:01 test_perplexity 56 model 4 1.134472576219401 20240713-20:04:10 test_perplexity 56 model 2 1.1335000223238898 20240713-20:07:05 test_accuracy 56 model 4 forward 434 / 478 backward 385 / 522 20240713-20:07:05 main_test_accuracy 56 0.8190000653266907 20240713-20:07:06 test_accuracy 56 model 2 forward 455 / 508 backward 381 / 492 20240713-20:07:06 main_test_accuracy 56 0.8360000252723694 20240713-20:07:07 wrote gpt_004.pth 20240713-20:07:07 wrote gpt_002.pth 20240713-20:07:07 cache_w_quizzes contains 200000 quizzes 20240713-20:07:16 --- epoch 57 ---------------------------------------- 20240713-20:07:16 current_test_accuracies 0.8440 0.8510 0.8360 0.8520 0.8190 20240713-20:07:16 training model 4 20240713-20:07:16 training model 2 20240713-20:16:22 train_perplexity 57 model 4 1.1363899663356662 20240713-20:16:30 train_perplexity 57 model 2 1.1360243152633607 20240713-20:16:49 test_perplexity 57 model 4 1.1332782771313006 20240713-20:16:59 test_perplexity 57 model 2 1.1334463912731583 20240713-20:19:54 test_accuracy 57 model 4 forward 453 / 478 backward 388 / 522 20240713-20:19:54 main_test_accuracy 57 0.8410000205039978 20240713-20:19:55 test_accuracy 57 model 2 forward 459 / 508 backward 399 / 492 20240713-20:19:55 main_test_accuracy 57 0.8580000400543213 20240713-20:19:56 wrote gpt_004.pth 20240713-20:19:57 wrote gpt_002.pth 20240713-20:19:57 cache_w_quizzes contains 200000 quizzes 20240713-20:20:06 --- epoch 58 ---------------------------------------- 20240713-20:20:06 current_test_accuracies 0.8440 0.8510 0.8580 0.8520 0.8410 20240713-20:20:06 training model 4 20240713-20:20:06 training model 0 20240713-20:29:12 train_perplexity 58 model 4 1.1366579151594793 20240713-20:29:19 train_perplexity 58 model 0 1.1367595695620336 20240713-20:29:40 test_perplexity 58 model 4 1.1338661186804868 20240713-20:29:49 test_perplexity 58 model 0 1.134571835786723 20240713-20:32:45 
test_accuracy 58 model 4 forward 434 / 478 backward 408 / 522 20240713-20:32:45 main_test_accuracy 58 0.8420000672340393 20240713-20:32:47 test_accuracy 58 model 0 forward 473 / 504 backward 386 / 496 20240713-20:32:47 main_test_accuracy 58 0.859000027179718 20240713-20:32:48 wrote gpt_004.pth 20240713-20:32:48 wrote gpt_000.pth 20240713-20:32:48 cache_w_quizzes contains 200000 quizzes 20240713-20:32:57 --- epoch 59 ---------------------------------------- 20240713-20:32:57 current_test_accuracies 0.8590 0.8510 0.8580 0.8520 0.8420 20240713-20:32:57 training model 4 20240713-20:32:57 training model 1 20240713-20:42:03 train_perplexity 59 model 4 1.1362052829800762 20240713-20:42:10 train_perplexity 59 model 1 1.1359418267758234 20240713-20:42:31 test_perplexity 59 model 4 1.1333642615909196 20240713-20:42:40 test_perplexity 59 model 1 1.133504883621717 20240713-20:45:35 test_accuracy 59 model 4 forward 458 / 478 backward 419 / 522 20240713-20:45:35 main_test_accuracy 59 0.8770000338554382 20240713-20:45:37 test_accuracy 59 model 1 forward 450 / 495 backward 404 / 505 20240713-20:45:37 main_test_accuracy 59 0.8540000319480896 20240713-20:45:38 wrote gpt_004.pth 20240713-20:45:38 wrote gpt_001.pth 20240713-20:45:38 cache_w_quizzes contains 200000 quizzes 20240713-20:45:47 --- epoch 60 ---------------------------------------- 20240713-20:45:47 current_test_accuracies 0.8590 0.8540 0.8580 0.8520 0.8770 20240713-20:45:47 training model 3 20240713-20:45:47 training model 1 20240713-20:54:52 train_perplexity 60 model 3 1.1375223166654376 20240713-20:55:01 train_perplexity 60 model 1 1.1357919838903072 20240713-20:55:20 test_perplexity 60 model 3 1.135617402241086 20240713-20:55:30 test_perplexity 60 model 1 1.133436570881939 20240713-20:58:24 test_accuracy 60 model 3 forward 476 / 506 backward 384 / 494 20240713-20:58:24 main_test_accuracy 60 0.8600000143051147 20240713-20:58:26 test_accuracy 60 model 1 forward 445 / 495 backward 409 / 505 20240713-20:58:26 
main_test_accuracy 60 0.8540000319480896 20240713-20:58:27 wrote gpt_003.pth 20240713-20:58:28 wrote gpt_001.pth 20240713-20:58:28 cache_w_quizzes contains 200000 quizzes 20240713-20:58:37 --- epoch 61 ---------------------------------------- 20240713-20:58:37 current_test_accuracies 0.8590 0.8540 0.8580 0.8600 0.8770 20240713-20:58:37 training model 1 20240713-20:58:37 training model 2 20240713-21:07:43 train_perplexity 61 model 1 1.1356155782266955 20240713-21:07:51 train_perplexity 61 model 2 1.1351154998769248 20240713-21:08:11 test_perplexity 61 model 1 1.1331289822658963 20240713-21:08:20 test_perplexity 61 model 2 1.1339023116954188 20240713-21:11:14 test_accuracy 61 model 1 forward 453 / 495 backward 404 / 505 20240713-21:11:14 main_test_accuracy 61 0.8570000529289246 20240713-21:11:15 test_accuracy 61 model 2 forward 469 / 508 backward 399 / 492 20240713-21:11:15 main_test_accuracy 61 0.8680000305175781 20240713-21:11:16 wrote gpt_001.pth 20240713-21:11:16 wrote gpt_002.pth 20240713-21:11:16 cache_w_quizzes contains 200000 quizzes 20240713-21:11:25 --- epoch 62 ---------------------------------------- 20240713-21:11:25 current_test_accuracies 0.8590 0.8570 0.8680 0.8600 0.8770 20240713-21:11:25 training model 1 20240713-21:11:25 training model 0 20240713-21:20:30 train_perplexity 62 model 1 1.135865535351339 20240713-21:20:38 train_perplexity 62 model 0 1.1364608727835719 20240713-21:20:59 test_perplexity 62 model 1 1.1330484829349885 20240713-21:21:07 test_perplexity 62 model 0 1.1349711451003215 20240713-21:24:03 test_accuracy 62 model 1 forward 447 / 495 backward 419 / 505 20240713-21:24:03 main_test_accuracy 62 0.8660000562667847 20240713-21:24:05 test_accuracy 62 model 0 forward 459 / 504 backward 389 / 496 20240713-21:24:05 main_test_accuracy 62 0.8480000495910645 20240713-21:24:06 wrote gpt_001.pth 20240713-21:24:06 wrote gpt_000.pth 20240713-21:24:06 cache_w_quizzes contains 200000 quizzes 20240713-21:24:15 --- epoch 63 
---------------------------------------- 20240713-21:24:15 current_test_accuracies 0.8480 0.8660 0.8680 0.8600 0.8770 20240713-21:24:15 training model 0 20240713-21:24:15 training model 3 20240713-21:33:20 train_perplexity 63 model 0 1.136503326030227 20240713-21:33:29 train_perplexity 63 model 3 1.1363387126551479 20240713-21:33:48 test_perplexity 63 model 0 1.1345252142203408 20240713-21:33:58 test_perplexity 63 model 3 1.1353573854613908 20240713-21:36:54 test_accuracy 63 model 0 forward 463 / 504 backward 375 / 496 20240713-21:36:54 main_test_accuracy 63 0.8380000591278076 20240713-21:36:55 test_accuracy 63 model 3 forward 474 / 506 backward 394 / 494 20240713-21:36:55 main_test_accuracy 63 0.8680000305175781 20240713-21:36:56 wrote gpt_000.pth 20240713-21:36:57 wrote gpt_003.pth 20240713-21:36:57 cache_w_quizzes contains 200000 quizzes 20240713-21:37:06 --- epoch 64 ---------------------------------------- 20240713-21:37:06 current_test_accuracies 0.8380 0.8660 0.8680 0.8680 0.8770 20240713-21:37:06 training model 0 20240713-21:37:06 training model 1 20240713-21:46:11 train_perplexity 64 model 0 1.1364379466623513 20240713-21:46:19 train_perplexity 64 model 1 1.1350359879297387 20240713-21:46:40 test_perplexity 64 model 0 1.1341774787503418 20240713-21:46:49 test_perplexity 64 model 1 1.1329125680686185 20240713-21:49:43 test_accuracy 64 model 0 forward 471 / 504 backward 396 / 496 20240713-21:49:43 main_test_accuracy 64 0.8670000433921814 20240713-21:49:46 test_accuracy 64 model 1 forward 462 / 495 backward 418 / 505 20240713-21:49:46 main_test_accuracy 64 0.8800000548362732 20240713-21:49:47 wrote gpt_000.pth 20240713-21:49:47 wrote gpt_001.pth 20240713-21:49:47 cache_w_quizzes contains 200000 quizzes 20240713-21:49:54 --- epoch 65 ---------------------------------------- 20240713-21:49:54 current_test_accuracies 0.8670 0.8800 0.8680 0.8680 0.8770 20240713-21:49:54 training model 0 20240713-21:49:54 training model 2 20240713-21:59:00 train_perplexity 65 
model 0 1.1360745271383643 20240713-21:59:08 train_perplexity 65 model 2 1.1361125596075425 20240713-21:59:27 test_perplexity 65 model 0 1.134093468325979 20240713-21:59:37 test_perplexity 65 model 2 1.1329831250208326 20240713-22:02:31 test_accuracy 65 model 0 forward 470 / 504 backward 392 / 496 20240713-22:02:31 main_test_accuracy 65 0.862000048160553 20240713-22:02:34 test_accuracy 65 model 2 forward 456 / 508 backward 407 / 492 20240713-22:02:34 main_test_accuracy 65 0.8630000352859497 20240713-22:02:35 wrote gpt_000.pth 20240713-22:02:35 wrote gpt_002.pth 20240713-22:02:35 cache_w_quizzes contains 200000 quizzes 20240713-22:02:44 --- epoch 66 ---------------------------------------- 20240713-22:02:44 current_test_accuracies 0.8620 0.8800 0.8630 0.8680 0.8770 20240713-22:02:44 training model 0 20240713-22:02:44 training model 2 20240713-22:11:49 train_perplexity 66 model 0 1.1358647046946535 20240713-22:11:59 train_perplexity 66 model 2 1.1355138813557606 20240713-22:12:17 test_perplexity 66 model 0 1.1338636700205431 20240713-22:12:27 test_perplexity 66 model 2 1.1335008093565648 20240713-22:15:20 test_accuracy 66 model 0 forward 483 / 504 backward 414 / 496 20240713-22:15:20 main_test_accuracy 66 0.8970000147819519 20240713-22:15:22 test_accuracy 66 model 2 forward 465 / 508 backward 399 / 492 20240713-22:15:22 main_test_accuracy 66 0.8640000224113464 20240713-22:15:23 wrote gpt_000.pth 20240713-22:15:23 wrote gpt_002.pth 20240713-22:15:23 cache_w_quizzes contains 200000 quizzes 20240713-22:15:31 --- epoch 67 ---------------------------------------- 20240713-22:15:31 current_test_accuracies 0.8970 0.8800 0.8640 0.8680 0.8770 20240713-22:15:31 training model 2 20240713-22:15:31 training model 3 20240713-22:24:37 train_perplexity 67 model 2 1.1351960228503857 20240713-22:24:46 train_perplexity 67 model 3 1.1361206929292165 20240713-22:25:04 test_perplexity 67 model 2 1.132895414880024 20240713-22:25:14 test_perplexity 67 model 3 1.1356950279461038 
20240713-22:28:07 test_accuracy 67 model 2 forward 465 / 508 backward 387 / 492 20240713-22:28:07 main_test_accuracy 67 0.8520000576972961 20240713-22:28:09 test_accuracy 67 model 3 forward 475 / 506 backward 401 / 494 20240713-22:28:09 main_test_accuracy 67 0.8760000467300415 20240713-22:28:10 wrote gpt_002.pth 20240713-22:28:10 wrote gpt_003.pth 20240713-22:28:10 cache_w_quizzes contains 200000 quizzes 20240713-22:28:19 --- epoch 68 ---------------------------------------- 20240713-22:28:19 current_test_accuracies 0.8970 0.8800 0.8520 0.8760 0.8770 20240713-22:28:19 training model 2 20240713-22:28:19 training model 3 20240713-22:37:25 train_perplexity 68 model 2 1.1351846910534398 20240713-22:37:33 train_perplexity 68 model 3 1.1366907298944238 20240713-22:37:53 test_perplexity 68 model 2 1.1326622204340973 20240713-22:38:02 test_perplexity 68 model 3 1.135506893853329 20240713-22:40:57 test_accuracy 68 model 2 forward 477 / 508 backward 401 / 492 20240713-22:40:57 main_test_accuracy 68 0.878000020980835 20240713-22:40:59 test_accuracy 68 model 3 forward 478 / 506 backward 390 / 494 20240713-22:40:59 main_test_accuracy 68 0.8680000305175781 20240713-22:41:00 wrote gpt_002.pth 20240713-22:41:00 wrote gpt_003.pth 20240713-22:41:00 cache_w_quizzes contains 200000 quizzes 20240713-22:41:09 --- epoch 69 ---------------------------------------- 20240713-22:41:09 current_test_accuracies 0.8970 0.8800 0.8780 0.8680 0.8770 20240713-22:41:09 training model 3 20240713-22:41:09 training model 4 20240713-22:50:15 train_perplexity 69 model 3 1.135807126185921 20240713-22:50:23 train_perplexity 69 model 4 1.1358523029675587 20240713-22:50:42 test_perplexity 69 model 3 1.13524778751111 20240713-22:50:52 test_perplexity 69 model 4 1.1340063094303823 20240713-22:53:46 test_accuracy 69 model 3 forward 466 / 506 backward 379 / 494 20240713-22:53:46 main_test_accuracy 69 0.8450000286102295 20240713-22:53:49 test_accuracy 69 model 4 forward 449 / 478 backward 417 / 522 
20240713-22:53:49 main_test_accuracy 69 0.8660000562667847 20240713-22:53:50 wrote gpt_003.pth 20240713-22:53:50 wrote gpt_004.pth 20240713-22:53:50 cache_w_quizzes contains 200000 quizzes 20240713-22:53:59 --- epoch 70 ---------------------------------------- 20240713-22:53:59 current_test_accuracies 0.8970 0.8800 0.8780 0.8450 0.8660 20240713-22:53:59 training model 3 20240713-22:53:59 training model 4 20240713-23:03:05 train_perplexity 70 model 3 1.1358723356988996 20240713-23:03:13 train_perplexity 70 model 4 1.1354194701491815 20240713-23:03:33 test_perplexity 70 model 3 1.134994141678029 20240713-23:03:42 test_perplexity 70 model 4 1.1335788224251295 20240713-23:06:37 test_accuracy 70 model 3 forward 482 / 506 backward 400 / 494 20240713-23:06:37 main_test_accuracy 70 0.8820000290870667 20240713-23:06:40 test_accuracy 70 model 4 forward 456 / 478 backward 424 / 522 20240713-23:06:40 main_test_accuracy 70 0.8800000548362732 20240713-23:06:41 wrote gpt_003.pth 20240713-23:06:41 wrote gpt_004.pth 20240713-23:06:41 cache_w_quizzes contains 200000 quizzes 20240713-23:06:50 --- epoch 71 ---------------------------------------- 20240713-23:06:50 current_test_accuracies 0.8970 0.8800 0.8780 0.8820 0.8800 20240713-23:06:50 training model 2 20240713-23:06:50 training model 1 20240713-23:15:56 train_perplexity 71 model 2 1.1351431248859956 20240713-23:16:04 train_perplexity 71 model 1 1.135146114438316 20240713-23:16:25 test_perplexity 71 model 2 1.132777404131269 20240713-23:16:33 test_perplexity 71 model 1 1.1330620054969789 20240713-23:19:28 test_accuracy 71 model 2 forward 482 / 508 backward 408 / 492 20240713-23:19:28 main_test_accuracy 71 0.89000004529953 20240713-23:19:30 test_accuracy 71 model 1 forward 459 / 495 backward 421 / 505 20240713-23:19:30 main_test_accuracy 71 0.8800000548362732 20240713-23:19:31 wrote gpt_002.pth 20240713-23:19:31 wrote gpt_001.pth 20240713-23:19:31 cache_w_quizzes contains 200000 quizzes 20240713-23:19:40 --- epoch 72 
---------------------------------------- 20240713-23:19:40 current_test_accuracies 0.8970 0.8800 0.8900 0.8820 0.8800 20240713-23:19:40 training model 1 20240713-23:19:40 training model 4 20240713-23:28:46 train_perplexity 72 model 1 1.1348892255128211 20240713-23:28:54 train_perplexity 72 model 4 1.1354711548628338 20240713-23:29:14 test_perplexity 72 model 1 1.1325243054828829 20240713-23:29:23 test_perplexity 72 model 4 1.1328777051670478 20240713-23:32:17 test_accuracy 72 model 1 forward 460 / 495 backward 423 / 505 20240713-23:32:17 main_test_accuracy 72 0.8830000162124634 20240713-23:32:20 test_accuracy 72 model 4 forward 454 / 478 backward 426 / 522 20240713-23:32:20 main_test_accuracy 72 0.8800000548362732 20240713-23:32:20 wrote gpt_001.pth 20240713-23:32:21 wrote gpt_004.pth 20240713-23:32:21 cache_w_quizzes contains 200000 quizzes 20240713-23:32:30 --- epoch 73 ---------------------------------------- 20240713-23:32:30 current_test_accuracies 0.8970 0.8830 0.8900 0.8820 0.8800 20240713-23:32:30 training model 4 20240713-23:32:30 training model 3 20240713-23:41:36 train_perplexity 73 model 4 1.1353292697246242 20240713-23:41:43 train_perplexity 73 model 3 1.1356498157354875 20240713-23:42:04 test_perplexity 73 model 4 1.132793165575179 20240713-23:42:13 test_perplexity 73 model 3 1.1346426833907752 20240713-23:45:09 test_accuracy 73 model 4 forward 448 / 478 backward 413 / 522 20240713-23:45:09 main_test_accuracy 73 0.8610000610351562 20240713-23:45:10 test_accuracy 73 model 3 forward 478 / 506 backward 398 / 494 20240713-23:45:10 main_test_accuracy 73 0.8760000467300415 20240713-23:45:11 wrote gpt_004.pth 20240713-23:45:11 wrote gpt_003.pth 20240713-23:45:11 cache_w_quizzes contains 200000 quizzes 20240713-23:45:20 --- epoch 74 ---------------------------------------- 20240713-23:45:20 current_test_accuracies 0.8970 0.8830 0.8900 0.8760 0.8610 20240713-23:45:20 training model 4 20240713-23:45:20 training model 3 20240713-23:54:26 train_perplexity 74 
model 4 1.1354700461126894 20240713-23:54:34 train_perplexity 74 model 3 1.1356570088069127 20240713-23:54:54 test_perplexity 74 model 4 1.1329749640908497 20240713-23:55:03 test_perplexity 74 model 3 1.1376205194346234 20240713-23:57:59 test_accuracy 74 model 4 forward 462 / 478 backward 425 / 522 20240713-23:57:59 main_test_accuracy 74 0.8870000243186951 20240713-23:58:00 test_accuracy 74 model 3 forward 478 / 506 backward 381 / 494 20240713-23:58:00 main_test_accuracy 74 0.859000027179718 20240713-23:58:01 wrote gpt_004.pth 20240713-23:58:01 wrote gpt_003.pth 20240713-23:58:01 cache_w_quizzes contains 200000 quizzes 20240713-23:58:10 --- epoch 75 ---------------------------------------- 20240713-23:58:10 current_test_accuracies 0.8970 0.8830 0.8900 0.8590 0.8870 20240713-23:58:10 training model 3 20240713-23:58:10 training model 1 20240714-00:07:16 train_perplexity 75 model 3 1.1355800859188736 20240714-00:07:25 train_perplexity 75 model 1 1.135204444845318 20240714-00:07:44 test_perplexity 75 model 3 1.134736994011979 20240714-00:07:53 test_perplexity 75 model 1 1.1327550122928844 20240714-00:10:48 test_accuracy 75 model 3 forward 486 / 506 backward 406 / 494 20240714-00:10:48 main_test_accuracy 75 0.8920000195503235 20240714-00:10:51 test_accuracy 75 model 1 forward 464 / 495 backward 420 / 505 20240714-00:10:51 main_test_accuracy 75 0.8840000629425049 20240714-00:10:52 wrote gpt_003.pth 20240714-00:10:52 wrote gpt_001.pth 20240714-00:10:52 cache_w_quizzes contains 200000 quizzes 20240714-00:11:00 --- epoch 76 ---------------------------------------- 20240714-00:11:00 current_test_accuracies 0.8970 0.8840 0.8900 0.8920 0.8870 20240714-00:11:00 training model 1 20240714-00:11:00 training model 4 20240714-00:20:05 train_perplexity 76 model 1 1.134890484467482 20240714-00:20:14 train_perplexity 76 model 4 1.1351459312277885 20240714-00:20:33 test_perplexity 76 model 1 1.132224294194242 20240714-00:20:43 test_perplexity 76 model 4 1.1328410474681452 
20240714-00:23:39 test_accuracy 76 model 1 forward 461 / 495 backward 428 / 505 20240714-00:23:39 main_test_accuracy 76 0.8890000581741333 20240714-00:23:41 test_accuracy 76 model 4 forward 462 / 478 backward 441 / 522 20240714-00:23:41 main_test_accuracy 76 0.9030000567436218 20240714-00:23:42 wrote gpt_001.pth 20240714-00:23:42 wrote gpt_004.pth 20240714-00:23:42 cache_w_quizzes contains 200000 quizzes 20240714-00:23:51 --- epoch 77 ---------------------------------------- 20240714-00:23:51 current_test_accuracies 0.8970 0.8890 0.8900 0.8920 0.9030 20240714-00:23:51 training model 1 20240714-00:23:51 training model 2 20240714-00:32:56 train_perplexity 77 model 1 1.1348132001187394 20240714-00:33:05 train_perplexity 77 model 2 1.1347082335886416 20240714-00:33:24 test_perplexity 77 model 1 1.1329098446928816 20240714-00:33:34 test_perplexity 77 model 2 1.1330994056232708 20240714-00:36:29 test_accuracy 77 model 1 forward 464 / 495 backward 427 / 505 20240714-00:36:29 main_test_accuracy 77 0.8910000324249268 20240714-00:36:30 test_accuracy 77 model 2 forward 472 / 508 backward 398 / 492 20240714-00:36:30 main_test_accuracy 77 0.8700000643730164 20240714-00:36:31 wrote gpt_001.pth 20240714-00:36:31 wrote gpt_002.pth 20240714-00:36:31 cache_w_quizzes contains 200000 quizzes 20240714-00:36:40 --- epoch 78 ---------------------------------------- 20240714-00:36:40 current_test_accuracies 0.8970 0.8910 0.8700 0.8920 0.9030 20240714-00:36:40 training model 2 20240714-00:36:40 training model 1 20240714-00:45:45 train_perplexity 78 model 2 1.1347024589760613 20240714-00:45:54 train_perplexity 78 model 1 1.1347983824807562 20240714-00:46:13 test_perplexity 78 model 2 1.1328690534197075 20240714-00:46:23 test_perplexity 78 model 1 1.132671058370852 20240714-00:49:16 test_accuracy 78 model 2 forward 480 / 508 backward 405 / 492 20240714-00:49:16 main_test_accuracy 78 0.8850000500679016 20240714-00:49:19 test_accuracy 78 model 1 forward 463 / 495 backward 411 / 505 
20240714-00:49:19 main_test_accuracy 78 0.8740000128746033 20240714-00:49:20 wrote gpt_002.pth 20240714-00:49:20 wrote gpt_001.pth 20240714-00:49:20 cache_w_quizzes contains 200000 quizzes 20240714-00:49:29 --- epoch 79 ---------------------------------------- 20240714-00:49:29 current_test_accuracies 0.8970 0.8740 0.8850 0.8920 0.9030 20240714-00:49:29 training model 1 20240714-00:49:29 training model 2 20240714-00:58:35 train_perplexity 79 model 1 1.1346454588733572 20240714-00:58:43 train_perplexity 79 model 2 1.1350156705558232 20240714-00:59:03 test_perplexity 79 model 1 1.132796225214968 20240714-00:59:12 test_perplexity 79 model 2 1.1323924048207819 20240714-01:02:07 test_accuracy 79 model 1 forward 465 / 495 backward 429 / 505 20240714-01:02:07 main_test_accuracy 79 0.8940000534057617 20240714-01:02:09 test_accuracy 79 model 2 forward 483 / 508 backward 417 / 492 20240714-01:02:09 main_test_accuracy 79 0.9000000357627869 20240714-01:02:10 wrote gpt_001.pth 20240714-01:02:10 wrote gpt_002.pth 20240714-01:02:10 cache_w_quizzes contains 200000 quizzes 20240714-01:02:18 --- epoch 80 ---------------------------------------- 20240714-01:02:18 current_test_accuracies 0.8970 0.8940 0.9000 0.8920 0.9030 20240714-01:02:18 training model 3 20240714-01:02:18 training model 1 20240714-01:11:23 train_perplexity 80 model 3 1.1353830486103647 20240714-01:11:32 train_perplexity 80 model 1 1.1344250343372575 20240714-01:11:51 test_perplexity 80 model 3 1.1343273103699691 20240714-01:12:01 test_perplexity 80 model 1 1.1322252935332429 20240714-01:14:54 test_accuracy 80 model 3 forward 475 / 506 backward 414 / 494 20240714-01:14:54 main_test_accuracy 80 0.8890000581741333 20240714-01:14:57 test_accuracy 80 model 1 forward 468 / 495 backward 437 / 505 20240714-01:14:57 main_test_accuracy 80 0.9050000309944153 20240714-01:14:58 wrote gpt_003.pth 20240714-01:14:58 wrote gpt_001.pth 20240714-01:14:58 cache_w_quizzes contains 200000 quizzes 20240714-01:15:07 --- epoch 81 
---------------------------------------- 20240714-01:15:07 current_test_accuracies 0.8970 0.9050 0.9000 0.8890 0.9030 20240714-01:15:07 training model 3 20240714-01:15:07 training model 0 20240714-01:24:13 train_perplexity 81 model 3 1.1353564977329167 20240714-01:24:21 train_perplexity 81 model 0 1.1355924075301156 20240714-01:24:41 test_perplexity 81 model 3 1.134526907500833 20240714-01:24:50 test_perplexity 81 model 0 1.1335027399821018 20240714-01:27:45 test_accuracy 81 model 3 forward 479 / 506 backward 405 / 494 20240714-01:27:45 main_test_accuracy 81 0.8840000629425049 20240714-01:27:47 test_accuracy 81 model 0 forward 476 / 504 backward 400 / 496 20240714-01:27:47 main_test_accuracy 81 0.8760000467300415 20240714-01:27:47 wrote gpt_003.pth 20240714-01:27:48 wrote gpt_000.pth 20240714-01:27:48 cache_w_quizzes contains 200000 quizzes 20240714-01:27:57 --- epoch 82 ---------------------------------------- 20240714-01:27:57 current_test_accuracies 0.8760 0.9050 0.9000 0.8840 0.9030 20240714-01:27:57 training model 0 20240714-01:27:57 training model 3 20240714-01:37:02 train_perplexity 82 model 0 1.1355302672052414 20240714-01:37:11 train_perplexity 82 model 3 1.1356230098563223 20240714-01:37:31 test_perplexity 82 model 0 1.1331832640182102 20240714-01:37:40 test_perplexity 82 model 3 1.1339973228503704 20240714-01:40:34 test_accuracy 82 model 0 forward 479 / 504 backward 399 / 496 20240714-01:40:34 main_test_accuracy 82 0.878000020980835 20240714-01:40:37 test_accuracy 82 model 3 forward 487 / 506 backward 407 / 494 20240714-01:40:37 main_test_accuracy 82 0.8940000534057617 20240714-01:40:38 wrote gpt_000.pth 20240714-01:40:38 wrote gpt_003.pth 20240714-01:40:38 cache_w_quizzes contains 200000 quizzes 20240714-01:40:47 --- epoch 83 ---------------------------------------- 20240714-01:40:47 current_test_accuracies 0.8780 0.9050 0.9000 0.8940 0.9030 20240714-01:40:47 training model 0 20240714-01:40:47 training model 3 20240714-01:49:52 train_perplexity 83 model 
0 1.1350789986877077 20240714-01:50:01 train_perplexity 83 model 3 1.1347217112983377 20240714-01:50:20 test_perplexity 83 model 0 1.1336814439775944 20240714-01:50:30 test_perplexity 83 model 3 1.1344159775473062 20240714-01:53:23 test_accuracy 83 model 0 forward 468 / 504 backward 400 / 496 20240714-01:53:23 main_test_accuracy 83 0.8680000305175781 20240714-01:53:25 test_accuracy 83 model 3 forward 489 / 506 backward 415 / 494 20240714-01:53:25 main_test_accuracy 83 0.9040000438690186 20240714-01:53:26 wrote gpt_000.pth 20240714-01:53:26 wrote gpt_003.pth 20240714-01:53:26 cache_w_quizzes contains 200000 quizzes 20240714-01:53:35 --- epoch 84 ---------------------------------------- 20240714-01:53:35 current_test_accuracies 0.8680 0.9050 0.9000 0.9040 0.9030 20240714-01:53:35 training model 0 20240714-01:53:35 training model 2 20240714-02:02:41 train_perplexity 84 model 0 1.1352208374787374 20240714-02:02:50 train_perplexity 84 model 2 1.1345723298563426 20240714-02:03:08 test_perplexity 84 model 0 1.1333394661860003 20240714-02:03:18 test_perplexity 84 model 2 1.1328954048566588 20240714-02:06:13 test_accuracy 84 model 0 forward 475 / 504 backward 404 / 496 20240714-02:06:13 main_test_accuracy 84 0.8790000677108765 20240714-02:06:15 test_accuracy 84 model 2 forward 474 / 508 backward 407 / 492 20240714-02:06:15 main_test_accuracy 84 0.8810000419616699 20240714-02:06:16 wrote gpt_000.pth 20240714-02:06:16 wrote gpt_002.pth 20240714-02:06:16 cache_w_quizzes contains 200000 quizzes 20240714-02:06:25 --- epoch 85 ---------------------------------------- 20240714-02:06:25 current_test_accuracies 0.8790 0.9050 0.8810 0.9040 0.9030 20240714-02:06:25 training model 0 20240714-02:06:25 training model 2 20240714-02:15:30 train_perplexity 85 model 0 1.1353205007880272 20240714-02:15:38 train_perplexity 85 model 2 1.1343734372412055 20240714-02:15:58 test_perplexity 85 model 0 1.1332377330886763 20240714-02:16:07 test_perplexity 85 model 2 1.1325705518679998 
20240714-02:19:01 test_accuracy 85 model 0 forward 486 / 504 backward 426 / 496 20240714-02:19:01 main_test_accuracy 85 0.9120000600814819 20240714-02:19:03 test_accuracy 85 model 2 forward 483 / 508 backward 421 / 492 20240714-02:19:03 main_test_accuracy 85 0.9040000438690186 20240714-02:19:04 wrote gpt_000.pth 20240714-02:19:04 wrote gpt_002.pth 20240714-02:19:04 cache_w_quizzes contains 200000 quizzes 20240714-02:26:57 keep c_quizzes model 3 nb_accumulated 10 / 2200 20240714-02:33:59 keep c_quizzes model 3 nb_accumulated 22 / 2200 20240714-02:39:12 keep c_quizzes model 2 nb_accumulated 35 / 2200 20240714-02:44:25 keep c_quizzes model 1 nb_accumulated 44 / 2200 20240714-02:49:39 keep c_quizzes model 3 nb_accumulated 58 / 2200 20240714-02:54:55 keep c_quizzes model 0 nb_accumulated 67 / 2200 20240714-03:00:09 keep c_quizzes model 2 nb_accumulated 78 / 2200 20240714-03:05:22 keep c_quizzes model 3 nb_accumulated 87 / 2200 20240714-03:10:35 keep c_quizzes model 2 nb_accumulated 99 / 2200 20240714-03:15:48 keep c_quizzes model 3 nb_accumulated 110 / 2200 20240714-03:21:01 keep c_quizzes model 3 nb_accumulated 129 / 2200 20240714-03:26:15 keep c_quizzes model 2 nb_accumulated 137 / 2200 20240714-03:31:29 keep c_quizzes model 4 nb_accumulated 143 / 2200 20240714-03:36:42 keep c_quizzes model 1 nb_accumulated 152 / 2200 20240714-03:41:55 keep c_quizzes model 3 nb_accumulated 162 / 2200 20240714-03:47:10 keep c_quizzes model 2 nb_accumulated 171 / 2200 20240714-03:52:23 keep c_quizzes model 3 nb_accumulated 186 / 2200 20240714-03:57:40 keep c_quizzes model 0 nb_accumulated 197 / 2200 20240714-04:02:55 keep c_quizzes model 4 nb_accumulated 205 / 2200 20240714-04:08:09 keep c_quizzes model 4 nb_accumulated 213 / 2200 20240714-04:13:21 keep c_quizzes model 3 nb_accumulated 229 / 2200 20240714-04:18:36 keep c_quizzes model 0 nb_accumulated 236 / 2200 20240714-04:23:49 keep c_quizzes model 1 nb_accumulated 248 / 2200 20240714-04:29:05 keep c_quizzes model 0 nb_accumulated 257 
/ 2200 20240714-04:34:18 keep c_quizzes model 3 nb_accumulated 273 / 2200 20240714-04:39:32 keep c_quizzes model 2 nb_accumulated 283 / 2200 20240714-04:44:46 keep c_quizzes model 4 nb_accumulated 286 / 2200 20240714-04:50:00 keep c_quizzes model 2 nb_accumulated 296 / 2200 20240714-04:55:13 keep c_quizzes model 3 nb_accumulated 307 / 2200 20240714-05:00:26 keep c_quizzes model 3 nb_accumulated 321 / 2200 20240714-05:05:39 keep c_quizzes model 1 nb_accumulated 336 / 2200 20240714-05:10:54 keep c_quizzes model 3 nb_accumulated 350 / 2200 20240714-05:16:07 keep c_quizzes model 3 nb_accumulated 362 / 2200 20240714-05:21:22 keep c_quizzes model 2 nb_accumulated 375 / 2200 20240714-05:26:36 keep c_quizzes model 2 nb_accumulated 385 / 2200 20240714-05:31:49 keep c_quizzes model 2 nb_accumulated 398 / 2200 20240714-05:37:03 keep c_quizzes model 3 nb_accumulated 410 / 2200 20240714-05:42:17 keep c_quizzes model 4 nb_accumulated 412 / 2200 20240714-05:47:31 keep c_quizzes model 4 nb_accumulated 419 / 2200 20240714-05:52:47 keep c_quizzes model 2 nb_accumulated 427 / 2200 20240714-05:58:02 keep c_quizzes model 3 nb_accumulated 437 / 2200 20240714-06:03:16 keep c_quizzes model 1 nb_accumulated 446 / 2200 20240714-06:08:30 keep c_quizzes model 2 nb_accumulated 464 / 2200 20240714-06:13:45 keep c_quizzes model 0 nb_accumulated 467 / 2200 20240714-06:18:58 keep c_quizzes model 3 nb_accumulated 475 / 2200 20240714-06:24:11 keep c_quizzes model 1 nb_accumulated 481 / 2200 20240714-06:29:25 keep c_quizzes model 4 nb_accumulated 489 / 2200 20240714-06:34:41 keep c_quizzes model 0 nb_accumulated 497 / 2200 20240714-06:39:55 keep c_quizzes model 0 nb_accumulated 512 / 2200 20240714-06:45:09 keep c_quizzes model 2 nb_accumulated 527 / 2200 20240714-06:50:23 keep c_quizzes model 4 nb_accumulated 539 / 2200 20240714-06:55:37 keep c_quizzes model 0 nb_accumulated 547 / 2200 20240714-07:00:50 keep c_quizzes model 2 nb_accumulated 557 / 2200 20240714-07:06:03 keep c_quizzes model 3 
nb_accumulated 568 / 2200 20240714-07:11:16 keep c_quizzes model 3 nb_accumulated 578 / 2200 20240714-07:16:31 keep c_quizzes model 0 nb_accumulated 583 / 2200 20240714-07:21:46 keep c_quizzes model 0 nb_accumulated 595 / 2200 20240714-07:27:01 keep c_quizzes model 0 nb_accumulated 607 / 2200 20240714-07:32:15 keep c_quizzes model 2 nb_accumulated 611 / 2200 20240714-07:37:31 keep c_quizzes model 0 nb_accumulated 620 / 2200 20240714-07:42:46 keep c_quizzes model 1 nb_accumulated 626 / 2200 20240714-07:48:00 keep c_quizzes model 3 nb_accumulated 638 / 2200 20240714-07:53:14 keep c_quizzes model 1 nb_accumulated 644 / 2200 20240714-07:58:29 keep c_quizzes model 0 nb_accumulated 655 / 2200 20240714-08:03:45 keep c_quizzes model 0 nb_accumulated 667 / 2200 20240714-08:09:02 keep c_quizzes model 4 nb_accumulated 675 / 2200 20240714-08:14:16 keep c_quizzes model 1 nb_accumulated 684 / 2200 20240714-08:19:31 keep c_quizzes model 4 nb_accumulated 689 / 2200 20240714-08:24:45 keep c_quizzes model 2 nb_accumulated 698 / 2200 20240714-08:29:59 keep c_quizzes model 2 nb_accumulated 702 / 2200 20240714-08:35:13 keep c_quizzes model 2 nb_accumulated 721 / 2200 20240714-08:40:27 keep c_quizzes model 2 nb_accumulated 729 / 2200 20240714-08:45:41 keep c_quizzes model 3 nb_accumulated 738 / 2200 20240714-08:50:55 keep c_quizzes model 2 nb_accumulated 743 / 2200 20240714-08:56:10 keep c_quizzes model 2 nb_accumulated 748 / 2200 20240714-09:01:23 keep c_quizzes model 2 nb_accumulated 758 / 2200 20240714-09:06:38 keep c_quizzes model 0 nb_accumulated 766 / 2200 20240714-09:11:53 keep c_quizzes model 4 nb_accumulated 768 / 2200 20240714-09:17:06 keep c_quizzes model 3 nb_accumulated 780 / 2200 20240714-09:22:20 keep c_quizzes model 2 nb_accumulated 792 / 2200 20240714-09:27:36 keep c_quizzes model 0 nb_accumulated 800 / 2200 20240714-09:32:51 keep c_quizzes model 0 nb_accumulated 808 / 2200 20240714-09:38:04 keep c_quizzes model 3 nb_accumulated 817 / 2200 20240714-09:43:20 keep 
c_quizzes model 4 nb_accumulated 823 / 2200 20240714-09:48:35 keep c_quizzes model 0 nb_accumulated 830 / 2200 20240714-09:53:50 keep c_quizzes model 0 nb_accumulated 838 / 2200 20240714-09:59:04 keep c_quizzes model 2 nb_accumulated 851 / 2200 20240714-10:04:19 keep c_quizzes model 2 nb_accumulated 856 / 2200 20240714-10:09:33 keep c_quizzes model 2 nb_accumulated 867 / 2200 20240714-10:14:47 keep c_quizzes model 4 nb_accumulated 877 / 2200 20240714-10:20:01 keep c_quizzes model 2 nb_accumulated 882 / 2200 20240714-10:25:16 keep c_quizzes model 0 nb_accumulated 886 / 2200 20240714-10:30:31 keep c_quizzes model 3 nb_accumulated 891 / 2200 20240714-10:35:47 keep c_quizzes model 3 nb_accumulated 898 / 2200 20240714-10:41:04 keep c_quizzes model 1 nb_accumulated 910 / 2200 20240714-10:46:21 keep c_quizzes model 0 nb_accumulated 917 / 2200 20240714-10:51:36 keep c_quizzes model 1 nb_accumulated 930 / 2200 20240714-10:56:53 keep c_quizzes model 4 nb_accumulated 933 / 2200 20240714-11:02:10 keep c_quizzes model 0 nb_accumulated 941 / 2200 20240714-11:07:27 keep c_quizzes model 0 nb_accumulated 954 / 2200 20240714-11:12:42 keep c_quizzes model 3 nb_accumulated 961 / 2200 20240714-11:17:59 keep c_quizzes model 0 nb_accumulated 969 / 2200 20240714-11:23:15 keep c_quizzes model 2 nb_accumulated 981 / 2200 20240714-11:28:32 keep c_quizzes model 0 nb_accumulated 987 / 2200 20240714-11:33:48 keep c_quizzes model 2 nb_accumulated 996 / 2200 20240714-11:39:05 keep c_quizzes model 4 nb_accumulated 1006 / 2200 20240714-11:44:23 keep c_quizzes model 0 nb_accumulated 1015 / 2200 20240714-11:49:37 keep c_quizzes model 4 nb_accumulated 1022 / 2200 20240714-11:54:50 keep c_quizzes model 2 nb_accumulated 1035 / 2200 20240714-12:00:03 keep c_quizzes model 3 nb_accumulated 1048 / 2200 20240714-12:05:18 keep c_quizzes model 0 nb_accumulated 1056 / 2200 20240714-12:10:33 keep c_quizzes model 4 nb_accumulated 1061 / 2200 20240714-12:15:48 keep c_quizzes model 4 nb_accumulated 1068 / 2200 
20240714-12:21:02 keep c_quizzes model 1 nb_accumulated 1084 / 2200 20240714-12:26:20 keep c_quizzes model 3 nb_accumulated 1091 / 2200 20240714-12:31:36 keep c_quizzes model 4 nb_accumulated 1095 / 2200 20240714-12:36:50 keep c_quizzes model 1 nb_accumulated 1109 / 2200 20240714-12:42:04 keep c_quizzes model 4 nb_accumulated 1114 / 2200 20240714-12:47:25 keep c_quizzes model 0 nb_accumulated 1123 / 2200 20240714-12:52:40 keep c_quizzes model 0 nb_accumulated 1138 / 2200 20240714-12:57:54 keep c_quizzes model 4 nb_accumulated 1145 / 2200 20240714-13:03:08 keep c_quizzes model 2 nb_accumulated 1154 / 2200 20240714-13:08:21 keep c_quizzes model 3 nb_accumulated 1165 / 2200 20240714-13:13:35 keep c_quizzes model 1 nb_accumulated 1177 / 2200 20240714-13:18:50 keep c_quizzes model 4 nb_accumulated 1186 / 2200 20240714-13:24:04 keep c_quizzes model 3 nb_accumulated 1193 / 2200 20240714-13:29:18 keep c_quizzes model 4 nb_accumulated 1200 / 2200 20240714-13:34:34 keep c_quizzes model 4 nb_accumulated 1207 / 2200 20240714-13:39:50 keep c_quizzes model 4 nb_accumulated 1212 / 2200 20240714-13:45:05 keep c_quizzes model 1 nb_accumulated 1218 / 2200 20240714-13:50:19 keep c_quizzes model 4 nb_accumulated 1220 / 2200 20240714-13:55:33 keep c_quizzes model 3 nb_accumulated 1235 / 2200 20240714-14:00:48 keep c_quizzes model 4 nb_accumulated 1243 / 2200 20240714-14:06:02 keep c_quizzes model 0 nb_accumulated 1255 / 2200 20240714-14:11:17 keep c_quizzes model 0 nb_accumulated 1263 / 2200 20240714-14:16:30 keep c_quizzes model 3 nb_accumulated 1268 / 2200 20240714-14:21:43 keep c_quizzes model 2 nb_accumulated 1281 / 2200 20240714-14:26:59 keep c_quizzes model 0 nb_accumulated 1291 / 2200 20240714-14:32:11 keep c_quizzes model 3 nb_accumulated 1303 / 2200 20240714-14:37:25 keep c_quizzes model 1 nb_accumulated 1315 / 2200 20240714-14:42:40 keep c_quizzes model 0 nb_accumulated 1325 / 2200 20240714-14:47:55 keep c_quizzes model 4 nb_accumulated 1331 / 2200 20240714-14:53:08 keep 
c_quizzes model 1 nb_accumulated 1340 / 2200 20240714-14:58:23 keep c_quizzes model 0 nb_accumulated 1349 / 2200 20240714-15:03:37 keep c_quizzes model 4 nb_accumulated 1353 / 2200 20240714-15:08:51 keep c_quizzes model 1 nb_accumulated 1369 / 2200 20240714-15:14:05 keep c_quizzes model 2 nb_accumulated 1378 / 2200 20240714-15:19:19 keep c_quizzes model 3 nb_accumulated 1392 / 2200 20240714-15:24:32 keep c_quizzes model 1 nb_accumulated 1406 / 2200 20240714-15:29:45 keep c_quizzes model 2 nb_accumulated 1415 / 2200 20240714-15:34:58 keep c_quizzes model 3 nb_accumulated 1422 / 2200 20240714-15:40:13 keep c_quizzes model 0 nb_accumulated 1426 / 2200 20240714-15:45:27 keep c_quizzes model 2 nb_accumulated 1439 / 2200 20240714-15:50:41 keep c_quizzes model 0 nb_accumulated 1449 / 2200 20240714-15:55:54 keep c_quizzes model 3 nb_accumulated 1459 / 2200 20240714-16:01:08 keep c_quizzes model 1 nb_accumulated 1470 / 2200 20240714-16:06:22 keep c_quizzes model 1 nb_accumulated 1481 / 2200 20240714-16:11:36 keep c_quizzes model 4 nb_accumulated 1489 / 2200 20240714-16:16:50 keep c_quizzes model 2 nb_accumulated 1499 / 2200 20240714-16:22:05 keep c_quizzes model 0 nb_accumulated 1508 / 2200 20240714-16:27:18 keep c_quizzes model 3 nb_accumulated 1519 / 2200 20240714-16:32:31 keep c_quizzes model 2 nb_accumulated 1533 / 2200 20240714-16:37:45 keep c_quizzes model 4 nb_accumulated 1541 / 2200 20240714-16:42:58 keep c_quizzes model 3 nb_accumulated 1550 / 2200 20240714-16:48:13 keep c_quizzes model 0 nb_accumulated 1557 / 2200 20240714-16:53:26 keep c_quizzes model 3 nb_accumulated 1576 / 2200 20240714-16:58:40 keep c_quizzes model 2 nb_accumulated 1594 / 2200 20240714-17:03:56 keep c_quizzes model 0 nb_accumulated 1599 / 2200 20240714-17:09:09 keep c_quizzes model 3 nb_accumulated 1611 / 2200 20240714-17:14:22 keep c_quizzes model 1 nb_accumulated 1618 / 2200 20240714-17:19:36 keep c_quizzes model 2 nb_accumulated 1628 / 2200 20240714-17:24:49 keep c_quizzes model 2 
nb_accumulated 1634 / 2200 20240714-17:30:03 keep c_quizzes model 4 nb_accumulated 1641 / 2200 20240714-17:35:17 keep c_quizzes model 3 nb_accumulated 1650 / 2200 20240714-17:40:32 keep c_quizzes model 0 nb_accumulated 1655 / 2200 20240714-17:45:46 keep c_quizzes model 2 nb_accumulated 1665 / 2200 20240714-17:51:00 keep c_quizzes model 4 nb_accumulated 1668 / 2200 20240714-17:56:14 keep c_quizzes model 3 nb_accumulated 1683 / 2200 20240714-18:01:28 keep c_quizzes model 2 nb_accumulated 1689 / 2200 20240714-18:06:40 keep c_quizzes model 3 nb_accumulated 1705 / 2200 20240714-18:11:55 keep c_quizzes model 0 nb_accumulated 1716 / 2200 20240714-18:17:07 keep c_quizzes model 1 nb_accumulated 1728 / 2200 20240714-18:22:22 keep c_quizzes model 4 nb_accumulated 1733 / 2200 20240714-18:27:35 keep c_quizzes model 2 nb_accumulated 1740 / 2200 20240714-18:32:49 keep c_quizzes model 0 nb_accumulated 1748 / 2200 20240714-18:38:04 keep c_quizzes model 0 nb_accumulated 1756 / 2200 20240714-18:43:18 keep c_quizzes model 0 nb_accumulated 1762 / 2200 20240714-18:48:32 keep c_quizzes model 4 nb_accumulated 1766 / 2200 20240714-18:53:46 keep c_quizzes model 1 nb_accumulated 1780 / 2200 20240714-18:58:59 keep c_quizzes model 4 nb_accumulated 1782 / 2200 20240714-19:04:16 keep c_quizzes model 0 nb_accumulated 1792 / 2200 20240714-19:09:31 keep c_quizzes model 0 nb_accumulated 1802 / 2200 20240714-19:14:45 keep c_quizzes model 4 nb_accumulated 1808 / 2200 20240714-19:19:59 keep c_quizzes model 4 nb_accumulated 1815 / 2200 20240714-19:25:13 keep c_quizzes model 4 nb_accumulated 1822 / 2200 20240714-19:30:27 keep c_quizzes model 2 nb_accumulated 1833 / 2200 20240714-19:35:41 keep c_quizzes model 4 nb_accumulated 1842 / 2200 20240714-19:40:56 keep c_quizzes model 4 nb_accumulated 1849 / 2200 20240714-19:46:09 keep c_quizzes model 2 nb_accumulated 1859 / 2200 20240714-19:51:22 keep c_quizzes model 2 nb_accumulated 1870 / 2200 20240714-19:56:37 keep c_quizzes model 2 nb_accumulated 1877 / 2200 
20240714-20:01:52 keep c_quizzes model 0 nb_accumulated 1888 / 2200 20240714-20:07:06 keep c_quizzes model 1 nb_accumulated 1897 / 2200 20240714-20:12:19 keep c_quizzes model 3 nb_accumulated 1907 / 2200 20240714-20:17:32 keep c_quizzes model 2 nb_accumulated 1918 / 2200 20240714-20:22:45 keep c_quizzes model 2 nb_accumulated 1932 / 2200 20240714-20:27:59 keep c_quizzes model 4 nb_accumulated 1937 / 2200 20240714-20:33:14 keep c_quizzes model 0 nb_accumulated 1945 / 2200 20240714-20:38:28 keep c_quizzes model 4 nb_accumulated 1957 / 2200 20240714-20:43:43 keep c_quizzes model 0 nb_accumulated 1966 / 2200 20240714-20:48:56 keep c_quizzes model 3 nb_accumulated 1972 / 2200 20240714-20:54:11 keep c_quizzes model 4 nb_accumulated 1981 / 2200 20240714-20:59:24 keep c_quizzes model 3 nb_accumulated 1992 / 2200 20240714-21:04:38 keep c_quizzes model 4 nb_accumulated 2002 / 2200 20240714-21:09:51 keep c_quizzes model 1 nb_accumulated 2012 / 2200 20240714-21:15:06 keep c_quizzes model 0 nb_accumulated 2023 / 2200 20240714-21:20:21 keep c_quizzes model 0 nb_accumulated 2035 / 2200 20240714-21:25:34 keep c_quizzes model 3 nb_accumulated 2048 / 2200 20240714-21:30:49 keep c_quizzes model 1 nb_accumulated 2057 / 2200 20240714-21:36:03 keep c_quizzes model 1 nb_accumulated 2063 / 2200 20240714-21:41:17 keep c_quizzes model 3 nb_accumulated 2078 / 2200 20240714-21:46:31 keep c_quizzes model 1 nb_accumulated 2086 / 2200 20240714-21:51:46 keep c_quizzes model 1 nb_accumulated 2099 / 2200 20240714-21:57:01 keep c_quizzes model 4 nb_accumulated 2105 / 2200 20240714-22:02:15 keep c_quizzes model 3 nb_accumulated 2115 / 2200 20240714-22:07:29 keep c_quizzes model 4 nb_accumulated 2120 / 2200 20240714-22:12:42 keep c_quizzes model 3 nb_accumulated 2134 / 2200 20240714-22:17:56 keep c_quizzes model 4 nb_accumulated 2144 / 2200 20240714-22:23:10 keep c_quizzes model 1 nb_accumulated 2159 / 2200 20240714-22:28:24 keep c_quizzes model 2 nb_accumulated 2167 / 2200 20240714-22:33:43 keep 
c_quizzes model 4 nb_accumulated 2176 / 2200 20240714-22:39:18 keep c_quizzes model 2 nb_accumulated 2189 / 2200 20240714-22:44:34 keep c_quizzes model 4 nb_accumulated 2199 / 2200 20240714-22:49:51 keep c_quizzes model 0 nb_accumulated 2208 / 2200 20240714-22:49:52 --- epoch 86 ---------------------------------------- 20240714-22:49:52 current_test_accuracies 0.9120 0.9050 0.9040 0.9040 0.9030 20240714-22:49:52 training model 4 20240714-22:49:52 training model 2 20240714-22:58:55 train_perplexity 86 model 4 1.1366365958983131 20240714-22:59:03 train_perplexity 86 model 2 1.1356696751874478 20240714-22:59:24 test_perplexity 86 model 4 1.1339929771894497 20240714-22:59:33 test_perplexity 86 model 2 1.1332778324924047 20240714-23:02:28 test_accuracy 86 model 4 forward 460 / 478 backward 440 / 522 20240714-23:02:28 main_test_accuracy 86 0.9000000357627869 20240714-23:02:29 test_accuracy 86 model 2 forward 488 / 508 backward 427 / 492 20240714-23:02:29 main_test_accuracy 86 0.9150000214576721 20240714-23:02:30 wrote gpt_004.pth 20240714-23:02:30 wrote gpt_002.pth 20240714-23:02:30 cache_w_quizzes contains 200000 quizzes 20240714-23:10:26 keep c_quizzes model 0 nb_accumulated 5 / 2200 20240714-23:17:31 keep c_quizzes model 1 nb_accumulated 10 / 2200 20240714-23:22:45 keep c_quizzes model 3 nb_accumulated 19 / 2200 20240714-23:28:00 keep c_quizzes model 1 nb_accumulated 27 / 2200 20240714-23:33:14 keep c_quizzes model 2 nb_accumulated 29 / 2200 20240714-23:38:27 keep c_quizzes model 2 nb_accumulated 32 / 2200 20240714-23:43:40 keep c_quizzes model 2 nb_accumulated 37 / 2200 20240714-23:48:56 keep c_quizzes model 3 nb_accumulated 43 / 2200 20240714-23:54:10 keep c_quizzes model 1 nb_accumulated 55 / 2200 20240714-23:59:26 keep c_quizzes model 4 nb_accumulated 55 / 2200 20240715-00:04:41 keep c_quizzes model 1 nb_accumulated 63 / 2200