+Takes ~1h10min on a 4090.
+
+======================================================================
+For the arithmetic expression experiments
+
+# 38M parameters / 250k samples
+
+./main.py --task=expr
+
+# 352M parameters / 2.5M samples, reaches 99.80% after 12 epochs; the
+# learning rate schedule is obviously terrible