This is a port of the Synthetic Visual Reasoning Test problems to the PyTorch framework, with an implementation of two convolutional networks to solve them.

Installation and test


make -j -k

should generate an image example.png in the current directory.

Note that the image generation does not take advantage of GPUs or multiple cores. On a 4GHz i7-6700K, its speed ranges from as slow as 40 vignettes per second to as fast as 10,000.

Vignette generation and compression

Vignette sets

The file implements the classes VignetteSet and CompressedVignetteSet, both with a constructor

__init__(problem_number, nb_samples, batch_size, cuda = False, logger = None)

and a method

(torch.FloatTensor, torch.LongTensor) get_batch(b)

which returns a pair composed of a 4d 'input' Tensor (i.e. single channel 128x128 images), and a 1d 'target' Tensor (i.e. Boolean labels).
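The interface above can be exercised with a loop over batch indices. The following is only a sketch: StubVignetteSet is a hypothetical stand-in written in pure Python (nested lists instead of tensors) so that it runs without svrt or PyTorch, and the nb_batches attribute, computed as nb_samples // batch_size, is an assumption rather than something stated here.

```python
import random


class StubVignetteSet:
    """Hypothetical stand-in mimicking the documented constructor and
    get_batch(b) signature. It is NOT the real VignetteSet class."""

    def __init__(self, problem_number, nb_samples, batch_size,
                 cuda=False, logger=None):
        self.problem_number = problem_number
        self.batch_size = batch_size
        # Assumed convention: only complete batches are counted.
        self.nb_batches = nb_samples // batch_size

    def get_batch(self, b):
        # 'input': batch_size x 1 x 128 x 128 (single-channel images),
        # filled here with blank white pixels for illustration only.
        inputs = [[[[255.0] * 128 for _ in range(128)]]
                  for _ in range(self.batch_size)]
        # 'target': batch_size Boolean labels stored as 0/1 integers,
        # drawn from a generator seeded by the batch index b.
        rng = random.Random(b)
        targets = [rng.randint(0, 1) for _ in range(self.batch_size)]
        return inputs, targets


def count_positives(vignette_set):
    """Visit every batch once and count the positive labels."""
    total = 0
    for b in range(vignette_set.nb_batches):
        inputs, targets = vignette_set.get_batch(b)
        total += sum(targets)
    return total
```

With the real classes, the same loop shape would feed each (input, target) pair to a network instead of counting labels.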

Low-level functions

The main function for generating vignettes is

torch.ByteTensor svrt.generate_vignettes(int problem_number, torch.LongTensor labels)


The returned ByteTensor has three dimensions: vignette index, pixel row, and pixel column.

The two additional functions

torch.ByteStorage svrt.compress(torch.ByteStorage x)


torch.ByteStorage svrt.uncompress(torch.ByteStorage x)

provide a lossless compression scheme adapted to the ByteStorage of the vignette ByteTensor (i.e. expecting a lot of 255s, a few 0s, and no other value).
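To illustrate why data made of long runs of 255 with a few 0s compresses well, here is a minimal run-length encoding sketch in pure Python. It is not the algorithm used by svrt.compress, whose format is not described here; the function names rle_compress and rle_uncompress and the (value, run length minus one) byte-pair layout are assumptions made for this example.

```python
def rle_compress(data: bytes) -> bytes:
    """Encode data as (value, run_length - 1) byte pairs.

    Runs are capped at 256 so that the length fits in one byte.
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        value = data[i]
        run = 1
        while i + run < n and data[i + run] == value and run < 256:
            run += 1
        out.append(value)
        out.append(run - 1)
        i += run
    return bytes(out)


def rle_uncompress(packed: bytes) -> bytes:
    """Invert rle_compress by expanding each (value, run_length - 1) pair."""
    out = bytearray()
    for k in range(0, len(packed), 2):
        out.extend(bytes([packed[k]]) * (packed[k + 1] + 1))
    return bytes(out)


# A toy 128x128 "vignette": white background with a 3-pixel-wide black bar.
row = bytes([255] * 100 + [0] * 3 + [255] * 25)
vignette = row * 128
packed = rle_compress(vignette)
```

The round trip rle_uncompress(packed) restores the original bytes exactly, and packed is much shorter than vignette because each long run collapses to two bytes.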

This compression reduces the memory footprint by a factor of about 50, and may be useful for dealing with very large data sets while avoiding re-generating images at every batch. It induces a small overhead for decompression and for moving data from CPU to GPU memory.
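As a back-of-the-envelope check of what such a reduction means in practice, the following sketch computes the footprint of one million vignettes at one byte per pixel. The exact ratio of 50 is taken as an assumption; the figure quoted above is approximate, and the real ratio presumably varies with image content.

```python
VIGNETTE_SIDE = 128
BYTES_PER_VIGNETTE = VIGNETTE_SIDE * VIGNETTE_SIDE  # one byte per pixel
ASSUMED_RATIO = 50  # approximate compression factor, assumed exact here


def footprint_bytes(nb_vignettes, ratio=1):
    """Storage needed for nb_vignettes, divided by a compression ratio."""
    return nb_vignettes * BYTES_PER_VIGNETTE // ratio


raw = footprint_bytes(1_000_000)
packed_size = footprint_bytes(1_000_000, ASSUMED_RATIO)
print(f"raw: {raw / 1e9:.2f} GB, compressed: {packed_size / 1e6:.1f} MB")
```

Under these assumptions, one million uncompressed vignettes need about 16.4 GB, while the compressed set needs roughly 330 MB.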

See the class CompressedVignetteSet for an example of its use.

Testing convolutional networks

The file provides the implementation of two deep networks designed by Afroze Baqapuri during an internship at Idiap, and allows training them on several million vignettes on a PC with 16GB of RAM and a GPU with 8GB of memory.