VISSL

[Screenshot of VISSL]

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.

Overview

VISSL is a computer vision library for state-of-the-art self-supervised learning research with PyTorch. It aims to accelerate the full research cycle in self-supervised learning, from designing a new task to evaluating the learned representations. VISSL offers reproducible implementations of many state-of-the-art self-supervised algorithms and supports supervised training as well. It provides a benchmark suite covering a variety of tasks, including linear image classification, full finetuning, semi-supervised learning, nearest-neighbor evaluation, and object detection. The library is easy to use, with a YAML configuration system based on Hydra, and modular, letting users design new tasks while reusing components from existing ones. VISSL is scalable and supports training on single-GPU, multi-GPU, and multi-node setups.
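VISSL's configs are managed by Hydra, whose hallmark is overriding nested YAML keys from the command line with dotted `key.sub=value` arguments. As a rough, self-contained illustration of that override style (hypothetical names only, not VISSL's actual API):

```python
# Minimal sketch of Hydra-style "a.b.c=value" overrides applied to a
# nested config dict. Hypothetical illustration -- not VISSL's API.

def apply_overrides(config, overrides):
    """Apply a list of 'a.b.c=value' strings to a nested dict in place."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate levels
        node[leaf] = raw_value

base = {"model": {"trunk": "resnet50"}, "optimizer": {"lr": "0.1"}}
apply_overrides(base, ["optimizer.lr=0.3", "model.head=mlp"])
print(base["optimizer"]["lr"])  # prints 0.3
print(base["model"]["head"])    # prints mlp
```

The real system is far richer (config composition, defaults lists, type checking), but the dotted-override idea above is the core of how a single base YAML file is adapted per experiment.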

Features

  • Reproducible implementation of SOTA in Self-Supervision: VISSL includes implementations of various state-of-the-art self-supervised algorithms, such as SwAV, SimCLR, MoCo(v2), PIRL, NPID, NPID++, DeepClusterV2, ClusterFit, RotNet, and Jigsaw. It also supports supervised training.
  • Benchmark suite: The library provides a suite of benchmark tasks, including linear image classification on datasets such as places205, imagenet1k, voc07, food, CLEVR, dsprites, UCF101, and stanford cars. It also offers benchmarks for full finetuning, semi-supervised learning, nearest-neighbor evaluation, and object detection on the Pascal VOC and COCO datasets.
  • Ease of Usability: VISSL is designed to be user-friendly: experiments are configured through simple YAML files using a configuration system based on Hydra.
  • Modular: The library allows users to design new tasks and reuse components from existing ones, swapping objective functions, model trunks and heads, data transforms, and other components directly in their configuration files.
  • Scalability: VISSL supports training models on various hardware configurations, including 1-GPU, multi-GPU, and multi-node setups. It provides several components for large-scale training, such as activation checkpointing, ZeRO optimization, mixed-precision (FP16) training, LARC (Layer-wise Adaptive Rate Control), a stateful data sampler, handling of invalid images, and large model backbones such as RegNets.
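One of the scalability components above, the stateful data sampler, exists so that a long training run can resume mid-epoch after a failure instead of restarting the epoch. The idea can be sketched in plain Python as follows (a simplified illustration with hypothetical names, not VISSL's actual implementation):

```python
import random

# Simplified sketch of a resumable ("stateful") data sampler: it records
# how far into the current epoch it has advanced, so training can pick up
# where it left off after a checkpoint restore. Hypothetical illustration
# only -- not VISSL's actual implementation.

class StatefulSampler:
    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.seed = seed
        self.start_index = 0  # position to resume from within the epoch

    def __iter__(self):
        order = list(range(self.num_samples))
        random.Random(self.seed).shuffle(order)  # deterministic epoch order
        for i in range(self.start_index, self.num_samples):
            self.start_index = i + 1  # advance the resume point
            yield order[i]
        self.start_index = 0  # epoch finished; next epoch starts fresh

    def state_dict(self):
        return {"seed": self.seed, "start_index": self.start_index}

    def load_state_dict(self, state):
        self.seed = state["seed"]
        self.start_index = state["start_index"]

sampler = StatefulSampler(6, seed=42)
it = iter(sampler)
first_half = [next(it) for _ in range(3)]
state = sampler.state_dict()       # checkpoint mid-epoch

resumed = StatefulSampler(6, seed=42)
resumed.load_state_dict(state)     # restore from the checkpoint
second_half = list(iter(resumed))  # continues from sample 4 of 6

full = list(iter(StatefulSampler(6, seed=42)))
assert first_half + second_half == full
```

The key design point is that the shuffle is seeded deterministically, so a restored sampler can regenerate the exact per-epoch ordering and skip the samples already consumed.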