Camelyon16

The Camelyon16 dataset consists of 400 whole-slide images (WSIs) of lymph nodes for breast cancer metastasis classification. It combines two independent datasets collected at two medical centers in the Netherlands (Radboud University Medical Center and University Medical Center Utrecht). These are the slides from which the PatchCamelyon patches were extracted.

The dataset is divided into a train set (270 slides) and a test set (130 slides), both containing images from both centers. Note that one test set slide was a duplicate and has been removed (see here), which leaves 129 test slides.

The task was part of Grand Challenge in 2016 and was later replaced by Camelyon17.

Source: https://camelyon16.grand-challenge.org

Raw data

Key stats

Modality: Vision (WSI)
Task: Binary classification
Cancer type: Breast
Data size: ~700 GB
Image dimension: ~100-250k x ~100-250k x 3
Magnification (μm/px): 40x (0.25) at level 0
File format: .tif
Number of images: 399 (270 train, 129 test)

Organization

The CAMELYON16 data (download links here) is organized as follows:

CAMELYON16
├── training
│   ├── normal
│   │   ├── normal_001.tif
│   │   └── ...
│   ├── tumor
│   │   ├── tumor_001.tif
│   │   └── ...
│   └── lesion_annotations.zip
└── testing
    ├── images
    │   ├── test_001.tif
    │   └── ...
    ├── evaluation     # masks not in use
    ├── reference.csv  # targets
    └── lesion_annotations.zip
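
In this layout, the label of each training slide follows from the folder it sits in. The snippet below is a minimal sketch of that idea using only the Python standard library. The function name `index_training_slides` and the 0/1 label encoding are assumptions made for illustration; they are not part of the dataset or of any library API.

```python
from pathlib import Path


def index_training_slides(root: str) -> list[tuple[str, int]]:
    """Return (relative path, label) pairs for the training slides.

    Assumed label encoding (hypothetical): 0 = normal, 1 = tumor.
    """
    root_path = Path(root)
    samples = []
    for folder, label in (("normal", 0), ("tumor", 1)):
        # Slides are stored as training/<folder>/<folder>_XXX.tif
        for slide in sorted((root_path / "training" / folder).glob("*.tif")):
            samples.append((slide.relative_to(root_path).as_posix(), label))
    return samples
```

Calling `index_training_slides("CAMELYON16")` returns one entry per `.tif` file under `training/normal` and `training/tumor`. The test slides are not covered here, since their targets come from `reference.csv`, whose column layout is not described in this page.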

Download and preprocessing

The Camelyon16 dataset class doesn't download the data at runtime; the data must be downloaded manually from the links provided here.
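
Because the download is manual, it can help to verify the folder layout before pointing any code at it. The following is a rough sketch of such a check, using only the standard library. The helper `missing_entries` and the constant `EXPECTED_ENTRIES` are hypothetical names; the listed paths are taken from the directory tree in the Organization section.

```python
from pathlib import Path

# Entries taken from the directory tree in the "Organization" section.
EXPECTED_ENTRIES = (
    "training/normal",
    "training/tumor",
    "testing/images",
    "testing/reference.csv",
)


def missing_entries(root: str) -> list[str]:
    """List the expected files or folders that are absent under `root`."""
    root_path = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root_path / entry).exists()]
```

An empty list means all expected entries were found. A non-empty list names the entries that still need to be downloaded or extracted.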

The dataset is provided with a train/test split. Additionally, we split the train set into train/val using the same splits as PatchCamelyon (see the metadata CSV files on Zenodo).

Splits      Train          Validation     Test
#Samples    216 (54.1%)    54 (13.5%)     129 (32.3%)
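
The percentages in the table are fractions of the 399 remaining slides. As a quick check of the arithmetic (the dictionary keys are illustrative names only):

```python
# Slide counts per split, as listed in the table above.
splits = {"train": 216, "val": 54, "test": 129}

total = sum(splits.values())  # 399 slides after removing the duplicate
percentages = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)        # 399
print(percentages)  # {'train': 54.1, 'val': 13.5, 'test': 32.3}
```

The rounded values add up to 99.9% rather than 100% because each one is rounded to one decimal place independently.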

References

1: A General-Purpose Self-Supervised Model for Computational Pathology