Camelyon16
The Camelyon16 dataset consists of 400 whole-slide images (WSIs) of lymph nodes for breast cancer metastasis classification. It combines two independent datasets collected at two medical centers in the Netherlands (Radboud University Medical Center and University Medical Center Utrecht). It contains the slides from which the PatchCamelyon patches were extracted.
The dataset is divided into a train set (270 slides) and a test set (130 slides), both containing images from both centers. Note that one test set slide was a duplicate and has been removed (see here), leaving 129 test slides.
The task ran as a Grand Challenge in 2016 and was later replaced by Camelyon17.
Source: https://camelyon16.grand-challenge.org
Raw data
Key stats
Modality | Vision (WSI) |
Task | Binary classification |
Cancer type | Breast |
Data size | ~700 GB |
Image dimensions (px) | ~100-250k x ~100-250k x 3 |
Magnification (μm/px) | 40x (0.25) - Level 0 |
File format | .tif |
Number of images | 399 (270 train, 129 test) |
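To put the image dimensions and resolution above into perspective, the following is a rough back-of-the-envelope sketch. It assumes 8 bits per channel (not stated above) and uses a hypothetical helper name, `level0_footprint`; the numbers describe uncompressed level-0 pixels, not the size of the `.tif` files on disk.

```python
def level0_footprint(width_px, height_px, channels=3, mpp=0.25):
    """Estimate uncompressed size and physical extent of a slide at level 0.

    Assumes one byte (8 bits) per channel, which is an assumption of this
    sketch. `mpp` is the resolution in micrometers per pixel.
    """
    raw_bytes = width_px * height_px * channels
    return {
        "raw_gb": raw_bytes / 1e9,              # decimal gigabytes
        "width_mm": width_px * mpp / 1000,      # micrometers -> millimeters
        "height_mm": height_px * mpp / 1000,
    }


small = level0_footprint(100_000, 100_000)
large = level0_footprint(250_000, 250_000)
print(small)
print(large)
```

Under these assumptions a single slide spans tens of millimeters per side and tens to hundreds of gigabytes of raw pixels, which is why slides of this size are usually read region by region rather than loaded whole.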
Organization
The data in CAMELYON16 (download links here) is organized as follows:
CAMELYON16
├── training
│   ├── normal
│   │   ├── normal_001.tif
│   │   └── ...
│   ├── tumor
│   │   ├── tumor_001.tif
│   │   └── ...
│   └── lesion_annotations.zip
├── testing
│   ├── images
│   │   ├── test_001.tif
│   │   └── ...
│   ├── evaluation     # masks not in use
│   ├── reference.csv  # targets
│   └── lesion_annotations.zip
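As a quick sanity check after a manual download, the layout above can be walked with a few lines of Python. This is a minimal sketch, not part of the dataset class: the function name `index_slides` and the 0/1 label encoding are assumptions made for illustration. Training labels are inferred from the folder name; test slides are returned without a label because the column layout of `reference.csv` is not described here.

```python
from pathlib import Path


def index_slides(root):
    """Collect (relative_path, split, label) for every .tif in the layout above.

    Label encoding is an assumption of this sketch: normal -> 0, tumor -> 1.
    Test slides get label None; their targets live in testing/reference.csv.
    """
    root = Path(root)
    records = []
    for folder, label in (("normal", 0), ("tumor", 1)):
        for path in sorted((root / "training" / folder).glob("*.tif")):
            records.append((path.relative_to(root).as_posix(), "training", label))
    for path in sorted((root / "testing" / "images").glob("*.tif")):
        records.append((path.relative_to(root).as_posix(), "testing", None))
    return records
```

Calling `index_slides("CAMELYON16")` on a complete download should yield 270 training records and 129 testing records, matching the counts in the key stats.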
Download and preprocessing
The Camelyon16 dataset class doesn't download the data at runtime; the data must be downloaded manually from the links provided here.
The dataset is split into train/test. Additionally, we split the train set into train/val using the same splits as PatchCamelyon (see metadata CSV files on Zenodo).
Splits | Train | Validation | Test |
---|---|---|---|
#Samples | 216 (54.1%) | 54 (13.5%) | 129 (32.3%) |
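The percentages in the table are taken over all 399 slides. The short check below reproduces them from the sample counts; the variable names are only illustrative.

```python
# Slide counts per split, as listed in the table above.
splits = {"train": 216, "validation": 54, "test": 129}

total = sum(splits.values())
percentages = {name: round(100 * n / total, 1) for name, n in splits.items()}

# Share of the original 270 training slides that is held out for validation.
val_share_of_train = splits["validation"] / (splits["train"] + splits["validation"])

print(total, percentages, val_share_of_train)
```

The validation set therefore corresponds to 20% of the original 270 training slides.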
Relevant links
References
1. A General-Purpose Self-Supervised Model for Computational Pathology