Gleason (Arvaniti)

Benchmark dataset for automated Gleason grading of prostate cancer tissue microarrays via deep learning, as proposed by Arvaniti et al.

Images are classified as benign or as Gleason pattern 3, 4, or 5. The dataset contains annotations for a discovery/train cohort of 641 patients and an independent test cohort of 245 patients, the latter annotated by two pathologists. For the test cohort, this benchmark uses only the labels from pathologist 1.

Raw data

Key stats


Modality	Vision (WSI patches)
Task	Multiclass classification (4 classes)
Cancer type	Prostate
Data size	4 GB
Image dimension	750 x 750
Magnification (μm/px)	40x (0.23)
File format	`jpg`
Number of images	22,752

Splits

The following splits are proposed in the paper:

Splits	Train	Validation	Test
#Samples	15,303 (67.26%)	2,482 (10.91%)	4,967 (21.83%)
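
The percentages in the table follow directly from the sample counts. A minimal check, where the dictionary and function names are only illustrative:

```python
# Sample counts per split, copied from the table above.
split_counts = {"train": 15303, "validation": 2482, "test": 4967}


def split_percentages(counts):
    """Return each split's share of all samples in percent (2 decimals)."""
    n = sum(counts.values())
    return {name: round(100 * c / n, 2) for name, c in counts.items()}


total = sum(split_counts.values())
percentages = split_percentages(split_counts)
print(total, percentages)
```

The total matches the number of images listed under the key stats.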

Note that the authors chose TMA 76 as the validation cohort because it contains the most balanced distribution of Gleason scores. We couldn't achieve stable results when evaluating on the test set, so this benchmark uses only the train and validation sets.
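
A train/validation assignment by TMA could be sketched as follows. This is not the dataset class's actual implementation: it assumes that the `ZT76` prefix in the patch file names identifies TMA 76, and the helper names `tma_of` and `assign_split` are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: file names starting with "ZT76" belong to TMA 76,
# the validation cohort mentioned above.
VALIDATION_TMA = "ZT76"


def tma_of(patch_path):
    """Return the TMA prefix (e.g. 'ZT76') of a patch path such as
    'ZT76_39_A_1_1/ZT76_39_A_1_1_patch_12_class_0.jpg'."""
    return PurePosixPath(patch_path).name.split("_")[0]


def assign_split(patch_path):
    """Map a patch to 'val' if it comes from the validation TMA, else 'train'."""
    return "val" if tma_of(patch_path) == VALIDATION_TMA else "train"
```

Splitting on the first underscore rather than testing `startswith("ZT76")` keeps a differently numbered TMA whose identifier merely begins with the same characters from being assigned to the validation set.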

Download and preprocessing

The GleasonArvaniti dataset class does not download the data at runtime; the data must be downloaded and preprocessed manually:

Download dataset archives from the official source
Unpack all .tar.gz archives into the same folder
Adjust the folder structure, then run create_patches.py from https://github.com/eiriniar/gleason_CNN/tree/master
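
The unpacking step can be scripted with the standard library. The sketch below assumes the downloaded archives sit together in one directory; the function name and the paths in the usage comment are placeholders, not part of the dataset tooling.

```python
import tarfile
from pathlib import Path


def unpack_archives(archive_dir, target_dir):
    """Extract every .tar.gz archive found in archive_dir into target_dir.

    Returns the names of the archives that were extracted.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where the Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target, **kwargs)
        extracted.append(archive.name)
    return extracted


# Example call (placeholder paths):
# unpack_archives("downloads", "arvaniti_raw")
```

Adjusting the folder structure and running create_patches.py remain manual steps, since their details depend on the upstream repository.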

After these steps, the folder structure should look as follows:

arvaniti_gleason_patches
├── test_patches_750
│   ├── patho_1
│   │   ├── ZT80_38_A_1_1
│   │   │   ├── ZT76_39_A_1_1_patch_12_class_0.jpg
│   │   │   ├── ZT76_39_A_1_1_patch_23_class_0.jpg
│   │   │   └── ...
│   │   ├── ZT80_38_A_1_2
│   │   │   └── ...
│   │   └── ...
│   ├── patho_2  # we don't use this
│   │   └── ...
├── train_validation_patches_750
│   ├── ZT76_39_A_1_1
│   │   ├── ZT76_39_A_1_1_patch_12_class_0.jpg
│   │   ├── ZT76_39_A_1_1_patch_23_class_0.jpg
│   │   └── ...
│   ├── ZT76_39_A_1_2
│   └── ...
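
The patch file names in the tree above encode the patch index and the class index. A minimal sketch for reading them back, assuming every patch follows the `<core>_patch_<n>_class_<k>.jpg` pattern shown; the helper names are hypothetical, and the mapping from class index to Gleason pattern is not stated here, so the code returns the raw integer:

```python
import re
from pathlib import Path

# Pattern inferred from names such as "ZT76_39_A_1_1_patch_12_class_0.jpg".
PATCH_RE = re.compile(r"^(?P<core>.+)_patch_(?P<patch>\d+)_class_(?P<label>\d+)\.jpg$")


def parse_patch_name(filename):
    """Split a patch file name into (core name, patch index, class index)."""
    match = PATCH_RE.match(filename)
    if match is None:
        raise ValueError(f"Unexpected patch file name: {filename}")
    return match["core"], int(match["patch"]), int(match["label"])


def index_patches(root):
    """Collect (path, class index) pairs for all patch files below root."""
    files = sorted(Path(root).rglob("*_patch_*_class_*.jpg"))
    return [(path, parse_patch_name(path.name)[2]) for path in files]
```

Calling `index_patches` on `train_validation_patches_750`, or on `test_patches_750/patho_1`, gives a flat list that can be checked against the sample counts in the splits table.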

Relevant links

License

CC0 1.0 Universal