Gleason (Arvaniti)
Benchmark dataset for automated Gleason grading of prostate cancer tissue microarrays via deep learning as proposed by Arvaniti et al..
Images are classified as benign, Gleason pattern 3, 4 or 5. The dataset contains annotations on a discovery / train cohort of 641 patients and an independent test cohort of 245 patients annotated by two pathologists. For the test cohort, we only use the labels from pathologist Nr. 1 for this benchmark
Raw data
Key stats
Modality | Vision (WSI patches) |
Task | Multiclass classification (4 classes) |
Cancer type | Prostate |
Data size | 4 GB |
Image dimension | 750 x 750 |
Magnification (μm/px) | 40x (0.23) |
Files format | jpg |
Number of images | 22,752 |
Splits
The following splits are proposed in the paper:
Splits | Train | Validation | Test |
---|---|---|---|
#Samples | 15,303 (67.26%) | 2,482 (10.91%) | 4,967 (21.83%) |
Note that the authors chose TMA 76 as validation cohort because it contains the most balanced distribution of Gleason scores. We couldn't achieve stable results when evaluating on the test set, so we only use the train and validation sets for this benchmark.
Download and preprocessing
The GleasonArvaniti
dataset class doesn't download the data during runtime and must be downloaded and preprocessed manually:
- Download dataset archives from the official source
- Unpack all .tar.gz archives into the same folder
- Adjust the folder structure and then run the
create_patches.py
from https://github.com/eiriniar/gleason_CNN/tree/master
This should result in the folloing folder structure:
arvaniti_gleason_patches
├── test_patches_750
│ ├── patho_1
│ │ ├── ZT80_38_A_1_1
│ │ ├── ZT76_39_A_1_1_patch_12_class_0.jpg
│ │ ├── ZT76_39_A_1_1_patch_23_class_0.jpg
│ │ │ └── ...
│ │ ├── ZT80_38_A_1_2
│ │ │ └── ...
│ │ └── ...
│ ├── patho_2 # we don't use this
│ │ └── ...
├── train_validation_patches_750
│ ├── ZT76_39_A_1_1
│ │ ├── ZT76_39_A_1_1_patch_12_class_0.jpg
│ │ ├── ZT76_39_A_1_1_patch_23_class_0.jpg
│ │ └── ...
│ ├── ZT76_39_A_1_2
│ └── ...