BACH

The BACH dataset consists of microscopy and WSI images, of which we use only the microscopy images. These are 400 labeled images from 4 classes ("Normal", "Benign", "Invasive", "InSitu"). This dataset was used for the "BACH Grand Challenge on Breast Cancer Histology images".

Raw data

Key stats


Modality	Vision (microscopy images)
Task	Multiclass classification (4 classes)
Cancer type	Breast
Data size	total: 10.4 GB / data in use: 7.37 GB (18.9 MB per image)
Image dimension	1536 x 2048 x 3
Magnification (μm/px)	20x (0.42)
File format	`.tif` images
Number of images	400 (100 from each class)
Splits in use	one labeled split
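
As a quick sanity check on the table above, the per-image size can be related to the image dimension. The sketch below is plain arithmetic, not part of eva. The `bytes_per_value` parameter is an assumption: the source does not state the bit depth of the `.tif` files, and 2 bytes per channel value is used here only because it reproduces the listed 18.9 MB for uncompressed data.

```python
def raw_image_bytes(height, width, channels, bytes_per_value):
    """Size in bytes of one uncompressed image array."""
    return height * width * channels * bytes_per_value


# Image dimension taken from the key stats table.
HEIGHT, WIDTH, CHANNELS = 1536, 2048, 3

# Assumption: 2 bytes per channel value (not stated in the source).
size_bytes = raw_image_bytes(HEIGHT, WIDTH, CHANNELS, bytes_per_value=2)
print(f"{size_bytes / 1e6:.1f} MB per image")  # 18.9 MB per image
```

With `bytes_per_value=1` the same calculation gives about 9.4 MB, so the listed figure is worth re-checking against the actual files if storage planning matters.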

Organization

The archive ICIAR2018_BACH_Challenge.zip from Zenodo is organized as follows:

ICIAR2018_BACH_Challenge
├── Photos                    # All labeled patches used by eva
│   ├── Normal
│   │   ├── n032.tif
│   │   └── ...
│   ├── Benign
│   │   └── ...
│   ├── Invasive
│   │   └── ...
│   ├── InSitu
│   │   └── ...
├── WSI                       # WSIs, not in use
│   ├── ...
└── ...
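
To check that an extracted copy matches this layout, a small script can walk the `Photos` folder and count the `.tif` files per class. This is a minimal sketch that uses only the standard library and does not rely on eva. The function names and the class-to-index mapping (the order of `CLASSES`) are hypothetical and may differ from the mapping used by the BACH dataset class.

```python
from collections import Counter
from pathlib import Path

# Class folder names as shown in the directory tree above.
# The index assigned to each class here is an illustrative assumption.
CLASSES = ("Normal", "Benign", "Invasive", "InSitu")


def list_samples(photos_dir):
    """Return a list of (image_path, class_index) for all .tif files."""
    samples = []
    for label, class_name in enumerate(CLASSES):
        class_dir = Path(photos_dir) / class_name
        if not class_dir.is_dir():
            continue  # Skip class folders that are missing.
        for image_path in sorted(class_dir.glob("*.tif")):
            samples.append((image_path, label))
    return samples


def count_per_class(samples):
    """Return a dict mapping class name to number of images."""
    counts = Counter(label for _, label in samples)
    return {name: counts.get(i, 0) for i, name in enumerate(CLASSES)}


if __name__ == "__main__":
    # Path to the Photos folder inside the extracted archive.
    print(count_per_class(list_samples(Path("Photos"))))
```

If the counts differ between classes or a class reports zero images, the archive was probably extracted incompletely or to a different root folder.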

Download and preprocessing

The BACH dataset class supports downloading the data at runtime by setting the init argument download=True.

[!NOTE] In the provided BACH config files, the download argument is set to false. To enable automatic download, open the config and set download: true.

The splits are created from the indices specified in the BACH dataset class. These indices were picked to prevent data leakage caused by images from the same patient ending up in different splits. Because the dataset is small, the patient ID constraint does not allow a three-way split with a sufficient amount of data in each split. We therefore create only a train and a val split and leave it to the user to submit predictions on the official test split to the BACH Challenge Leaderboard.

Splits	Train	Validation
#Samples	268 (67%)	132 (33%)
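
The patient constraint described above can be illustrated with a generic group-aware split. Note that this is not how the BACH dataset class works: it uses a fixed list of indices, whereas the sketch below draws a random split. The function name, the `val_fraction` and `seed` parameters, and the patient IDs in the usage example are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(patient_ids, val_fraction=0.33, seed=0):
    """Split sample indices into train and val so that no patient is in both.

    patient_ids[i] is the (hypothetical) patient ID of sample i.
    """
    # Group sample indices by patient.
    groups = defaultdict(list)
    for index, patient in enumerate(patient_ids):
        groups[patient].append(index)

    # Shuffle patients reproducibly, then assign whole patients to val
    # until the target validation size is reached.
    patients = sorted(groups)
    random.Random(seed).shuffle(patients)
    target = val_fraction * len(patient_ids)

    train, val = [], []
    for patient in patients:
        bucket = val if len(val) < target else train
        bucket.extend(groups[patient])
    return sorted(train), sorted(val)


if __name__ == "__main__":
    # Made-up patient IDs for eight samples.
    ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
    train_idx, val_idx = split_by_patient(ids)
    print("train:", train_idx)
    print("val:", val_idx)
```

Because whole patients are assigned at once, the resulting validation share only approximates `val_fraction`. With few patients, this is also why a third split quickly leaves too little data per split.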

Relevant links

BACH dataset on Zenodo
BACH Challenge website
BACH Challenge Leaderboard
Patient ID information (link provided on the BACH Challenge website)
Reference API Vision dataset classes

License

Attribution-NonCommercial-ShareAlike 4.0 International