BACH
The BACH dataset consists of microscopy images and whole-slide images (WSIs), of which we use only the microscopy images: 408 labeled images from 4 classes ("Normal", "Benign", "Invasive", "InSitu"). The dataset was released for the "BACH Grand Challenge on Breast Cancer Histology images".
Raw data
Key stats
| | |
|---|---|
| Modality | Vision (microscopy images) |
| Task | Multiclass classification (4 classes) |
| Cancer type | Breast |
| Data size | total: 10.4 GB / data in use: 7.37 GB (18.9 MB per image) |
| Image dimensions | 1536 x 2048 x 3 |
| Magnification (μm/px) | 20x (0.42) |
| File format | .tif images |
| Number of images | 408 (102 from each class) |
| Splits in use | one labeled split |
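The key stats translate into a few derived quantities that are handy when choosing crop or batch sizes. The sketch below assumes 8-bit (uint8) RGB pixels once an image is decoded. It computes the physical field of view of one image from the listed 0.42 μm/px and the in-memory size of one decoded image. The function names are illustrative and not part of eva.

```python
# Back-of-the-envelope numbers derived from the key stats table.
HEIGHT_PX, WIDTH_PX, CHANNELS = 1536, 2048, 3
MICRONS_PER_PIXEL = 0.42  # resolution at 20x, as listed in the table
NUM_IMAGES = 408
NUM_CLASSES = 4


def field_of_view_um(height_px: int, width_px: int, mpp: float) -> tuple[float, float]:
    """Physical extent (height, width) of one image in micrometres."""
    return height_px * mpp, width_px * mpp


def raw_rgb_bytes(height_px: int, width_px: int, channels: int, bytes_per_value: int = 1) -> int:
    """Size of the decoded pixel array, assuming one byte per channel value (uint8)."""
    return height_px * width_px * channels * bytes_per_value


if __name__ == "__main__":
    h_um, w_um = field_of_view_um(HEIGHT_PX, WIDTH_PX, MICRONS_PER_PIXEL)
    print(f"field of view: {h_um:.2f} x {w_um:.2f} um")  # field of view: 645.12 x 860.16 um
    size_mib = raw_rgb_bytes(HEIGHT_PX, WIDTH_PX, CHANNELS) / 2**20
    print(f"decoded uint8 array: {size_mib:.1f} MiB")  # decoded uint8 array: 9.0 MiB
    print(f"images per class: {NUM_IMAGES // NUM_CLASSES}")  # images per class: 102
```

At 0.42 μm/px each image covers roughly 645 x 860 μm of tissue. Under the uint8 assumption, one decoded image takes 9 MiB of memory.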
Organization
The data archive `ICIAR2018_BACH_Challenge.zip` from Zenodo is organized as follows:
ICIAR2018_BACH_Challenge
├── Photos # All labeled images used by eva
│ ├── Normal
│ │ ├── n032.tif
│ │ └── ...
│ ├── Benign
│ │ └── ...
│ ├── Invasive
│ │ └── ...
│ ├── InSitu
│ │ └── ...
├── WSI # WSIs, not in use
│ ├── ...
└── ...
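Given this layout, a labeled sample list can be built by walking the class folders under `Photos`. This is a minimal sketch using only the standard library. The label order in `CLASS_NAMES` is an assumption for illustration and may not match the mapping used by the eva `BACH` class.

```python
from pathlib import Path

# Class folder names as listed in the directory tree.
# The order (and therefore the integer labels) is a hypothetical choice.
CLASS_NAMES = ["Normal", "Benign", "Invasive", "InSitu"]


def index_photos(photos_dir: Path) -> list[tuple[Path, int]]:
    """Return (image_path, label) pairs for every .tif under Photos/<class>/."""
    samples: list[tuple[Path, int]] = []
    for label, class_name in enumerate(CLASS_NAMES):
        class_dir = photos_dir / class_name
        if not class_dir.is_dir():
            # Fail loudly instead of silently returning fewer classes.
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        # sorted() makes the sample order deterministic across file systems
        for image_path in sorted(class_dir.glob("*.tif")):
            samples.append((image_path, label))
    return samples
```

Raising on a missing class folder surfaces a wrong root path immediately. On a complete download the function should return one pair per image, i.e. 408 pairs according to the key stats.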
Download and preprocessing
The `BACH` dataset class supports downloading the data at runtime when the init argument `download=True` is set.
> [!NOTE]
> In the provided `BACH` config files the `download` argument is set to `false`. To enable automatic download, open the config and set `download: true`.
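The `download` switch can be pictured as a download-if-missing step. The helper below is a hypothetical sketch of that behavior using only the Python standard library, not eva's implementation. `ensure_bach_data`, the `EXTRACTED_DIR` folder name and the `archive_url` parameter are assumptions, and the Zenodo URL is deliberately not reproduced here.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# Assumed name of the top-level folder inside the archive.
EXTRACTED_DIR = "ICIAR2018_BACH_Challenge"
ARCHIVE_NAME = "ICIAR2018_BACH_Challenge.zip"


def ensure_bach_data(root: Path, archive_url: str, download: bool = False) -> Path:
    """Hypothetical helper: return the Photos folder, fetching it only if allowed."""
    photos_dir = root / EXTRACTED_DIR / "Photos"
    if photos_dir.is_dir():
        return photos_dir  # data already on disk, skip the download
    if not download:
        raise FileNotFoundError(
            f"{photos_dir} not found; enable download=True or download: true in the config"
        )
    root.mkdir(parents=True, exist_ok=True)
    archive_path = root / ARCHIVE_NAME
    # Stream the archive to disk in chunks instead of reading it into memory.
    with urllib.request.urlopen(archive_url) as response, open(archive_path, "wb") as out:
        shutil.copyfileobj(response, out)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(root)
    if not photos_dir.is_dir():
        raise RuntimeError(f"archive did not contain {EXTRACTED_DIR}/Photos")
    return photos_dir
```

Streaming with `shutil.copyfileobj` keeps memory use flat for a multi-GB archive. Failing loudly when `download` is off matches the behavior you would expect from the config default of `false`.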
The splits are created from the indices specified in the `BACH` dataset class. These indices were chosen so that all images from the same patient end up in the same split, which prevents data leakage. Because the dataset is small and must respect this patient constraint, it cannot be split three ways with enough data in each split. We therefore create only a train and a validation split and leave it to the user to submit predictions on the official test split to the BACH Challenge Leaderboard.
Splits | Train | Validation |
---|---|---|
#Samples | 268 (67%) | 132 (33%) |
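The patient-level constraint behind these indices can be illustrated with a grouped split, where every image of a patient goes to the same side. The sketch assumes a hypothetical `patient_of` mapping from image ID to patient ID, such as one built from the patient ID information listed under the relevant links. It shows one way such indices could be derived, not the procedure used to pick the indices in the `BACH` class.

```python
import random
from collections import defaultdict


def grouped_train_val_split(
    patient_of: dict[str, str], val_fraction: float = 0.33, seed: int = 0
) -> tuple[list[str], list[str]]:
    """Split image IDs into train/val so that no patient appears in both.

    Whole patients are assigned to validation until it holds at least
    `val_fraction` of all images; the remaining patients go to train.
    """
    images_by_patient: dict[str, list[str]] = defaultdict(list)
    for image_id, patient_id in patient_of.items():
        images_by_patient[patient_id].append(image_id)

    # Shuffle patients (not images) so the split is random but leak-free.
    patients = sorted(images_by_patient)
    random.Random(seed).shuffle(patients)

    target_val = val_fraction * len(patient_of)
    train: list[str] = []
    val: list[str] = []
    for patient in patients:
        bucket = val if len(val) < target_val else train
        bucket.extend(images_by_patient[patient])
    return sorted(train), sorted(val)
```

Because whole patients are assigned greedily, the validation share can overshoot `val_fraction` by up to one patient's images. That is the trade-off a small dataset with a patient constraint forces.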
Relevant links
- BACH dataset on Zenodo
- BACH Challenge website
- BACH Challenge Leaderboard
- Patient ID information (link provided on the BACH Challenge website)
- Reference API Vision dataset classes