BCSS
The BCSS (Breast Cancer Semantic Segmentation) consists of extracts from 151 WSI images from TCGA, containing over 20,000 segmentation annotations covering 21 different tissue types.
Raw data
Key stats
Modality | Vision (WSI extracts) |
Task | Segmentation - 22 classes (tissue types) |
Data size | total: ~5GB |
Image dimension | ~1000-3000 x ~1000-3000 x 3 |
Magnification (μm/px) | 40x (0.25) |
Files format | .png images / .mat segmentation masks |
Number of images | 151 |
Splits in use | Train, Val and Test |
Organization
The data is organized as follows:
bcss
├── rgbs_colorNormalized # wsi images
│ ├── TCGA-*.png
├── masks # segmentation masks
│ ├── TCGA-*.png # same filenames as images
Download and preprocessing
The BCSS
dataset class doesn't download the data during runtime and must be downloaded manually from links provided here.
Although the original images have a resolution of 0.25 microns per pixel (mpp), we extract patches at 0.5 mpp for evaluation. This is because using the original resolution with common foundation model patch sizes (e.g. 224x224 pixels) would result in regions that are too small, leading to less expressive segmentation masks and unnecessarily complicating the task.
Splits
As a test set, we use the images from the medical institues OL, LL, E2, EW, GM, and S3, as proposed by the authors. For the validation split, we use images from the institutes BH, C8, A8, A1 and E9, which results in the following dataset sizes:
Splits | Train | Validation | Test |
---|---|---|---|
#Samples | 76 (50.3%) | 30 (19.9%) | 45 (29.8%) |
Relevant links
- Dataset Repo
- Breast Cancer Segmentation Grand Challenge
- Google Drive Download Link for 0.25 mpp version
License
The BCSS dataset is held under the CC0 1.0 UNIVERSAL license.