MSD Task 7 Pancreas

MSD Task 7 Pancreas is part of the Medical Segmentation Decathlon (MSD) challenge. The dataset consists of 420 portal-venous phase CT scans of patients undergoing resection of pancreatic masses. The corresponding target ROIs were the pancreatic parenchyma and pancreatic mass (cyst or tumor). This dataset was selected due to label unbalance between large (background), medium (pancreas) and small (tumor) structures. The data was acquired in the Memorial Sloan Kettering Cancer Center, New York, US.

The segmentation classes are: Background, Pancreas and Cancer.

Raw data

Key stats


Modality	Vision (radiology, CT scans)
Task	Segmentation (3 classes)
Data size	11 GB
Image dimension	Variable (3D volumes)
Files format	`.nii.gz` ("NIFTI") images
Number of scans	281
Splits in use	train / val

Splits

The dataset uses predefined train/validation splits:

Splits	Train	Validation
# Scans	257	24

The split was taken from https://github.com/Luffy03/Large-Scale-Medical/blob/main/Downstream/monai/Panc/dataset_panc.json

Organization

The training data is expected to be organized as follows:

Dataset007_Pancreas
├── imagesTr/
│   ├── pancreas_001_0000.nii.gz
│   ├── pancreas_002_0000.nii.gz
│   └── ...
└── labelsTr/
    ├── pancreas_001.nii.gz
    ├── pancreas_002.nii.gz
    └── ...

Download and preprocessing

The MSDTask7Pancreas dataset can be downloaded automatically by setting download=True when initializing the dataset, or by setting the environment variable DOWNLOAD_DATA=true. The dataset is hosted on Hugging Face and requires a Hugging Face token to be set in the HF_TOKEN environment variable.

Relevant links

License

Please refer to the original dataset license terms from the Medical Segmentation Decathlon.