Datasets
VisionDataset
eva.vision.data.datasets.VisionDataset
Bases: `MapDataset[Tuple[InputType, TargetType, Dict[str, Any]]]`, `ABC`, `Generic[InputType, TargetType]`
Base dataset class for vision tasks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/vision.py
classes: List[str] | None (property)
Returns the list of class names in the dataset.
class_to_idx: Dict[str, int] | None (property)
Returns a mapping of the class name to its target index.
load_metadata
Returns the dataset metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the data sample to return the metadata of. | *required* |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any] \| None` | The sample metadata. |
load_data (abstractmethod)
Returns the `index`'th data sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the data sample to load. | *required* |
Returns:
| Type | Description |
|---|---|
| `InputType` | The sample data. |
load_target (abstractmethod)
Returns the `index`'th target sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the data sample to load. | *required* |
Returns:
| Type | Description |
|---|---|
| `TargetType` | The sample target. |
filename (abstractmethod)
Returns the filename of the `index`'th data sample.
Note that this is the relative file path to the root.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the data sample to select. | *required* |
Returns:
| Type | Description |
|---|---|
| `str` | The filename of the `index`'th data sample. |
Source code in src/eva/vision/data/datasets/vision.py
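To make the abstract interface concrete, here is a minimal hypothetical subclass. It is a sketch, not part of eva: the folder layout and file format are assumptions for illustration, while the overridden hooks (`classes`, `filename`, `load_data`, `load_target`, `load_metadata`) follow the signatures documented above.

```python
from __future__ import annotations

import os
from typing import Any, Dict, List

import torch
from torchvision import tv_tensors
from torchvision.io import read_image

from eva.vision.data.datasets import VisionDataset


class FolderClassificationDataset(VisionDataset):
    """Hypothetical dataset reading `<root>/<class_name>/<image>.png` files."""

    def __init__(self, root: str, transforms=None) -> None:
        super().__init__(transforms=transforms)
        self._root = root
        # One (relative_path, class_index) entry per image; the layout is an assumption.
        self._samples = [
            (os.path.join(name, file), idx)
            for idx, name in enumerate(sorted(os.listdir(root)))
            for file in sorted(os.listdir(os.path.join(root, name)))
        ]

    @property
    def classes(self) -> List[str] | None:
        return sorted(os.listdir(self._root))

    def filename(self, index: int) -> str:
        # Relative file path with respect to the root, as documented above.
        return self._samples[index][0]

    def load_data(self, index: int) -> tv_tensors.Image:
        path = os.path.join(self._root, self._samples[index][0])
        return tv_tensors.Image(read_image(path))

    def load_target(self, index: int) -> torch.Tensor:
        return torch.tensor(self._samples[index][1])

    def load_metadata(self, index: int) -> Dict[str, Any] | None:
        return {"path": self._samples[index][0]}

    def __len__(self) -> int:
        return len(self._samples)
```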
Classification datasets
eva.vision.data.datasets.BACH
Bases: VisionDataset[Image, Tensor]
Dataset class for BACH images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'val'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/bach.py
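A minimal usage sketch based on the parameters above. The root path and transforms are placeholders, and the `prepare_data()` call is assumed to be the preparation hook referenced in the download note; depending on the eva version an additional setup step may also be needed before indexing.

```python
import torch
from torchvision.transforms import v2

from eva.vision.data import datasets

dataset = datasets.BACH(
    root="data/bach",   # placeholder path; data is downloaded/extracted here
    split="train",      # or "val"; None uses the entire dataset
    download=True,
    transforms=v2.Compose([v2.Resize(224), v2.ToDtype(torch.float32, scale=True)]),
)
dataset.prepare_data()  # assumed preparation hook that triggers the download

image, target, metadata = dataset[0]  # (Image, Tensor, metadata dict)
```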
eva.vision.data.datasets.BRACS
Bases: VisionDataset[Image, Tensor]
Dataset class for BRACS images and corresponding targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. | *required* |
| `split` | `Literal['train', 'val', 'test']` | Dataset split to use. | *required* |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/bracs.py
eva.vision.data.datasets.BreaKHis
Bases: VisionDataset[Image, Tensor]
Dataset class for BreaKHis images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'val'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `magnifications` | `List[Literal['40X', '100X', '200X', '400X']] \| None` | A list of the WSI magnifications to select. By default only 40X images are used. | `None` |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/breakhis.py
eva.vision.data.datasets.Camelyon16
Bases: `MultiWsiDataset`, `VisionDataset[Image, Tensor]`
Dataset class for Camelyon16 images and corresponding targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `sampler` | `Sampler` | The sampler to use for sampling patch coordinates. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `width` | `int` | Width of the patches to be extracted, in pixels. | `224` |
| `height` | `int` | Height of the patches to be extracted, in pixels. | `224` |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | `0.5` |
| `backend` | `str` | The backend to use for reading the whole-slide images. | `'openslide'` |
| `image_transforms` | `Callable \| None` | Transforms to apply to the extracted image patches. | `None` |
| `coords_path` | `str \| None` | File path to save the patch coordinates as .csv. | `None` |
| `seed` | `int` | Random seed for reproducibility. | `42` |
Source code in src/eva/vision/data/datasets/classification/camelyon16.py
annotations_test_set: Dict[str, str] (cached property)
Loads the dataset labels.
annotations: Dict[str, str] (cached property)
Loads the dataset labels.
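For the slide-backed datasets (Camelyon16 and, analogously, PANDA/PANDASmall) a sampler drives patch extraction. The sketch below is illustrative only: the sampler import path and its constructor arguments are assumptions, so verify them against eva's WSI patching module.

```python
from eva.vision.data import datasets
from eva.vision.data.wsi.patching import samplers  # assumed import path

dataset = datasets.Camelyon16(
    root="data/camelyon16",  # placeholder path
    sampler=samplers.RandomSampler(n_samples=64, seed=42),  # assumed sampler API
    split="train",
    width=224,
    height=224,
    target_mpp=0.5,
    backend="openslide",
    coords_path="outputs/camelyon16_coords.csv",  # optional: export patch coordinates
)

# Depending on the eva version, a setup/prepare step may be required before indexing.
patch, label, metadata = dataset[0]  # an extracted patch with its slide-level target
```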
eva.vision.data.datasets.CRC
Bases: VisionDataset[Image, Tensor]
Dataset class for CRC images and corresponding targets.
The dataset is split into a train (train) and validation (val) set:

- train: A set of 100,000 non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.
- val: A set of 7180 image patches from N=50 patients with colorectal adenocarcinoma (no overlap with patients in NCT-CRC-HE-100K).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. | *required* |
| `split` | `Literal['train', 'val']` | Dataset split to use. | *required* |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/crc.py
eva.vision.data.datasets.GleasonArvaniti
Bases: VisionDataset[Image, Tensor]
Dataset class for GleasonArvaniti images and corresponding targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/gleason_arvaniti.py
eva.vision.data.datasets.MHIST
Bases: VisionDataset[Image, Tensor]
MHIST dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. | *required* |
| `split` | `Literal['train', 'test']` | Dataset split to use. | *required* |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/mhist.py
eva.vision.data.datasets.PANDA
Bases: `MultiWsiDataset`, `VisionDataset[Image, Tensor]`
Dataset class for PANDA images and corresponding targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `sampler` | `Sampler` | The sampler to use for sampling patch coordinates. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `width` | `int` | Width of the patches to be extracted, in pixels. | `224` |
| `height` | `int` | Height of the patches to be extracted, in pixels. | `224` |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | `0.5` |
| `backend` | `str` | The backend to use for reading the whole-slide images. | `'openslide'` |
| `image_transforms` | `Callable \| None` | Transforms to apply to the extracted image patches. | `None` |
| `coords_path` | `str \| None` | File path to save the patch coordinates as .csv. | `None` |
| `seed` | `int` | Random seed for reproducibility. | `42` |
Source code in src/eva/vision/data/datasets/classification/panda.py
annotations: pd.DataFrame (cached property)
Loads the dataset labels.
eva.vision.data.datasets.PANDASmall
Bases: PANDA
Small version of the PANDA dataset for quicker benchmarking.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `sampler` | `Sampler` | The sampler to use for sampling patch coordinates. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `width` | `int` | Width of the patches to be extracted, in pixels. | `224` |
| `height` | `int` | Height of the patches to be extracted, in pixels. | `224` |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | `0.5` |
| `backend` | `str` | The backend to use for reading the whole-slide images. | `'openslide'` |
| `image_transforms` | `Callable \| None` | Transforms to apply to the extracted image patches. | `None` |
| `coords_path` | `str \| None` | File path to save the patch coordinates as .csv. | `None` |
| `seed` | `int` | Random seed for reproducibility. | `42` |
Source code in src/eva/vision/data/datasets/classification/panda.py
eva.vision.data.datasets.PatchCamelyon
Bases: VisionDataset[Image, Tensor]
Dataset class for PatchCamelyon images and corresponding targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | The path to the dataset root. This path should contain the uncompressed h5 files and the metadata. | *required* |
| `split` | `Literal['train', 'val', 'test']` | The dataset split for training, validation, or testing. | *required* |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/patch_camelyon.py
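A usage sketch for PatchCamelyon. The root path is a placeholder that should contain (or receive) the uncompressed h5 files, and `prepare_data()` is the assumed preparation hook from the download note.

```python
from eva.vision.data import datasets

dataset = datasets.PatchCamelyon(
    root="data/patch_camelyon",  # placeholder path for the h5 files and metadata
    split="val",
    download=True,
)
dataset.prepare_data()  # assumed preparation hook that triggers the download

image, target, metadata = dataset[0]
print(len(dataset), dataset.classes)
```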
eva.vision.data.datasets.UniToPatho
Bases: VisionDataset[Image, Tensor]
Dataset class for UniToPatho images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. | *required* |
| `split` | `Literal['train', 'val'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/unitopatho.py
eva.vision.data.datasets.WsiClassificationDataset
Bases: `MultiWsiDataset`, `VisionDataset[Image, Tensor]`
A general dataset class for whole-slide image classification using manifest files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `manifest_file` | `str` | The path to the manifest file, relative to the root directory. | *required* |
| `width` | `int` | Width of the patches to be extracted, in pixels. | *required* |
| `height` | `int` | Height of the patches to be extracted, in pixels. | *required* |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | *required* |
| `sampler` | `Sampler` | The sampler to use for sampling patch coordinates. | *required* |
| `backend` | `str` | The backend to use for reading the whole-slide images. | `'openslide'` |
| `split` | `Literal['train', 'val', 'test'] \| None` | The split of the dataset to load. | `None` |
| `image_transforms` | `Callable \| None` | Transforms to apply to the extracted image patches. | `None` |
| `column_mapping` | `Dict[str, str]` | Mapping of the columns in the manifest file. | `default_column_mapping` |
| `coords_path` | `str \| None` | File path to save the patch coordinates as .csv. | `None` |
Source code in src/eva/vision/data/datasets/classification/wsi.py
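A sketch of driving WsiClassificationDataset from a manifest. The manifest name, the sampler class and the column names passed to `column_mapping` are illustrative assumptions; only the parameter names themselves come from the table above.

```python
from eva.vision.data import datasets
from eva.vision.data.wsi.patching import samplers  # assumed import path

dataset = datasets.WsiClassificationDataset(
    root="data/my_wsi_cohort",     # placeholder path
    manifest_file="manifest.csv",  # relative to the root directory
    width=224,
    height=224,
    target_mpp=0.5,
    sampler=samplers.GridSampler(),  # assumed sampler class
    split="train",
    # Illustrative column names; map them to whatever your manifest actually uses.
    column_mapping={"path": "slide_path", "target": "label", "split": "split"},
)
```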
Segmentation datasets
eva.vision.data.datasets.BCSS
Bases: `MultiWsiDataset`, `VisionDataset[Image, Mask]`
Dataset class for BCSS semantic segmentation task.
Source: https://github.com/PathologyDataScience/BCSS
We apply the class grouping proposed by the challenge baseline: https://bcsegmentation.grand-challenge.org/Baseline/

- outside_roi: outside_roi
- tumor: angioinvasion, dcis
- stroma: stroma
- inflammatory: lymphocytic_infiltrate, plasma_cells, other_immune_infiltrate
- necrosis: necrosis_or_debris
- other: remaining
Be aware that outside_roi should be assigned zero-weight during model training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `sampler` | `Sampler` | The sampler to use for sampling patch coordinates. | *required* |
| `split` | `Literal['train', 'val', 'trainval', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `width` | `int` | Width of the patches to be extracted, in pixels. | `224` |
| `height` | `int` | Height of the patches to be extracted, in pixels. | `224` |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | `0.5` |
| `transforms` | `Callable \| None` | Transforms to apply to the extracted image & mask patches. | `None` |
Source code in src/eva/vision/data/datasets/segmentation/bcss.py
eva.vision.data.datasets.BTCV
Bases: VisionDataset[Volume, Mask]
Beyond the Cranial Vault (BTCV) Abdomen dataset.
The BTCV dataset comprises abdominal CT acquired at the Vanderbilt University Medical Center from metastatic liver cancer patients or post-operative ventral hernia patients. The dataset contains one background class and thirteen organ classes.
More info
- Multi-organ Abdominal CT Reference Standard Segmentations https://zenodo.org/records/1169361
- Dataset Split https://github.com/Luffy03/Large-Scale-Medical/blob/main/Downstream/monai/BTCV/dataset/dataset_0.json
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the dataset root directory. | *required* |
| `split` | `Literal['train', 'val'] \| None` | Dataset split to use ('train' or 'val'). If None, it uses the full dataset. | `None` |
| `download` | `bool` | Whether to download the dataset. | `False` |
| `transforms` | `Callable \| None` | A callable object for applying data transformations. If None, no transformations are applied. | `None` |
Source code in src/eva/vision/data/datasets/segmentation/btcv.py
load_data
Loads the CT volume for a given sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the desired sample. | *required* |
Returns:
| Type | Description |
|---|---|
| `Volume` | Tensor representing the CT volume. |
Source code in src/eva/vision/data/datasets/segmentation/btcv.py
load_target
Loads the segmentation mask for a given sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the desired sample. | *required* |
Returns:
| Type | Description |
|---|---|
| `Mask` | Tensor representing the segmentation mask. |
Source code in src/eva/vision/data/datasets/segmentation/btcv.py
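The two loaders above can also be called directly, for example to inspect a single CT volume and its mask. The root path below is a placeholder and assumes the data is already on disk.

```python
from eva.vision.data import datasets

dataset = datasets.BTCV(root="data/btcv", split="val", download=False)

volume = dataset.load_data(0)   # CT volume tensor
mask = dataset.load_target(0)   # segmentation mask (background + 13 organ classes)
print(volume.shape, mask.shape)
```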
eva.vision.data.datasets.CoNSeP
Bases: `MultiWsiDataset`, `VisionDataset[Image, Mask]`
Dataset class for CoNSeP semantic segmentation task.
As in [1], we combine classes 3 (healthy epithelial) & 4 (dysplastic/malignant epithelial) into the epithelial class and 5 (fibroblast), 6 (muscle) & 7 (endothelial) into the spindle-shaped class.
[1] Graham, Simon, et al. "Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images." https://arxiv.org/abs/1802.04712
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `sampler` | `Sampler \| None` | The sampler to use for sampling patch coordinates. | `None` |
| `split` | `Literal['train', 'val'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `width` | `int` | Width of the patches to be extracted, in pixels. | `250` |
| `height` | `int` | Height of the patches to be extracted, in pixels. | `250` |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | `0.25` |
| `transforms` | `Callable \| None` | Transforms to apply to the extracted image & mask patches. | `None` |
Source code in src/eva/vision/data/datasets/segmentation/consep.py
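A minimal CoNSeP usage sketch with the documented patch defaults; the root path is a placeholder and the returned tuple follows the VisionDataset interface.

```python
from eva.vision.data import datasets

dataset = datasets.CoNSeP(
    root="data/consep",  # placeholder path
    split="train",
    width=250,
    height=250,
    target_mpp=0.25,
)

image, mask, metadata = dataset[0]  # patch and its semantic mask (merged classes as described above)
```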
eva.vision.data.datasets.LiTS
Bases: VisionDataset[Image, Mask]
LiTS - Liver Tumor Segmentation Challenge.
Webpage: https://competitions.codalab.org/competitions/17094
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. | `None` |
| `transforms` | `Callable \| None` | A function/transforms that takes in an image and a target mask and returns the transformed versions of both. | `None` |
| `seed` | `int` | Seed used for generating the dataset splits. | `8` |
Source code in src/eva/vision/data/datasets/segmentation/lits.py
eva.vision.data.datasets.LiTSBalanced
Bases: LiTS
Balanced version of the LiTS - Liver Tumor Segmentation Challenge dataset.
For each volume in the dataset, we sample the same number of slices where only the liver and where both liver and tumor are present.
Webpage: https://competitions.codalab.org/competitions/17094
For the splits we follow: https://arxiv.org/pdf/2010.01663v2
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. | `None` |
| `transforms` | `Callable \| None` | A function/transforms that takes in an image and a target mask and returns the transformed versions of both. | `None` |
| `seed` | `int` | Seed used for generating the dataset splits and sampling of the slices. | `8` |
Source code in src/eva/vision/data/datasets/segmentation/lits_balanced.py
eva.vision.data.datasets.MoNuSAC
Bases: VisionDataset[Image, Mask]
MoNuSAC2020: A Multi-organ Nuclei Segmentation and Classification Challenge.
Webpage: https://monusac-2020.grand-challenge.org/
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'test']` | Dataset split to use. | *required* |
| `export_masks` | `bool` | Whether to export, save and use the semantic label masks from disk. | `True` |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `transforms` | `Callable \| None` | A function/transforms that takes in an image and a target mask and returns the transformed versions of both. | `None` |
Source code in src/eva/vision/data/datasets/segmentation/monusac.py
eva.vision.data.datasets.TotalSegmentator2D
Bases: VisionDataset[Image, Mask]
TotalSegmentator 2D segmentation dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | *required* |
| `version` | `Literal['small', 'full'] \| None` | The version of the dataset to initialize. | `'full'` |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `classes` | `List[str] \| None` | Whether to configure the dataset with a subset of classes. If `None`, all classes are used. | `None` |
| `class_mappings` | `Dict[str, str] \| None` | A dictionary that maps the original class names to a reduced set of classes. | `reduced_class_mappings` |
| `optimize_mask_loading` | `bool` | Whether to pre-process the segmentation masks in order to optimize the loading time. | `True` |
| `decompress` | `bool` | Whether to decompress the ct.nii.gz files when preparing the data. The label masks won't be decompressed, but when enabling optimize_mask_loading it will export the semantic label masks to a single file in uncompressed .nii format. | `True` |
| `num_workers` | `int` | The number of workers to use for optimizing the masks & decompressing the .gz files. | `10` |
| `transforms` | `Callable \| None` | A function/transforms that takes in an image and a target mask and returns the transformed versions of both. | `None` |
Source code in src/eva/vision/data/datasets/segmentation/total_segmentator_2d.py
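A usage sketch for TotalSegmentator2D restricted to a subset of classes. The root path is a placeholder, the class names are illustrative examples from the TotalSegmentator label set, and `prepare_data()` is the assumed preparation hook from the download note.

```python
from eva.vision.data import datasets

dataset = datasets.TotalSegmentator2D(
    root="data/total_segmentator",  # placeholder path
    split="train",
    version="small",                # or "full"
    download=True,
    classes=["liver", "spleen"],    # illustrative subset; must match the dataset's label names
    optimize_mask_loading=True,
    num_workers=4,
)
dataset.prepare_data()  # assumed hook that downloads, decompresses and exports the masks
```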
eva.vision.data.datasets.EmbeddingsSegmentationDataset
Bases: EmbeddingsDataset[Mask]
Embeddings segmentation dataset.
Expects a manifest file listing the paths of .pt files that contain tensor embeddings of shape [embedding_dim] or [1, embedding_dim].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `manifest_file` | `str` | The path to the manifest file, relative to the root directory. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | The dataset split to use. | `None` |
| `column_mapping` | `Dict[str, str]` | Defines the map between the variables and the manifest columns. It will overwrite the default column mapping for the keys provided. | `default_column_mapping` |
| `embeddings_transforms` | `Callable \| None` | A function/transform that transforms the embedding. | `None` |
| `target_transforms` | `Callable \| None` | A function/transform that transforms the target. | `None` |
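A sketch of loading precomputed segmentation embeddings from a manifest. The manifest name and the column names in `column_mapping` are illustrative assumptions.

```python
from eva.vision.data import datasets

dataset = datasets.EmbeddingsSegmentationDataset(
    root="data/embeddings/my_segmentation_task",  # placeholder path
    manifest_file="manifest.csv",                 # relative to the root; lists the .pt embedding files
    split="train",
    # Illustrative column names; map them to whatever your manifest actually uses.
    column_mapping={"path": "embeddings", "target": "mask", "split": "split"},
)

embeddings, mask, metadata = dataset[0]  # embedding tensor of shape [embedding_dim] or [1, embedding_dim]
```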