Datasets
VisionDataset
eva.vision.data.datasets.VisionDataset
Bases: MapDataset[Tuple[InputType, TargetType, Dict[str, Any]]], ABC, Generic[InputType, TargetType]
Base dataset class for vision tasks.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
transforms | Callable \| None | A function/transform which returns a transformed version of the raw data samples. | None |
Source code in src/eva/vision/data/datasets/vision.py
classes: List[str] | None
property
Returns the list of the dataset class names.
class_to_idx: Dict[str, int] | None
property
Returns a mapping of the class name to its target index.
load_metadata
Returns the dataset metadata.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the data sample to return the metadata of. | required |

Returns:

Type | Description |
---|---|
Dict[str, Any] \| None | The sample metadata. |
load_data
abstractmethod
Returns the index'th data sample.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the data sample to load. | required |

Returns:

Type | Description |
---|---|
InputType | The sample data. |
load_target
abstractmethod
Returns the index'th target sample.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the data sample to load. | required |

Returns:

Type | Description |
---|---|
TargetType | The sample target. |
filename
abstractmethod
Returns the filename of the index'th data sample.
Note that this is the relative file path to the root.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the data sample to select. | required |

Returns:

Type | Description |
---|---|
str | The filename of the index'th data sample. |
Source code in src/eva/vision/data/datasets/vision.py
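To make the abstract interface concrete, here is a minimal sketch of a custom subclass. The folder layout, file naming and the use of torch.Tensor for inputs and targets are illustrative assumptions, not part of the library.

```python
import os
from typing import Any, Dict, List

import torch

from eva.vision.data import datasets


class ToyVisionDataset(datasets.VisionDataset[torch.Tensor, torch.Tensor]):
    """Illustrative subclass reading pre-saved tensor samples from a flat folder."""

    def __init__(self, root: str, transforms=None) -> None:
        super().__init__(transforms=transforms)
        self._root = root
        self._files: List[str] = sorted(
            name for name in os.listdir(root) if name.endswith(".pt")
        )

    def filename(self, index: int) -> str:
        # Relative file path with respect to the root directory.
        return self._files[index]

    def load_data(self, index: int) -> torch.Tensor:
        return torch.load(os.path.join(self._root, self._files[index]))["image"]

    def load_target(self, index: int) -> torch.Tensor:
        return torch.load(os.path.join(self._root, self._files[index]))["label"]

    def load_metadata(self, index: int) -> Dict[str, Any]:
        return {"filename": self.filename(index)}

    def __len__(self) -> int:
        return len(self._files)
```

Indexing such a dataset then yields the (data, target, metadata) triple declared by the MapDataset base.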
Classification datasets
eva.vision.data.datasets.BACH
Bases: VisionDataset[Image, Tensor]
Dataset class for BACH images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | required |
split | Literal['train', 'val'] \| None | Dataset split to use. If None, the full dataset is used. | None |
download | bool | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth: | False |
transforms | Callable \| None | A function/transform which returns a transformed version of the raw data samples. | None |
Source code in src/eva/vision/data/datasets/classification/bach.py
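As a usage sketch (assuming the BACH archive has already been downloaded and extracted under the given root, which is a placeholder path):

```python
from eva.vision.data import datasets

dataset = datasets.BACH(root="./data/bach", split="train")

# Each item is the (input, target, metadata) triple from the VisionDataset base.
image, target, metadata = dataset[0]
print(len(dataset), target)
```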
eva.vision.data.datasets.BRACS
Bases: VisionDataset[Image, Tensor]
Dataset class for BRACS images and corresponding targets.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Path to the root directory of the dataset. | required |
split | Literal['train', 'val', 'test'] | Dataset split to use. | required |
transforms | Callable \| None | A function/transform which returns a transformed version of the raw data samples. | None |
Source code in src/eva/vision/data/datasets/classification/bracs.py
eva.vision.data.datasets.BreaKHis
Bases: VisionDataset[Image, Tensor]
Dataset class for BreaKHis images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | required |
split | Literal['train', 'val'] \| None | Dataset split to use. If None, the full dataset is used. | None |
magnifications | List[Literal['40X', '100X', '200X', '400X']] \| None | A list of the WSI magnifications to select. By default only 40X images are used. | None |
download | bool | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth: | False |
transforms | Callable \| None | A function/transform which returns a transformed version of the raw data samples. | None |
Source code in src/eva/vision/data/datasets/classification/breakhis.py
eva.vision.data.datasets.Camelyon16
Bases: MultiWsiDataset, VisionDataset[Image, Tensor]
Dataset class for Camelyon16 images and corresponding targets.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Root directory of the dataset. | required |
sampler | Sampler | The sampler to use for sampling patch coordinates. | required |
split | Literal['train', 'val', 'test'] \| None | Dataset split to use. If None, the full dataset is used. | None |
width | int | Width of the patches to be extracted, in pixels. | 224 |
height | int | Height of the patches to be extracted, in pixels. | 224 |
target_mpp | float | Target microns per pixel (mpp) for the patches. | 0.5 |
backend | str | The backend to use for reading the whole-slide images. | 'openslide' |
image_transforms | Callable \| None | Transforms to apply to the extracted image patches. | None |
coords_path | str \| None | File path to save the patch coordinates as .csv. | None |
seed | int | Random seed for reproducibility. | 42 |
Source code in src/eva/vision/data/datasets/classification/camelyon16.py
annotations_test_set: Dict[str, str]
cached
property
Loads the dataset labels.
annotations: Dict[str, str]
cached
property
Loads the dataset labels.
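All of the MultiWsiDataset-backed classes follow the same construction pattern: a patch Sampler plus patch geometry and resolution. The sketch below is illustrative; the sampler import path and class name are assumptions about eva's WSI sampler utilities, so adapt them to your installed version.

```python
from eva.vision.data import datasets
# Assumption: patch samplers are available under this module in your eva version.
from eva.vision.data.wsi.patching import samplers

dataset = datasets.Camelyon16(
    root="./data/camelyon16",               # placeholder path
    sampler=samplers.RandomSampler(),       # any eva patch Sampler implementation
    split="train",
    width=224,
    height=224,
    target_mpp=0.5,
    backend="openslide",
    coords_path="./camelyon16_coords.csv",  # optionally persist the sampled coordinates
    seed=42,
)

patch, label, metadata = dataset[0]
```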
eva.vision.data.datasets.CRC
Bases: VisionDataset[Image, Tensor]
Dataset class for CRC images and corresponding targets.
The dataset is split into a train (train) and validation (val) set:

- train: A set of 100,000 non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.
- val: A set of 7180 image patches from N=50 patients with colorectal adenocarcinoma (no overlap with patients in NCT-CRC-HE-100K).
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Path to the root directory of the dataset. | required |
split | Literal['train', 'val'] | Dataset split to use. | required |
download | bool | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth: | False |
transforms | Callable \| None | A function/transform which returns a transformed version of the raw data samples. | None |
Source code in src/eva/vision/data/datasets/classification/crc.py
eva.vision.data.datasets.GleasonArvaniti
Bases: VisionDataset[Image, Tensor]
Dataset class for GleasonArvaniti images and corresponding targets.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Path to the root directory of the dataset. | required |
split | Literal['train', 'val', 'test'] \| None | Dataset split to use. If None, the full dataset is used. | None |
transforms | Callable \| None | A function/transform which returns a transformed version of the raw data samples. | None |
Source code in src/eva/vision/data/datasets/classification/gleason_arvaniti.py
eva.vision.data.datasets.MHIST
Bases: VisionDataset[Image, Tensor]
MHIST dataset.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Path to the root directory of the dataset. | required |
split | Literal['train', 'test'] | Dataset split to use. | required |
transforms | Callable \| None | A function/transform which returns a transformed version of the raw data samples. | None |
Source code in src/eva/vision/data/datasets/classification/mhist.py
eva.vision.data.datasets.PANDA
Bases: MultiWsiDataset, VisionDataset[Image, Tensor]
Dataset class for PANDA images and corresponding targets.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Root directory of the dataset. | required |
sampler | Sampler | The sampler to use for sampling patch coordinates. | required |
split | Literal['train', 'val', 'test'] \| None | Dataset split to use. If None, the full dataset is used. | None |
width | int | Width of the patches to be extracted, in pixels. | 224 |
height | int | Height of the patches to be extracted, in pixels. | 224 |
target_mpp | float | Target microns per pixel (mpp) for the patches. | 0.5 |
backend | str | The backend to use for reading the whole-slide images. | 'openslide' |
image_transforms | Callable \| None | Transforms to apply to the extracted image patches. | None |
coords_path | str \| None | File path to save the patch coordinates as .csv. | None |
seed | int | Random seed for reproducibility. | 42 |
Source code in src/eva/vision/data/datasets/classification/panda.py
annotations: pd.DataFrame
cached
property
Loads the dataset labels.
eva.vision.data.datasets.PANDASmall
Bases: PANDA
Small version of the PANDA dataset for quicker benchmarking.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Root directory of the dataset. | required |
sampler | Sampler | The sampler to use for sampling patch coordinates. | required |
split | Literal['train', 'val', 'test'] \| None | Dataset split to use. If None, the full dataset is used. | None |
width | int | Width of the patches to be extracted, in pixels. | 224 |
height | int | Height of the patches to be extracted, in pixels. | 224 |
target_mpp | float | Target microns per pixel (mpp) for the patches. | 0.5 |
backend | str | The backend to use for reading the whole-slide images. | 'openslide' |
image_transforms | Callable \| None | Transforms to apply to the extracted image patches. | None |
coords_path | str \| None | File path to save the patch coordinates as .csv. | None |
seed | int | Random seed for reproducibility. | 42 |
Source code in src/eva/vision/data/datasets/classification/panda.py
eva.vision.data.datasets.PatchCamelyon
Bases: VisionDataset[Image, Tensor]
Dataset class for PatchCamelyon images and corresponding targets.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | The path to the dataset root. This path should contain the uncompressed h5 files and the metadata. | required |
split | Literal['train', 'val', 'test'] | The dataset split for training, validation, or testing. | required |
download | bool | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth: | False |
transforms | Callable \| None | A function/transform which returns a transformed version of the raw data samples. | None |
Source code in src/eva/vision/data/datasets/classification/patch_camelyon.py
eva.vision.data.datasets.UniToPatho
Bases: VisionDataset[Image, Tensor]
Dataset class for UniToPatho images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Path to the root directory of the dataset. | required |
split | Literal['train', 'val'] \| None | Dataset split to use. If None, the full dataset is used. | None |
transforms | Callable \| None | A function/transform which returns a transformed version of the raw data samples. | None |
Source code in src/eva/vision/data/datasets/classification/unitopatho.py
eva.vision.data.datasets.WsiClassificationDataset
Bases: MultiWsiDataset, VisionDataset[Image, Tensor]
A general dataset class for whole-slide image classification using manifest files.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Root directory of the dataset. | required |
manifest_file | str | The path to the manifest file, relative to the root directory. | required |
width | int | Width of the patches to be extracted, in pixels. | required |
height | int | Height of the patches to be extracted, in pixels. | required |
target_mpp | float | Target microns per pixel (mpp) for the patches. | required |
sampler | Sampler | The sampler to use for sampling patch coordinates. | required |
backend | str | The backend to use for reading the whole-slide images. | 'openslide' |
split | Literal['train', 'val', 'test'] \| None | The split of the dataset to load. | None |
image_transforms | Callable \| None | Transforms to apply to the extracted image patches. | None |
column_mapping | Dict[str, str] | Mapping of the columns in the manifest file. | default_column_mapping |
coords_path | str \| None | File path to save the patch coordinates as .csv. | None |
Source code in src/eva/vision/data/datasets/classification/wsi.py
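Because this class is driven entirely by a manifest file, the essential part is the manifest layout and the column_mapping that binds it to the dataset. The sketch below is illustrative: the manifest column names, the mapping keys, and the sampler import are assumptions, so verify them against default_column_mapping and the Sampler classes of your installation.

```python
from eva.vision.data import datasets
from eva.vision.data.wsi.patching import samplers  # assumed module path, as above

# Hypothetical manifest.csv placed under <root>:
#   slide_path,label,split
#   slides/case_001.tif,0,train
#   slides/case_002.tif,1,val
dataset = datasets.WsiClassificationDataset(
    root="./data/my_wsi_cohort",       # placeholder path
    manifest_file="manifest.csv",      # relative to the root directory
    width=224,
    height=224,
    target_mpp=0.5,
    sampler=samplers.RandomSampler(),  # assumption: any eva patch Sampler
    split="train",
    column_mapping={"path": "slide_path", "target": "label", "split": "split"},
)
```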
Segmentation datasets
eva.vision.data.datasets.BCSS
Bases: MultiWsiDataset, VisionDataset[Image, Mask]
Dataset class for BCSS semantic segmentation task.
Source: https://github.com/PathologyDataScience/BCSS
We apply the class grouping proposed by the challenge baseline: https://bcsegmentation.grand-challenge.org/Baseline/
- outside_roi: outside_roi
- tumor: angioinvasion, dcis
- stroma: stroma
- inflammatory: lymphocytic_infiltrate, plasma_cells, other_immune_infiltrate
- necrosis: necrosis_or_debris
- other: remaining
Be aware that outside_roi should be assigned zero-weight during model training.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Root directory of the dataset. | required |
sampler | Sampler | The sampler to use for sampling patch coordinates. If | required |
split | Literal['train', 'val', 'trainval', 'test'] \| None | Dataset split to use. If None, the full dataset is used. | None |
width | int | Width of the patches to be extracted, in pixels. | 224 |
height | int | Height of the patches to be extracted, in pixels. | 224 |
target_mpp | float | Target microns per pixel (mpp) for the patches. | 0.5 |
transforms | Callable \| None | Transforms to apply to the extracted image & mask patches. | None |
Source code in src/eva/vision/data/datasets/segmentation/bcss.py
eva.vision.data.datasets.BTCV
Bases: VisionDataset[Volume, Mask]
Beyond the Cranial Vault (BTCV) Abdomen dataset.
The BTCV dataset comprises abdominal CT acquired at the Vanderbilt University Medical Center from metastatic liver cancer patients or post-operative ventral hernia patients. The dataset contains one background class and thirteen organ classes.
More info
- Multi-organ Abdominal CT Reference Standard Segmentations https://zenodo.org/records/1169361
- Dataset Split https://github.com/Luffy03/Large-Scale-Medical/blob/main/Downstream/monai/BTCV/dataset/dataset_0.json
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Path to the dataset root directory. | required |
split | Literal['train', 'val'] \| None | Dataset split to use ('train' or 'val'). If None, it uses the full dataset. | None |
download | bool | Whether to download the dataset. | False |
transforms | Callable \| None | A callable object for applying data transformations. If None, no transformations are applied. | None |
Source code in src/eva/vision/data/datasets/segmentation/btcv.py
load_data
Loads the CT volume for a given sample.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the desired sample. | required |

Returns:

Type | Description |
---|---|
Volume | Tensor representing the CT volume of shape |
Source code in src/eva/vision/data/datasets/segmentation/btcv.py
load_target
Loads the segmentation mask for a given sample.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the desired sample. | required |

Returns:

Type | Description |
---|---|
Mask | Tensor representing the segmentation mask of shape |
Source code in src/eva/vision/data/datasets/segmentation/btcv.py
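A short usage sketch for pulling a single CT volume and its mask (the root path is a placeholder and the data is assumed to already be on disk):

```python
from eva.vision.data import datasets

dataset = datasets.BTCV(root="./data/btcv", split="val")  # placeholder path

volume = dataset.load_data(0)    # CT volume tensor
mask = dataset.load_target(0)    # segmentation mask tensor (background + 13 organ classes)
print(volume.shape, mask.shape)
```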
eva.vision.data.datasets.CoNSeP
Bases: MultiWsiDataset, VisionDataset[Image, Mask]
Dataset class for CoNSeP semantic segmentation task.
As in [1], we combine classes 3 (healthy epithelial) & 4 (dysplastic/malignant epithelial) into the epithelial class and 5 (fibroblast), 6 (muscle) & 7 (endothelial) into the spindle-shaped class.
[1] Graham, Simon, et al. "Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images." https://arxiv.org/abs/1802.04712
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Root directory of the dataset. | required |
sampler | Sampler \| None | The sampler to use for sampling patch coordinates. If | None |
split | Literal['train', 'val'] \| None | Dataset split to use. If None, the full dataset is used. | None |
width | int | Width of the patches to be extracted, in pixels. | 250 |
height | int | Height of the patches to be extracted, in pixels. | 250 |
target_mpp | float | Target microns per pixel (mpp) for the patches. | 0.25 |
transforms | Callable \| None | Transforms to apply to the extracted image & mask patches. | None |
Source code in src/eva/vision/data/datasets/segmentation/consep.py
eva.vision.data.datasets.LiTS17
Bases: VisionDataset[Volume, Mask]
LiTS17 - Liver Tumor Segmentation Challenge 2017.
More info
- The Liver Tumor Segmentation Benchmark (LiTS) https://arxiv.org/pdf/1901.04056
- Dataset Split https://github.com/Luffy03/Large-Scale-Medical/blob/main/Downstream/monai/LiTs/dataset_lits.json
- Data needs to be manually downloaded from: https://drive.google.com/drive/folders/0B0vscETPGI1-Q1h1WFdEM2FHSUE
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Path to the dataset root directory. | required |
split | Literal['train', 'val'] \| None | Dataset split to use ('train' or 'val'). If None, it uses the full dataset. | None |
transforms | Callable \| None | A callable object for applying data transformations. If None, no transformations are applied. | None |
Source code in src/eva/vision/data/datasets/segmentation/lits17.py
load_data
Loads the CT volume for a given sample.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the desired sample. | required |

Returns:

Type | Description |
---|---|
Volume | Tensor representing the CT volume of shape |
Source code in src/eva/vision/data/datasets/segmentation/lits17.py
load_target
Loads the segmentation mask for a given sample.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the desired sample. | required |

Returns:

Type | Description |
---|---|
Mask | Tensor representing the segmentation mask of shape |
Source code in src/eva/vision/data/datasets/segmentation/lits17.py
eva.vision.data.datasets.MoNuSAC
Bases: VisionDataset[Image, Mask]
MoNuSAC2020: A Multi-organ Nuclei Segmentation and Classification Challenge.
Webpage: https://monusac-2020.grand-challenge.org/
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | required |
split | Literal['train', 'test'] | Dataset split to use. | required |
export_masks | bool | Whether to export, save and use the semantic label masks from disk. | True |
download | bool | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the :meth: | False |
transforms | Callable \| None | A function/transform that takes in an image and a target mask and returns the transformed versions of both. | None |
Source code in src/eva/vision/data/datasets/segmentation/monusac.py
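Since the transforms callable for this dataset receives both the image and the target mask, a custom transform has to accept and return the pair; a minimal illustrative sketch:

```python
from eva.vision.data import datasets


def hflip_pair(image, mask):
    """Illustrative joint transform: flips image and mask together."""
    return image.flip(-1), mask.flip(-1)


dataset = datasets.MoNuSAC(
    root="./data/monusac",   # placeholder; data is extracted here when downloading
    split="train",
    export_masks=True,
    transforms=hflip_pair,
)
```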
eva.vision.data.datasets.EmbeddingsSegmentationDataset
Bases: EmbeddingsDataset[Mask]
Embeddings segmentation dataset.
Expects a manifest file listing the paths of .pt files that contain tensor embeddings of shape [embedding_dim] or [1, embedding_dim].
Parameters:

Name | Type | Description | Default |
---|---|---|---|
root | str | Root directory of the dataset. | required |
manifest_file | str | The path to the manifest file, which is relative to the root directory. | required |
split | Literal['train', 'val', 'test'] \| None | The dataset split to use. The | None |
column_mapping | Dict[str, str] | Defines the map between the variables and the manifest columns. It will overwrite the | default_column_mapping |
embeddings_transforms | Callable \| None | A function/transform that transforms the embedding. | None |
target_transforms | Callable \| None | A function/transform that transforms the target. | None |
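A construction sketch is given below; the manifest column names follow default_column_mapping, which is not listed on this page, so the layout described in the comment is an assumption.

```python
from eva.vision.data import datasets

# The manifest (CSV) is expected to reference .pt files that hold embeddings of
# shape [embedding_dim] or [1, embedding_dim], together with the target masks.
dataset = datasets.EmbeddingsSegmentationDataset(
    root="./data/embeddings",      # placeholder path
    manifest_file="manifest.csv",  # relative to the root directory
    split="train",
)
```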