Datasets

VisionDataset

eva.vision.data.datasets.VisionDataset

Bases: MapDataset[Tuple[InputType, TargetType, Dict[str, Any]]], ABC, Generic[InputType, TargetType]

Base dataset class for vision tasks.

Parameters:

Name Type Description Default
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/vision.py
def __init__(
    self,
    transforms: Callable | None = None,
) -> None:
    """Initializes the dataset.

    Args:
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__()

    self._transforms = transforms

classes: List[str] | None property

Returns the list with the names of the dataset classes.

class_to_idx: Dict[str, int] | None property

Returns a mapping of the class name to its target index.
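
The relation between the two properties can be sketched as follows. This is a minimal illustration that assumes a class's target index is its position in `classes`; the excerpt does not state the ordering, and the class names are hypothetical.

```python
from typing import Dict, List, Optional


def build_class_to_idx(classes: Optional[List[str]]) -> Optional[Dict[str, int]]:
    """Maps each class name to its position in the list (assumed convention)."""
    if classes is None:
        # Both properties are documented as optional.
        return None
    return {name: index for index, name in enumerate(classes)}


# Hypothetical class names, for illustration only.
class_to_idx = build_class_to_idx(["benign", "malignant"])
print(class_to_idx)
```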

load_metadata

Returns the dataset metadata.

Parameters:

Name Type Description Default
index int

The index of the data sample to return the metadata of.

required

Returns:

Type Description
Dict[str, Any] | None

The sample metadata.

Source code in src/eva/vision/data/datasets/vision.py
def load_metadata(self, index: int) -> Dict[str, Any] | None:
    """Returns the dataset metadata.

    Args:
        index: The index of the data sample to return the metadata of.

    Returns:
        The sample metadata.
    """

load_data abstractmethod

Returns the index'th data sample.

Parameters:

Name Type Description Default
index int

The index of the data sample to load.

required

Returns:

Type Description
InputType

The sample data.

Source code in src/eva/vision/data/datasets/vision.py
@abc.abstractmethod
def load_data(self, index: int) -> InputType:
    """Returns the `index`'th data sample.

    Args:
        index: The index of the data sample to load.

    Returns:
        The sample data.
    """

load_target abstractmethod

Returns the index'th target sample.

Parameters:

Name Type Description Default
index int

The index of the data sample to load.

required

Returns:

Type Description
TargetType

The sample target.

Source code in src/eva/vision/data/datasets/vision.py
@abc.abstractmethod
def load_target(self, index: int) -> TargetType:
    """Returns the `index`'th target sample.

    Args:
        index: The index of the data sample to load.

    Returns:
        The sample target.
    """

filename abstractmethod

Returns the filename of the index'th data sample.

Note that this is the relative file path to the root.

Parameters:

Name Type Description Default
index int

The index of the data-sample to select.

required

Returns:

Type Description
str

The filename of the index'th data sample.

Source code in src/eva/vision/data/datasets/vision.py
@abc.abstractmethod
def filename(self, index: int) -> str:
    """Returns the filename of the `index`'th data sample.

    Note that this is the relative file path to the root.

    Args:
        index: The index of the data-sample to select.

    Returns:
        The filename of the `index`'th data sample.
    """
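
A subclass has to provide `load_data`, `load_target` and `filename`. The sketch below uses a stand-in base class so that it runs without eva installed; `VisionDatasetSketch`, `ToyDataset` and `double_pixels` are hypothetical names. It also assumes that an item is assembled from the three loaders and that the transform receives and returns an (input, target) pair, which this page does not confirm.

```python
import abc
from typing import Any, Callable, Dict, Generic, List, Optional, Tuple, TypeVar

InputType = TypeVar("InputType")
TargetType = TypeVar("TargetType")


class VisionDatasetSketch(abc.ABC, Generic[InputType, TargetType]):
    """Stand-in mirroring the documented interface; this is not eva's class."""

    def __init__(self, transforms: Optional[Callable] = None) -> None:
        self._transforms = transforms

    @abc.abstractmethod
    def load_data(self, index: int) -> InputType:
        """Returns the `index`'th data sample."""

    @abc.abstractmethod
    def load_target(self, index: int) -> TargetType:
        """Returns the `index`'th target."""

    @abc.abstractmethod
    def filename(self, index: int) -> str:
        """Returns the file path of the sample, relative to the root."""

    def load_metadata(self, index: int) -> Optional[Dict[str, Any]]:
        return None

    def __getitem__(self, index: int) -> Tuple[InputType, TargetType, Dict[str, Any]]:
        # Assumption: the item is assembled from the three loaders, and the
        # transform receives and returns an (input, target) pair.
        data, target = self.load_data(index), self.load_target(index)
        if self._transforms is not None:
            data, target = self._transforms(data, target)
        return data, target, self.load_metadata(index) or {}


class ToyDataset(VisionDatasetSketch[List[int], int]):
    """Hypothetical subclass holding two tiny 'images' in memory."""

    _samples = [("a.png", [1, 2], 0), ("b.png", [3, 4], 1)]

    def load_data(self, index: int) -> List[int]:
        return self._samples[index][1]

    def load_target(self, index: int) -> int:
        return self._samples[index][2]

    def filename(self, index: int) -> str:
        return self._samples[index][0]


def double_pixels(data: List[int], target: int) -> Tuple[List[int], int]:
    """Toy transform: scales the input and leaves the target untouched."""
    return [2 * value for value in data], target


dataset = ToyDataset(transforms=double_pixels)
item = dataset[1]
```

Because the three loaders are abstract, instantiating the base class directly raises a `TypeError`.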

Classification datasets

eva.vision.data.datasets.BACH

Bases: VisionDataset[Image, Tensor]

Dataset class for BACH images and corresponding targets.

The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.

required
split Literal['train', 'val'] | None

Dataset split to use. If None, the entire dataset is used.

None
download bool

Whether to download the data for the specified split. Note that the download is executed only when the prepare_data method is additionally called and the data does not yet exist on disk.

False
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/classification/bach.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val"] | None = None,
    download: bool = False,
    transforms: Callable | None = None,
) -> None:
    """Initialize the dataset.

    The dataset is split into train and validation by taking into account
    the patient IDs to avoid any data leakage.

    Args:
        root: Path to the root directory of the dataset. The dataset will
            be downloaded and extracted here, if it does not already exist.
        split: Dataset split to use. If `None`, the entire dataset is used.
        download: Whether to download the data for the specified split.
            Note that the download will be executed only by additionally
            calling the :meth:`prepare_data` method and if the data does
            not yet exist on disk.
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._download = download

    self._samples: List[Tuple[str, int]] = []
    self._indices: List[int] = []
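
Splitting by patient ID means that every image of a given patient lands in exactly one split. The excerpt does not show how eva derives its split, so the following is only a generic sketch of the idea; `split_by_patient`, the patient IDs and the validation fraction are hypothetical.

```python
import random
from typing import List, Tuple


def split_by_patient(
    patient_ids: List[str], val_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Returns (train_indices, val_indices) without sharing patients."""
    unique_patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_patients)
    n_val = max(1, round(len(unique_patients) * val_fraction))
    val_patients = set(unique_patients[:n_val])

    train_indices: List[int] = []
    val_indices: List[int] = []
    for index, patient in enumerate(patient_ids):
        # The split is decided per patient, never per image.
        (val_indices if patient in val_patients else train_indices).append(index)
    return train_indices, val_indices


# One entry per image; several images can belong to the same patient.
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p4"]
train_indices, val_indices = split_by_patient(patient_ids, val_fraction=0.25)
```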

eva.vision.data.datasets.BRACS

Bases: VisionDataset[Image, Tensor]

Dataset class for BRACS images and corresponding targets.

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset.

required
split Literal['train', 'val', 'test']

Dataset split to use.

required
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/classification/bracs.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val", "test"],
    transforms: Callable | None = None,
) -> None:
    """Initializes the dataset.

    Args:
        root: Path to the root directory of the dataset.
        split: Dataset split to use.
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split

    self._samples: List[Tuple[str, int]] = []

eva.vision.data.datasets.BreaKHis

Bases: VisionDataset[Image, Tensor]

Dataset class for BreaKHis images and corresponding targets.

The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.

required
split Literal['train', 'val'] | None

Dataset split to use. If None, the entire dataset is used.

None
magnifications List[Literal['40X', '100X', '200X', '400X']] | None

A list of the WSI magnifications to select. By default only 40X images are used.

None
download bool

Whether to download the data for the specified split. Note that the download is executed only when the prepare_data method is additionally called and the data does not yet exist on disk.

False
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/classification/breakhis.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val"] | None = None,
    magnifications: List[Literal["40X", "100X", "200X", "400X"]] | None = None,
    download: bool = False,
    transforms: Callable | None = None,
) -> None:
    """Initialize the dataset.

    The dataset is split into train and validation by taking into account
    the patient IDs to avoid any data leakage.

    Args:
        root: Path to the root directory of the dataset. The dataset will
            be downloaded and extracted here, if it does not already exist.
        split: Dataset split to use. If `None`, the entire dataset is used.
        magnifications: A list of the WSI magnifications to select. By default
            only 40X images are used.
        download: Whether to download the data for the specified split.
            Note that the download will be executed only by additionally
            calling the :meth:`prepare_data` method and if the data does
            not yet exist on disk.
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._download = download

    self._magnifications = magnifications or self._default_magnifications
    self._indices: List[int] = []
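
The `magnifications or self._default_magnifications` expression falls back to the default for any falsy value, so an empty list behaves like `None`. The sketch below shows this; `DEFAULT_MAGNIFICATIONS` is set to `["40X"]` to match the documented default, although the attribute's actual value is not shown in the excerpt.

```python
from typing import List, Optional

# Assumed value, based on "By default only 40X images are used."
DEFAULT_MAGNIFICATIONS = ["40X"]


def resolve_magnifications(magnifications: Optional[List[str]]) -> List[str]:
    """Mirrors the `x or default` fallback used in the constructor."""
    return magnifications or DEFAULT_MAGNIFICATIONS


from_none = resolve_magnifications(None)
from_empty = resolve_magnifications([])  # An empty list is falsy as well.
explicit = resolve_magnifications(["100X", "200X"])
```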

eva.vision.data.datasets.Camelyon16

Bases: MultiWsiDataset, VisionDataset[Image, Tensor]

Dataset class for Camelyon16 images and corresponding targets.

Parameters:

Name Type Description Default
root str

Root directory of the dataset.

required
sampler Sampler

The sampler to use for sampling patch coordinates.

required
split Literal['train', 'val', 'test'] | None

Dataset split to use. If None, the entire dataset is used.

None
width int

Width of the patches to be extracted, in pixels.

224
height int

Height of the patches to be extracted, in pixels.

224
target_mpp float

Target microns per pixel (mpp) for the patches.

0.5
backend str

The backend to use for reading the whole-slide images.

'openslide'
image_transforms Callable | None

Transforms to apply to the extracted image patches.

None
coords_path str | None

File path to save the patch coordinates as .csv.

None
seed int

Random seed for reproducibility.

42
Source code in src/eva/vision/data/datasets/classification/camelyon16.py
def __init__(
    self,
    root: str,
    sampler: samplers.Sampler,
    split: Literal["train", "val", "test"] | None = None,
    width: int = 224,
    height: int = 224,
    target_mpp: float = 0.5,
    backend: str = "openslide",
    image_transforms: Callable | None = None,
    coords_path: str | None = None,
    seed: int = 42,
) -> None:
    """Initializes the dataset.

    Args:
        root: Root directory of the dataset.
        sampler: The sampler to use for sampling patch coordinates.
        split: Dataset split to use. If `None`, the entire dataset is used.
        width: Width of the patches to be extracted, in pixels.
        height: Height of the patches to be extracted, in pixels.
        target_mpp: Target microns per pixel (mpp) for the patches.
        backend: The backend to use for reading the whole-slide images.
        image_transforms: Transforms to apply to the extracted image patches.
        coords_path: File path to save the patch coordinates as .csv.
        seed: Random seed for reproducibility.
    """
    self._split = split
    self._root = root
    self._width = width
    self._height = height
    self._target_mpp = target_mpp
    self._seed = seed

    wsi.MultiWsiDataset.__init__(
        self,
        root=root,
        file_paths=self._load_file_paths(split),
        width=width,
        height=height,
        sampler=sampler,
        target_mpp=target_mpp,
        backend=backend,
        image_transforms=image_transforms,
        coords_path=coords_path,
    )
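
The `width`, `height` and `target_mpp` arguments together fix the physical size of a patch. The arithmetic below shows how a patch maps back to full-resolution pixels, assuming a slide scanned at 0.25 mpp; this base resolution is a hypothetical value, and the excerpt does not show how eva performs the read internally.

```python
from typing import Tuple


def level0_patch_size(
    width: int, height: int, target_mpp: float, base_mpp: float
) -> Tuple[int, int]:
    """Size, in full-resolution pixels, of the region covered by one patch."""
    scale = target_mpp / base_mpp
    return round(width * scale), round(height * scale)


def physical_extent_microns(width: int, height: int, target_mpp: float) -> Tuple[float, float]:
    """Physical area covered by the patch, in microns."""
    return width * target_mpp, height * target_mpp


# Default patch settings from the signature, hypothetical slide resolution.
region = level0_patch_size(224, 224, target_mpp=0.5, base_mpp=0.25)
extent = physical_extent_microns(224, 224, target_mpp=0.5)
```

With these numbers a 224 x 224 patch at 0.5 mpp covers 112 x 112 microns, which corresponds to a 448 x 448 pixel region at the assumed base resolution.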

annotations_test_set: Dict[str, str] cached property

Loads the dataset labels.

annotations: Dict[str, str] cached property

Loads the dataset labels.

eva.vision.data.datasets.CRC

Bases: VisionDataset[Image, Tensor]

Dataset class for CRC images and corresponding targets.

The dataset is split into a train (train) and validation (val) set:
  • train: A set of 100,000 non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.
  • val: A set of 7180 image patches from N=50 patients with colorectal adenocarcinoma (no overlap with patients in NCT-CRC-HE-100K).

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset.

required
split Literal['train', 'val']

Dataset split to use.

required
download bool

Whether to download the data for the specified split. Note that the download is executed only when the prepare_data method is additionally called and the data does not yet exist on disk.

False
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/classification/crc.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val"],
    download: bool = False,
    transforms: Callable | None = None,
) -> None:
    """Initializes the dataset.

    The dataset is split into a train (train) and validation (val) set:
      - train: A set of 100,000 non-overlapping image patches from
        hematoxylin & eosin (H&E) stained histological images of human
        colorectal cancer (CRC) and normal tissue.
      - val: A set of 7180 image patches from N=50 patients with colorectal
        adenocarcinoma (no overlap with patients in NCT-CRC-HE-100K).

    Args:
        root: Path to the root directory of the dataset.
        split: Dataset split to use.
        download: Whether to download the data for the specified split.
            Note that the download will be executed only by additionally
            calling the :meth:`prepare_data` method and if the data does
            not yet exist on disk.
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._download = download

    self._samples: List[Tuple[str, int]] = []
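
As documented for `download`, the constructor only stores the flag and the actual download happens in `prepare_data`, and only if the data is missing. A minimal sketch of that pattern follows; `DownloadableDatasetSketch` and `_fetch` are hypothetical, and the "download" just creates a directory.

```python
import os
import tempfile


class DownloadableDatasetSketch:
    """Hypothetical stand-in for a dataset with a `download` flag."""

    def __init__(self, root: str, download: bool = False) -> None:
        self._root = root
        self._download = download
        self.fetch_count = 0  # Bookkeeping for this example only.

    def prepare_data(self) -> None:
        """Fetches the data only if requested and not already on disk."""
        if self._download and not os.path.isdir(self._root):
            self._fetch()

    def _fetch(self) -> None:
        # Placeholder for the real download and extraction step.
        os.makedirs(self._root)
        self.fetch_count += 1


with tempfile.TemporaryDirectory() as tmp_dir:
    dataset = DownloadableDatasetSketch(os.path.join(tmp_dir, "crc"), download=True)
    exists_after_init = os.path.isdir(dataset._root)  # Nothing fetched yet.
    dataset.prepare_data()
    dataset.prepare_data()  # Second call is a no-op: the data now exists.
    exists_after_prepare = os.path.isdir(dataset._root)
```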

eva.vision.data.datasets.GleasonArvaniti

Bases: VisionDataset[Image, Tensor]

Dataset class for GleasonArvaniti images and corresponding targets.

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset.

required
split Literal['train', 'val', 'test'] | None

Dataset split to use. If None, the entire dataset is used.

None
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/classification/gleason_arvaniti.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val", "test"] | None = None,
    transforms: Callable | None = None,
) -> None:
    """Initialize the dataset.

    Args:
        root: Path to the root directory of the dataset.
        split: Dataset split to use. If `None`, the entire dataset is used.
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split

    self._indices: List[int] = []

eva.vision.data.datasets.MHIST

Bases: VisionDataset[Image, Tensor]

MHIST dataset.

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset.

required
split Literal['train', 'test']

Dataset split to use.

required
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/classification/mhist.py
def __init__(
    self,
    root: str,
    split: Literal["train", "test"],
    transforms: Callable | None = None,
) -> None:
    """Initialize the dataset.

    Args:
        root: Path to the root directory of the dataset.
        split: Dataset split to use.
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split

    self._samples: List[Tuple[str, str]] = []

eva.vision.data.datasets.PANDA

Bases: MultiWsiDataset, VisionDataset[Image, Tensor]

Dataset class for PANDA images and corresponding targets.

Parameters:

Name Type Description Default
root str

Root directory of the dataset.

required
sampler Sampler

The sampler to use for sampling patch coordinates.

required
split Literal['train', 'val', 'test'] | None

Dataset split to use. If None, the entire dataset is used.

None
width int

Width of the patches to be extracted, in pixels.

224
height int

Height of the patches to be extracted, in pixels.

224
target_mpp float

Target microns per pixel (mpp) for the patches.

0.5
backend str

The backend to use for reading the whole-slide images.

'openslide'
image_transforms Callable | None

Transforms to apply to the extracted image patches.

None
coords_path str | None

File path to save the patch coordinates as .csv.

None
seed int

Random seed for reproducibility.

42
Source code in src/eva/vision/data/datasets/classification/panda.py
def __init__(
    self,
    root: str,
    sampler: samplers.Sampler,
    split: Literal["train", "val", "test"] | None = None,
    width: int = 224,
    height: int = 224,
    target_mpp: float = 0.5,
    backend: str = "openslide",
    image_transforms: Callable | None = None,
    coords_path: str | None = None,
    seed: int = 42,
) -> None:
    """Initializes the dataset.

    Args:
        root: Root directory of the dataset.
        sampler: The sampler to use for sampling patch coordinates.
        split: Dataset split to use. If `None`, the entire dataset is used.
        width: Width of the patches to be extracted, in pixels.
        height: Height of the patches to be extracted, in pixels.
        target_mpp: Target microns per pixel (mpp) for the patches.
        backend: The backend to use for reading the whole-slide images.
        image_transforms: Transforms to apply to the extracted image patches.
        coords_path: File path to save the patch coordinates as .csv.
        seed: Random seed for reproducibility.
    """
    self._split = split
    self._root = root
    self._seed = seed

    self._download_resources()

    wsi.MultiWsiDataset.__init__(
        self,
        root=root,
        file_paths=self._load_file_paths(split),
        width=width,
        height=height,
        sampler=sampler,
        target_mpp=target_mpp,
        backend=backend,
        image_transforms=image_transforms,
        coords_path=coords_path,
    )

annotations: pd.DataFrame cached property

Loads the dataset labels.

eva.vision.data.datasets.PANDASmall

Bases: PANDA

Small version of the PANDA dataset for quicker benchmarking.

Parameters:

Name Type Description Default
root str

Root directory of the dataset.

required
sampler Sampler

The sampler to use for sampling patch coordinates.

required
split Literal['train', 'val', 'test'] | None

Dataset split to use. If None, the entire dataset is used.

None
width int

Width of the patches to be extracted, in pixels.

224
height int

Height of the patches to be extracted, in pixels.

224
target_mpp float

Target microns per pixel (mpp) for the patches.

0.5
backend str

The backend to use for reading the whole-slide images.

'openslide'
image_transforms Callable | None

Transforms to apply to the extracted image patches.

None
coords_path str | None

File path to save the patch coordinates as .csv.

None
seed int

Random seed for reproducibility.

42
Source code in src/eva/vision/data/datasets/classification/panda.py
def __init__(
    self,
    root: str,
    sampler: samplers.Sampler,
    split: Literal["train", "val", "test"] | None = None,
    width: int = 224,
    height: int = 224,
    target_mpp: float = 0.5,
    backend: str = "openslide",
    image_transforms: Callable | None = None,
    coords_path: str | None = None,
    seed: int = 42,
) -> None:
    """Initializes the dataset.

    Args:
        root: Root directory of the dataset.
        sampler: The sampler to use for sampling patch coordinates.
        split: Dataset split to use. If `None`, the entire dataset is used.
        width: Width of the patches to be extracted, in pixels.
        height: Height of the patches to be extracted, in pixels.
        target_mpp: Target microns per pixel (mpp) for the patches.
        backend: The backend to use for reading the whole-slide images.
        image_transforms: Transforms to apply to the extracted image patches.
        coords_path: File path to save the patch coordinates as .csv.
        seed: Random seed for reproducibility.
    """
    self._split = split
    self._root = root
    self._seed = seed

    self._download_resources()

    wsi.MultiWsiDataset.__init__(
        self,
        root=root,
        file_paths=self._load_file_paths(split),
        width=width,
        height=height,
        sampler=sampler,
        target_mpp=target_mpp,
        backend=backend,
        image_transforms=image_transforms,
        coords_path=coords_path,
    )

eva.vision.data.datasets.PatchCamelyon

Bases: VisionDataset[Image, Tensor]

Dataset class for PatchCamelyon images and corresponding targets.

Parameters:

Name Type Description Default
root str

The path to the dataset root. This path should contain the uncompressed h5 files and the metadata.

required
split Literal['train', 'val', 'test']

The dataset split for training, validation, or testing.

required
download bool

Whether to download the data for the specified split. Note that the download is executed only when the prepare_data method is additionally called.

False
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/classification/patch_camelyon.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val", "test"],
    download: bool = False,
    transforms: Callable | None = None,
) -> None:
    """Initializes the dataset.

    Args:
        root: The path to the dataset root. This path should contain
            the uncompressed h5 files and the metadata.
        split: The dataset split for training, validation, or testing.
        download: Whether to download the data for the specified split.
            Note that the download will be executed only by additionally
            calling the :meth:`prepare_data` method.
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._download = download

eva.vision.data.datasets.UniToPatho

Bases: VisionDataset[Image, Tensor]

Dataset class for UniToPatho images and corresponding targets.

The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset.

required
split Literal['train', 'val'] | None

Dataset split to use. If None, the entire dataset is used.

None
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/classification/unitopatho.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val"] | None = None,
    transforms: Callable | None = None,
) -> None:
    """Initialize the dataset.

    The dataset is split into train and validation by taking into account
    the patient IDs to avoid any data leakage.

    Args:
        root: Path to the root directory of the dataset.
        split: Dataset split to use. If `None`, the entire dataset is used.
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split

    self._indices: List[int] = []

eva.vision.data.datasets.WsiClassificationDataset

Bases: MultiWsiDataset, VisionDataset[Image, Tensor]

A general dataset class for whole-slide image classification using manifest files.

Parameters:

Name Type Description Default
root str

Root directory of the dataset.

required
manifest_file str

The path to the manifest file, relative to the root argument. The path column is expected to contain relative paths to the whole-slide images.

required
width int

Width of the patches to be extracted, in pixels.

required
height int

Height of the patches to be extracted, in pixels.

required
target_mpp float

Target microns per pixel (mpp) for the patches.

required
sampler Sampler

The sampler to use for sampling patch coordinates.

required
backend str

The backend to use for reading the whole-slide images.

'openslide'
split Literal['train', 'val', 'test'] | None

The split of the dataset to load.

None
image_transforms Callable | None

Transforms to apply to the extracted image patches.

None
column_mapping Dict[str, str]

Mapping of the columns in the manifest file.

default_column_mapping
coords_path str | None

File path to save the patch coordinates as .csv.

None
Source code in src/eva/vision/data/datasets/classification/wsi.py
def __init__(
    self,
    root: str,
    manifest_file: str,
    width: int,
    height: int,
    target_mpp: float,
    sampler: samplers.Sampler,
    backend: str = "openslide",
    split: Literal["train", "val", "test"] | None = None,
    image_transforms: Callable | None = None,
    column_mapping: Dict[str, str] = default_column_mapping,
    coords_path: str | None = None,
):
    """Initializes the dataset.

    Args:
        root: Root directory of the dataset.
        manifest_file: The path to the manifest file, relative to
            the `root` argument. The `path` column is expected to contain
            relative paths to the whole-slide images.
        width: Width of the patches to be extracted, in pixels.
        height: Height of the patches to be extracted, in pixels.
        target_mpp: Target microns per pixel (mpp) for the patches.
        sampler: The sampler to use for sampling patch coordinates.
        backend: The backend to use for reading the whole-slide images.
        split: The split of the dataset to load.
        image_transforms: Transforms to apply to the extracted image patches.
        column_mapping: Mapping of the columns in the manifest file.
        coords_path: File path to save the patch coordinates as .csv.
    """
    self._split = split
    self._column_mapping = self.default_column_mapping | column_mapping
    self._manifest = self._load_manifest(os.path.join(root, manifest_file))

    wsi.MultiWsiDataset.__init__(
        self,
        root=root,
        file_paths=self._manifest[self._column_mapping["path"]].tolist(),
        width=width,
        height=height,
        sampler=sampler,
        target_mpp=target_mpp,
        backend=backend,
        image_transforms=image_transforms,
        coords_path=coords_path,
    )
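
The constructor merges the user's `column_mapping` into the default one with the dict union operator and then reads the slide paths from the column registered under the `path` key. The sketch below reproduces that flow with the standard library. Only the `path` key is confirmed by the source; the `target` and `split` keys, the manifest contents and the split filtering are assumptions for illustration.

```python
import csv
import io
from typing import Dict, List, Optional

# Only the "path" key appears in the source; the other keys are hypothetical.
DEFAULT_COLUMN_MAPPING = {"path": "path", "target": "target", "split": "split"}

# Hypothetical manifest whose column names differ from the defaults.
MANIFEST_CSV = """wsi_path,label,split
slides/a.tiff,0,train
slides/b.tiff,1,val
slides/c.tiff,1,train
"""


def load_file_paths(
    manifest_text: str,
    column_mapping: Dict[str, str],
    split: Optional[str] = None,
) -> List[str]:
    """Returns the relative slide paths, optionally restricted to one split."""
    # User entries override the defaults (dict union, Python 3.9+).
    mapping = DEFAULT_COLUMN_MAPPING | column_mapping
    rows = csv.DictReader(io.StringIO(manifest_text))
    return [
        row[mapping["path"]]
        for row in rows
        if split is None or row[mapping["split"]] == split
    ]


user_mapping = {"path": "wsi_path", "target": "label"}
train_paths = load_file_paths(MANIFEST_CSV, user_mapping, split="train")
all_paths = load_file_paths(MANIFEST_CSV, user_mapping)
```

Keys missing from the user's mapping, such as `split` here, keep their default column name.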

Segmentation datasets

eva.vision.data.datasets.BCSS

Bases: MultiWsiDataset, VisionDataset[Image, Mask]

Dataset class for BCSS semantic segmentation task.

Source: https://github.com/PathologyDataScience/BCSS

We apply the class grouping proposed by the challenge baseline: https://bcsegmentation.grand-challenge.org/Baseline/

  • outside_roi: outside_roi
  • tumor: angioinvasion, dcis
  • stroma: stroma
  • inflammatory: lymphocytic_infiltrate, plasma_cells, other_immune_infiltrate
  • necrosis: necrosis_or_debris
  • other: remaining

Be aware that outside_roi should be assigned zero-weight during model training.
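
The grouping and the zero weight for outside_roi can be sketched as below. Two points are assumptions: the grouped classes are indexed in the order listed above, and the raw `tumor` label maps to the `tumor` group, although the list names only `angioinvasion` and `dcis`. The example raw label `fat` is used only to demonstrate the fallback to `other`.

```python
from typing import Dict, List

# Assumed index order: the order in which the groups are listed.
GROUPED_CLASSES = ["outside_roi", "tumor", "stroma", "inflammatory", "necrosis", "other"]

RAW_TO_GROUP: Dict[str, str] = {
    "outside_roi": "outside_roi",
    "tumor": "tumor",  # Assumption: not listed explicitly in the grouping.
    "angioinvasion": "tumor",
    "dcis": "tumor",
    "stroma": "stroma",
    "lymphocytic_infiltrate": "inflammatory",
    "plasma_cells": "inflammatory",
    "other_immune_infiltrate": "inflammatory",
    "necrosis_or_debris": "necrosis",
}


def group_of(raw_label: str) -> str:
    """Every raw label not named in the grouping falls into 'other'."""
    return RAW_TO_GROUP.get(raw_label, "other")


def group_index(raw_label: str) -> int:
    return GROUPED_CLASSES.index(group_of(raw_label))


# Per-class loss weights with outside_roi excluded from training.
class_weights: List[float] = [
    0.0 if name == "outside_roi" else 1.0 for name in GROUPED_CLASSES
]
```

A weight vector of this form can be handed to a weighted loss so that pixels outside the region of interest do not contribute to the gradient.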

Parameters:

Name Type Description Default
root str

Root directory of the dataset.

required
sampler Sampler

The sampler to use for sampling patch coordinates. If None, the GridSampler sampler is used.

required
split Literal['train', 'val', 'trainval', 'test'] | None

Dataset split to use. If None, the entire dataset is used.

None
width int

Width of the patches to be extracted, in pixels.

224
height int

Height of the patches to be extracted, in pixels.

224
target_mpp float

Target microns per pixel (mpp) for the patches.

0.5
transforms Callable | None

Transforms to apply to the extracted image & mask patches.

None
Source code in src/eva/vision/data/datasets/segmentation/bcss.py
def __init__(
    self,
    root: str,
    sampler: samplers.Sampler,
    split: Literal["train", "val", "trainval", "test"] | None = None,
    width: int = 224,
    height: int = 224,
    target_mpp: float = 0.5,
    transforms: Callable | None = None,
) -> None:
    """Initializes the dataset.

    Args:
        root: Root directory of the dataset.
        sampler: The sampler to use for sampling patch coordinates.
            If `None`, it will use the :class:`GridSampler` sampler.
        split: Dataset split to use. If `None`, the entire dataset is used.
        width: Width of the patches to be extracted, in pixels.
        height: Height of the patches to be extracted, in pixels.
        target_mpp: Target microns per pixel (mpp) for the patches.
        transforms: Transforms to apply to the extracted image & mask patches.
    """
    self._split = split
    self._root = root

    self.datasets: List[wsi.WsiDataset]  # type: ignore

    wsi.MultiWsiDataset.__init__(
        self,
        root=root,
        file_paths=self._load_file_paths(split),
        width=width,
        height=height,
        sampler=sampler or samplers.GridSampler(max_samples=1000),
        target_mpp=target_mpp,
        overwrite_mpp=0.25,
        backend="pil",
    )
    vision.VisionDataset.__init__(self, transforms=transforms)

eva.vision.data.datasets.BTCV

Bases: VisionDataset[Volume, Mask]

Beyond the Cranial Vault (BTCV) Abdomen dataset.

The BTCV dataset comprises abdominal CT scans acquired at the Vanderbilt University Medical Center from metastatic liver cancer patients or post-operative ventral hernia patients. The dataset contains one background class and thirteen organ classes.

More info
  • Multi-organ Abdominal CT Reference Standard Segmentations https://zenodo.org/records/1169361
  • Dataset Split https://github.com/Luffy03/Large-Scale-Medical/blob/main/Downstream/monai/BTCV/dataset/dataset_0.json

Parameters:

Name Type Description Default
root str

Path to the dataset root directory.

required
split Literal['train', 'val'] | None

Dataset split to use ('train' or 'val'). If None, it uses the full dataset.

None
download bool

Whether to download the dataset.

False
transforms Callable | None

A callable object for applying data transformations. If None, no transformations are applied.

None
Source code in src/eva/vision/data/datasets/segmentation/btcv.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val"] | None = None,
    download: bool = False,
    transforms: Callable | None = None,
) -> None:
    """Initializes the dataset.

    Args:
        root: Path to the dataset root directory.
        split: Dataset split to use ('train' or 'val').
            If None, it uses the full dataset.
        download: Whether to download the dataset.
        transforms: A callable object for applying data transformations.
            If None, no transformations are applied.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._download = download

    self._samples: List[Tuple[str, str]]
    self._indices: List[int]

load_data

Loads the CT volume for a given sample.

Parameters:

Name Type Description Default
index int

The index of the desired sample.

required

Returns:

Type Description
Volume

Tensor representing the CT volume of shape [T, C, H, W].

Source code in src/eva/vision/data/datasets/segmentation/btcv.py
@override
def load_data(self, index: int) -> eva_tv_tensors.Volume:
    """Loads the CT volume for a given sample.

    Args:
        index: The index of the desired sample.

    Returns:
        Tensor representing the CT volume of shape `[T, C, H, W]`.
    """
    ct_scan_file, _ = self._samples[self._indices[index]]
    return _utils.load_volume_tensor(ct_scan_file)

load_target

Loads the segmentation mask for a given sample.

Parameters:

Name Type Description Default
index int

The index of the desired sample.

required

Returns:

Type Description
Mask

Tensor representing the segmentation mask of shape [T, C, H, W].

Source code in src/eva/vision/data/datasets/segmentation/btcv.py
@override
def load_target(self, index: int) -> tv_tensors.Mask:
    """Loads the segmentation mask for a given sample.

    Args:
        index: The index of the desired sample.

    Returns:
        Tensor representing the segmentation mask of shape `[T, C, H, W]`.
    """
    ct_scan_file, mask_file = self._samples[self._indices[index]]
    return _utils.load_mask_tensor(mask_file, ct_scan_file)
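
Both loaders above resolve a sample through two lookups: `self._indices` selects the positions that belong to the active split, and `self._samples` holds the (CT scan, mask) file pairs. The following is a minimal sketch of that indirection. The file names and the split positions are hypothetical and only illustrate the lookup, not the actual BTCV file layout.

```python
from typing import List, Tuple

# Hypothetical (CT scan, mask) file pairs; the real dataset builds this list
# from the files found under `root`.
samples: List[Tuple[str, str]] = [
    ("img0001.nii.gz", "label0001.nii.gz"),
    ("img0002.nii.gz", "label0002.nii.gz"),
    ("img0003.nii.gz", "label0003.nii.gz"),
    ("img0004.nii.gz", "label0004.nii.gz"),
]

# Assumed positions in `samples` that belong to the active split.
indices: List[int] = [1, 3]


def resolve(index: int) -> Tuple[str, str]:
    """Maps a split-local index to its (CT scan, mask) file pair."""
    return samples[indices[index]]


ct_scan_file, mask_file = resolve(0)
```

With this layout, index `0` of the split refers to the second entry of `samples`, so the dataset length is the length of `indices`, not of `samples`.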

eva.vision.data.datasets.CoNSeP

Bases: MultiWsiDataset, VisionDataset[Image, Mask]

Dataset class for CoNSeP semantic segmentation task.

As in [1], we combine classes 3 (healthy epithelial) & 4 (dysplastic/malignant epithelial) into the epithelial class and 5 (fibroblast), 6 (muscle) & 7 (endothelial) into the spindle-shaped class.

[1] Graham, Simon, et al. "Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images." https://arxiv.org/abs/1802.04712
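
The class merging described above can be sketched as a lookup table applied to every pixel of a label mask. Only the grouping (3 and 4 together, 5, 6 and 7 together) comes from the text. The target indices, the handling of classes 0 to 2, and the function name `merge_classes` are assumptions made for illustration.

```python
from typing import Dict, List

# Assumed target indices; only the grouping of 3+4 and 5+6+7 is from the text.
MERGED_LABELS: Dict[int, int] = {
    0: 0,  # kept as-is (assumed)
    1: 1,  # kept as-is (assumed)
    2: 2,  # kept as-is (assumed)
    3: 3,  # healthy epithelial -> epithelial
    4: 3,  # dysplastic/malignant epithelial -> epithelial
    5: 4,  # fibroblast -> spindle-shaped
    6: 4,  # muscle -> spindle-shaped
    7: 4,  # endothelial -> spindle-shaped
}


def merge_classes(mask: List[List[int]]) -> List[List[int]]:
    """Returns a copy of the 2D label mask with the merged class indices."""
    return [[MERGED_LABELS[label] for label in row] for row in mask]


merged = merge_classes([[0, 3, 4], [5, 6, 7]])
```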

Parameters:

Name Type Description Default
root str

Root directory of the dataset.

required
sampler Sampler | None

The sampler to use for sampling patch coordinates. If None, a ForegroundGridSampler with max_samples=25 is used.

None
split Literal['train', 'val'] | None

Dataset split to use. If None, the entire dataset is used.

None
width int

Width of the patches to be extracted, in pixels.

250
height int

Height of the patches to be extracted, in pixels.

250
target_mpp float

Target microns per pixel (mpp) for the patches.

0.25
transforms Callable | None

Transforms to apply to the extracted image & mask patches.

None
Source code in src/eva/vision/data/datasets/segmentation/consep.py
def __init__(
    self,
    root: str,
    sampler: samplers.Sampler | None = None,
    split: Literal["train", "val"] | None = None,
    width: int = 250,
    height: int = 250,
    target_mpp: float = 0.25,
    transforms: Callable | None = None,
) -> None:
    """Initializes the dataset.

    Args:
        root: Root directory of the dataset.
        sampler: The sampler to use for sampling patch coordinates.
            If `None`, the `ForegroundGridSampler` sampler is used.
        split: Dataset split to use. If `None`, the entire dataset is used.
        width: Width of the patches to be extracted, in pixels.
        height: Height of the patches to be extracted, in pixels.
        target_mpp: Target microns per pixel (mpp) for the patches.
        transforms: Transforms to apply to the extracted image & mask patches.
    """
    self._split = split
    self._root = root

    self.datasets: List[wsi.WsiDataset]  # type: ignore

    wsi.MultiWsiDataset.__init__(
        self,
        root=root,
        file_paths=self._load_file_paths(split),
        width=width,
        height=height,
        sampler=sampler or samplers.ForegroundGridSampler(max_samples=25),
        target_mpp=target_mpp,
        overwrite_mpp=0.25,
        backend="pil",
        image_transforms=transforms,
    )

eva.vision.data.datasets.LiTS

Bases: VisionDataset[Image, Mask]

LiTS - Liver Tumor Segmentation Challenge.

Webpage: https://competitions.codalab.org/competitions/17094

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.

required
split Literal['train', 'val', 'test'] | None

Dataset split to use.

None
transforms Callable | None

A function/transform that takes in an image and a target mask and returns the transformed versions of both.

None
seed int

Seed used for generating the dataset splits.

8
Source code in src/eva/vision/data/datasets/segmentation/lits.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val", "test"] | None = None,
    transforms: Callable | None = None,
    seed: int = 8,
) -> None:
    """Initialize dataset.

    Args:
        root: Path to the root directory of the dataset. The dataset will
            be downloaded and extracted here, if it does not already exist.
        split: Dataset split to use.
        transforms: A function/transforms that takes in an image and a target
            mask and returns the transformed versions of both.
        seed: Seed used for generating the dataset splits.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._seed = seed
    self._indices: List[Tuple[int, int]] = []

eva.vision.data.datasets.LiTSBalanced

Bases: LiTS

Balanced version of the LiTS - Liver Tumor Segmentation Challenge dataset.

For each volume in the dataset, we sample the same number of slices that contain only the liver as slices that contain both liver and tumor.

Webpage: https://competitions.codalab.org/competitions/17094

For the splits we follow: https://arxiv.org/pdf/2010.01663v2
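
The balancing idea can be sketched as follows, assuming each slice of a volume has already been tagged with what it contains. The tags, the function name `sample_balanced_slices`, and the choice to sample as many slices as the smaller group holds are assumptions; the actual sampling rule is not shown on this page.

```python
import random
from typing import Dict, List


def sample_balanced_slices(slice_tags: List[str], seed: int = 8) -> Dict[str, List[int]]:
    """Samples equally many liver-only and liver+tumor slice indices of one volume."""
    liver_only = [i for i, tag in enumerate(slice_tags) if tag == "liver"]
    liver_and_tumor = [i for i, tag in enumerate(slice_tags) if tag == "liver+tumor"]

    # Assumption: the smaller group determines how many slices are drawn.
    n_samples = min(len(liver_only), len(liver_and_tumor))

    # A seeded generator keeps the selection reproducible across runs.
    rng = random.Random(seed)
    return {
        "liver": sorted(rng.sample(liver_only, n_samples)),
        "liver+tumor": sorted(rng.sample(liver_and_tumor, n_samples)),
    }


tags = ["background", "liver", "liver", "liver", "liver+tumor", "liver+tumor"]
selection = sample_balanced_slices(tags)
```

Calling the function twice with the same `seed` returns the same slice indices, which mirrors the role of the `seed` argument described below.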

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.

required
split Literal['train', 'val', 'test'] | None

Dataset split to use.

None
transforms Callable | None

A function/transform that takes in an image and a target mask and returns the transformed versions of both.

None
seed int

Seed used for generating the dataset splits and sampling of the slices.

8
Source code in src/eva/vision/data/datasets/segmentation/lits_balanced.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val", "test"] | None = None,
    transforms: Callable | None = None,
    seed: int = 8,
) -> None:
    """Initialize dataset.

    Args:
        root: Path to the root directory of the dataset. The dataset will
            be downloaded and extracted here, if it does not already exist.
        split: Dataset split to use.
        transforms: A function/transforms that takes in an image and a target
            mask and returns the transformed versions of both.
        seed: Seed used for generating the dataset splits and sampling of the slices.
    """
    super().__init__(root=root, split=split, transforms=transforms, seed=seed)

eva.vision.data.datasets.MoNuSAC

Bases: VisionDataset[Image, Mask]

MoNuSAC2020: A Multi-organ Nuclei Segmentation and Classification Challenge.

Webpage: https://monusac-2020.grand-challenge.org/

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.

required
split Literal['train', 'test']

Dataset split to use.

required
export_masks bool

Whether to export, save and use the semantic label masks from disk.

True
download bool

Whether to download the data for the specified split. Note that the download is executed only when the prepare_data method is additionally called and the data does not yet exist on disk.

False
transforms Callable | None

A function/transform that takes in an image and a target mask and returns the transformed versions of both.

None
Source code in src/eva/vision/data/datasets/segmentation/monusac.py
def __init__(
    self,
    root: str,
    split: Literal["train", "test"],
    export_masks: bool = True,
    download: bool = False,
    transforms: Callable | None = None,
) -> None:
    """Initialize dataset.

    Args:
        root: Path to the root directory of the dataset. The dataset will
            be downloaded and extracted here, if it does not already exist.
        split: Dataset split to use.
        export_masks: Whether to export, save and use the semantic label masks
            from disk.
        download: Whether to download the data for the specified split.
            Note that the download will be executed only by additionally
            calling the :meth:`prepare_data` method and if the data does not
            exist yet on disk.
        transforms: A function/transforms that takes in an image and a target
            mask and returns the transformed versions of both.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._export_masks = export_masks
    self._download = download
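
As the `download` description notes, the constructor only stores the flag and the download itself happens later in `prepare_data`. The class below is a hypothetical stand-in that sketches this deferred pattern; it is not the MoNuSAC implementation, and `_download_dataset` merely creates a directory instead of fetching data.

```python
import os
import tempfile


class DeferredDownloadDataset:
    """Hypothetical dataset that defers the download to `prepare_data`."""

    def __init__(self, root: str, download: bool = False) -> None:
        self._root = root
        self._download = download
        self.download_calls = 0

    def prepare_data(self) -> None:
        # Download only if requested and the data is not yet on disk.
        if self._download and not os.path.isdir(self._root):
            self._download_dataset()

    def _download_dataset(self) -> None:
        # Placeholder for the real download and extraction step.
        os.makedirs(self._root, exist_ok=True)
        self.download_calls += 1


with tempfile.TemporaryDirectory() as tmp_dir:
    dataset = DeferredDownloadDataset(root=os.path.join(tmp_dir, "data"), download=True)
    exists_after_init = os.path.isdir(dataset._root)
    dataset.prepare_data()
    dataset.prepare_data()  # the second call is a no-op, the data already exists
    download_calls = dataset.download_calls
```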

eva.vision.data.datasets.TotalSegmentator2D

Bases: VisionDataset[Image, Mask]

TotalSegmentator 2D segmentation dataset.

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.

required
split Literal['train', 'val', 'test'] | None

Dataset split to use. If None, the entire dataset is used.

required
version Literal['small', 'full'] | None

The version of the dataset to initialize. If None, it will use the files located at root as is and won't perform any checks.

'full'
download bool

Whether to download the data for the specified split. Note that the download is executed only when the prepare_data method is additionally called and the data does not yet exist on disk.

False
classes List[str] | None

The subset of classes to configure the dataset with. If None, it will use all of them. Cannot be set together with class_mappings.

None
class_mappings Dict[str, str] | None

A dictionary that maps the original class names to a reduced set of classes. If None, it will use the original classes.

reduced_class_mappings
optimize_mask_loading bool

Whether to pre-process the segmentation masks in order to optimize the loading time. In the setup method, it will reformat the binary one-hot masks to a semantic mask and store it on disk.

True
decompress bool

Whether to decompress the ct.nii.gz files when preparing the data. The label masks won't be decompressed, but when enabling optimize_mask_loading it will export the semantic label masks to a single file in uncompressed .nii format.

True
num_workers int

The number of workers to use for optimizing the masks & decompressing the .gz files.

10
transforms Callable | None

A function/transform that takes in an image and a target mask and returns the transformed versions of both.

None
Source code in src/eva/vision/data/datasets/segmentation/total_segmentator_2d.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val", "test"] | None,
    version: Literal["small", "full"] | None = "full",
    download: bool = False,
    classes: List[str] | None = None,
    class_mappings: Dict[str, str] | None = _total_segmentator.reduced_class_mappings,
    optimize_mask_loading: bool = True,
    decompress: bool = True,
    num_workers: int = 10,
    transforms: Callable | None = None,
) -> None:
    """Initialize dataset.

    Args:
        root: Path to the root directory of the dataset. The dataset will
            be downloaded and extracted here, if it does not already exist.
        split: Dataset split to use. If `None`, the entire dataset is used.
        version: The version of the dataset to initialize. If `None`, it will
            use the files located at root as is and won't perform any checks.
        download: Whether to download the data for the specified split.
            Note that the download will be executed only by additionally
            calling the :meth:`prepare_data` method and if the data does not
            exist yet on disk.
        classes: The subset of classes to configure the dataset with.
            If `None`, it will use all of them.
        class_mappings: A dictionary that maps the original class names to a
            reduced set of classes. If `None`, it will use the original classes.
        optimize_mask_loading: Whether to pre-process the segmentation masks
            in order to optimize the loading time. In the `setup` method, it
            will reformat the binary one-hot masks to a semantic mask and store
            it on disk.
        decompress: Whether to decompress the ct.nii.gz files when preparing the data.
            The label masks won't be decompressed, but when enabling optimize_mask_loading
            it will export the semantic label masks to a single file in uncompressed .nii
            format.
        num_workers: The number of workers to use for optimizing the masks &
            decompressing the .gz files.
        transforms: A function/transforms that takes in an image and a target
            mask and returns the transformed versions of both.

    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._version = version
    self._download = download
    self._classes = classes
    self._optimize_mask_loading = optimize_mask_loading
    self._decompress = decompress
    self._num_workers = num_workers
    self._class_mappings = class_mappings

    if self._classes and self._class_mappings:
        raise ValueError("Both 'classes' and 'class_mappings' cannot be set at the same time.")

    self._samples_dirs: List[str] = []
    self._indices: List[Tuple[int, int]] = []
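
One consequence of the check in the constructor above is easy to miss: because `class_mappings` defaults to a non-empty mapping, passing `classes` alone raises a `ValueError`, and `class_mappings=None` has to be passed explicitly. The sketch below reproduces only that validation. The function name `validate_class_config` and the content of `REDUCED_CLASS_MAPPINGS` are hypothetical stand-ins for the library's default.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the library's `reduced_class_mappings` default.
REDUCED_CLASS_MAPPINGS: Dict[str, str] = {
    "rib_left_1": "ribs",
    "rib_right_1": "ribs",
}


def validate_class_config(
    classes: Optional[List[str]] = None,
    class_mappings: Optional[Dict[str, str]] = REDUCED_CLASS_MAPPINGS,
) -> None:
    """Mirrors the constructor check: the two options are mutually exclusive."""
    if classes and class_mappings:
        raise ValueError("Both 'classes' and 'class_mappings' cannot be set at the same time.")


# Selecting a subset of classes requires disabling the default mappings.
validate_class_config(classes=["liver"], class_mappings=None)

try:
    validate_class_config(classes=["liver"])  # default mappings are still active
    conflict_detected = False
except ValueError:
    conflict_detected = True
```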

eva.vision.data.datasets.EmbeddingsSegmentationDataset

Bases: EmbeddingsDataset[Mask]

Embeddings segmentation dataset.

Expects a manifest file listing the paths of .pt files that contain tensor embeddings of shape [embedding_dim] or [1, embedding_dim].

Parameters:

Name Type Description Default
root str

Root directory of the dataset.

required
manifest_file str

The path to the manifest file, which is relative to the root argument.

required
split Literal['train', 'val', 'test'] | None

The dataset split to use. The split column of the manifest file will be filtered based on this value.

None
column_mapping Dict[str, str]

Defines the map between the variables and the manifest columns. It will overwrite the default_column_mapping with the provided values, so that column_mapping can contain only the values which are altered or missing.

default_column_mapping
embeddings_transforms Callable | None

A function/transform that transforms the embedding.

None
target_transforms Callable | None

A function/transform that transforms the target.

None
Source code in src/eva/core/data/datasets/embeddings.py
def __init__(
    self,
    root: str,
    manifest_file: str,
    split: Literal["train", "val", "test"] | None = None,
    column_mapping: Dict[str, str] = default_column_mapping,
    embeddings_transforms: Callable | None = None,
    target_transforms: Callable | None = None,
) -> None:
    """Initialize dataset.

    Expects a manifest file listing the paths of .pt files that contain
    tensor embeddings of shape [embedding_dim] or [1, embedding_dim].

    Args:
        root: Root directory of the dataset.
        manifest_file: The path to the manifest file, which is relative to
            the `root` argument.
        split: The dataset split to use. The `split` column of the manifest
            file will be filtered based on this value.
        column_mapping: Defines the map between the variables and the manifest
            columns. It will overwrite the `default_column_mapping` with
            the provided values, so that `column_mapping` can contain only the
            values which are altered or missing.
        embeddings_transforms: A function/transform that transforms the embedding.
        target_transforms: A function/transform that transforms the target.
    """
    super().__init__()

    self._root = root
    self._manifest_file = manifest_file
    self._split = split
    self._column_mapping = default_column_mapping | column_mapping
    self._embeddings_transforms = embeddings_transforms
    self._target_transforms = target_transforms

    self._data: pd.DataFrame

    self._set_multiprocessing_start_method()
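
The constructor merges the mappings with `default_column_mapping | column_mapping`, so only the altered or missing entries need to be passed. Below is a minimal sketch of that behaviour, assuming Python 3.9 or newer for the dict union operator. The keys and values of `default_column_mapping` shown here are hypothetical, and `resolve_column_mapping` is an illustrative helper, not part of the library.

```python
from typing import Dict

# Hypothetical defaults; the library's actual `default_column_mapping` may differ.
default_column_mapping: Dict[str, str] = {
    "path": "embeddings",
    "target": "target",
    "split": "split",
}


def resolve_column_mapping(column_mapping: Dict[str, str]) -> Dict[str, str]:
    """Overlays the user-provided entries on top of the defaults."""
    # With `|`, entries of the right-hand operand win on duplicate keys.
    return default_column_mapping | column_mapping


# Only the target column is renamed; the other entries keep their defaults.
resolved = resolve_column_mapping({"target": "mask"})
```

The union creates a new dictionary, so `default_column_mapping` itself is left unchanged.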