Datasets
VisionDataset
eva.vision.data.datasets.VisionDataset
Bases: `MapDataset[Tuple[InputType, TargetType, Dict[str, Any]]]`, `ABC`, `Generic[InputType, TargetType]`
Base dataset class for vision tasks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/vision.py
classes: List[str] | None (property)
Returns the list of class names in the dataset.
class_to_idx: Dict[str, int] | None (property)
Returns a mapping of the class name to its target index.
load_metadata
Returns the dataset metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the data sample to return the metadata of. | *required* |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any] \| None` | The sample metadata. |
load_data (abstractmethod)
Returns the `index`'th data sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the data sample to load. | *required* |
Returns:
| Type | Description |
|---|---|
| `InputType` | The sample data. |
load_target (abstractmethod)
Returns the `index`'th target sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the data sample to load. | *required* |
Returns:
| Type | Description |
|---|---|
| `TargetType` | The sample target. |
filename (abstractmethod)
Returns the filename of the `index`'th data sample.
Note that this is the relative file path to the root.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the data sample to select. | *required* |
Returns:
| Type | Description |
|---|---|
| `str` | The filename of the `index`'th data sample. |
Source code in src/eva/vision/data/datasets/vision.py
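To make the abstract interface concrete, here is a minimal hypothetical subclass. It is a sketch, not part of eva: the folder layout and file format are assumptions for illustration, while the overridden hooks (`classes`, `filename`, `load_data`, `load_target`, `load_metadata`) follow the signatures documented above.

```python
from __future__ import annotations

import os
from typing import Any, Dict, List

import torch
from torchvision import tv_tensors
from torchvision.io import read_image

from eva.vision.data.datasets import VisionDataset


class FolderClassificationDataset(VisionDataset):
    """Hypothetical dataset reading `<root>/<class_name>/<image>.png` files."""

    def __init__(self, root: str, transforms=None) -> None:
        super().__init__(transforms=transforms)
        self._root = root
        # One (relative_path, class_index) entry per image; the layout is an assumption.
        self._samples = [
            (os.path.join(name, file), idx)
            for idx, name in enumerate(sorted(os.listdir(root)))
            for file in sorted(os.listdir(os.path.join(root, name)))
        ]

    @property
    def classes(self) -> List[str] | None:
        return sorted(os.listdir(self._root))

    def filename(self, index: int) -> str:
        # Relative file path with respect to the root, as documented above.
        return self._samples[index][0]

    def load_data(self, index: int) -> tv_tensors.Image:
        path = os.path.join(self._root, self._samples[index][0])
        return tv_tensors.Image(read_image(path))

    def load_target(self, index: int) -> torch.Tensor:
        return torch.tensor(self._samples[index][1])

    def load_metadata(self, index: int) -> Dict[str, Any] | None:
        return {"path": self._samples[index][0]}

    def __len__(self) -> int:
        return len(self._samples)
```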
Classification datasets
eva.vision.data.datasets.BACH
Bases: VisionDataset[Image, Tensor]
Dataset class for BACH images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'val'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/bach.py
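A minimal usage sketch based on the parameters above. The root path and transforms are placeholders, and the `prepare_data()` call is assumed to be the preparation hook referenced in the download note; depending on the eva version an additional setup step may also be needed before indexing.

```python
import torch
from torchvision.transforms import v2

from eva.vision.data import datasets

dataset = datasets.BACH(
    root="data/bach",   # placeholder path; data is downloaded/extracted here
    split="train",      # or "val"; None uses the entire dataset
    download=True,
    transforms=v2.Compose([v2.Resize(224), v2.ToDtype(torch.float32, scale=True)]),
)
dataset.prepare_data()  # assumed preparation hook that triggers the download

image, target, metadata = dataset[0]  # (Image, Tensor, metadata dict)
```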
eva.vision.data.datasets.BRACS
Bases: VisionDataset[Image, Tensor]
Dataset class for BRACS images and corresponding targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. | *required* |
| `split` | `Literal['train', 'val', 'test']` | Dataset split to use. | *required* |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/bracs.py
eva.vision.data.datasets.BreaKHis
Bases: VisionDataset[Image, Tensor]
Dataset class for BreaKHis images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'val'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `magnifications` | `List[Literal['40X', '100X', '200X', '400X']] \| None` | A list of the WSI magnifications to select. By default only 40X images are used. | `None` |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/breakhis.py
eva.vision.data.datasets.Camelyon16
Bases: `MultiWsiDataset`, `VisionDataset[Image, Tensor]`
Dataset class for Camelyon16 images and corresponding targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `sampler` | `Sampler` | The sampler to use for sampling patch coordinates. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `width` | `int` | Width of the patches to be extracted, in pixels. | `224` |
| `height` | `int` | Height of the patches to be extracted, in pixels. | `224` |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | `0.5` |
| `backend` | `str` | The backend to use for reading the whole-slide images. | `'openslide'` |
| `image_transforms` | `Callable \| None` | Transforms to apply to the extracted image patches. | `None` |
| `coords_path` | `str \| None` | File path to save the patch coordinates as .csv. | `None` |
| `seed` | `int` | Random seed for reproducibility. | `42` |
Source code in src/eva/vision/data/datasets/classification/camelyon16.py
annotations_test_set: Dict[str, str] (cached property)
Loads the dataset labels.
annotations: Dict[str, str] (cached property)
Loads the dataset labels.
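For the slide-backed datasets (Camelyon16 and, analogously, PANDA/PANDASmall) a sampler drives patch extraction. The sketch below is illustrative only: the sampler import path and its constructor arguments are assumptions, so verify them against eva's WSI patching module.

```python
from eva.vision.data import datasets
from eva.vision.data.wsi.patching import samplers  # assumed import path

dataset = datasets.Camelyon16(
    root="data/camelyon16",  # placeholder path
    sampler=samplers.RandomSampler(n_samples=64, seed=42),  # assumed sampler API
    split="train",
    width=224,
    height=224,
    target_mpp=0.5,
    backend="openslide",
    coords_path="outputs/camelyon16_coords.csv",  # optional: export patch coordinates
)

# Depending on the eva version, a setup/prepare step may be required before indexing.
patch, label, metadata = dataset[0]  # an extracted patch with its slide-level target
```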
eva.vision.data.datasets.CRC
Bases: VisionDataset[Image, Tensor]
Dataset class for CRC images and corresponding targets.
The dataset is split into a train (train) and validation (val) set:

- train: A set of 100,000 non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.
- val: A set of 7180 image patches from N=50 patients with colorectal adenocarcinoma (no overlap with patients in NCT-CRC-HE-100K).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. | *required* |
| `split` | `Literal['train', 'val']` | Dataset split to use. | *required* |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/crc.py
eva.vision.data.datasets.GleasonArvaniti
Bases: VisionDataset[Image, Tensor]
Dataset class for GleasonArvaniti images and corresponding targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/gleason_arvaniti.py
eva.vision.data.datasets.MHIST
Bases: VisionDataset[Image, Tensor]
MHIST dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. | *required* |
| `split` | `Literal['train', 'test']` | Dataset split to use. | *required* |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/mhist.py
eva.vision.data.datasets.PANDA
Bases: `MultiWsiDataset`, `VisionDataset[Image, Tensor]`
Dataset class for PANDA images and corresponding targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `sampler` | `Sampler` | The sampler to use for sampling patch coordinates. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `width` | `int` | Width of the patches to be extracted, in pixels. | `224` |
| `height` | `int` | Height of the patches to be extracted, in pixels. | `224` |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | `0.5` |
| `backend` | `str` | The backend to use for reading the whole-slide images. | `'openslide'` |
| `image_transforms` | `Callable \| None` | Transforms to apply to the extracted image patches. | `None` |
| `coords_path` | `str \| None` | File path to save the patch coordinates as .csv. | `None` |
| `seed` | `int` | Random seed for reproducibility. | `42` |
Source code in src/eva/vision/data/datasets/classification/panda.py
annotations: pd.DataFrame (cached property)
Loads the dataset labels.
eva.vision.data.datasets.PANDASmall
Bases: PANDA
Small version of the PANDA dataset for quicker benchmarking.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `sampler` | `Sampler` | The sampler to use for sampling patch coordinates. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `width` | `int` | Width of the patches to be extracted, in pixels. | `224` |
| `height` | `int` | Height of the patches to be extracted, in pixels. | `224` |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | `0.5` |
| `backend` | `str` | The backend to use for reading the whole-slide images. | `'openslide'` |
| `image_transforms` | `Callable \| None` | Transforms to apply to the extracted image patches. | `None` |
| `coords_path` | `str \| None` | File path to save the patch coordinates as .csv. | `None` |
| `seed` | `int` | Random seed for reproducibility. | `42` |
Source code in src/eva/vision/data/datasets/classification/panda.py
eva.vision.data.datasets.PatchCamelyon
Bases: VisionDataset[Image, Tensor]
Dataset class for PatchCamelyon images and corresponding targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | The path to the dataset root. This path should contain the uncompressed h5 files and the metadata. | *required* |
| `split` | `Literal['train', 'val', 'test']` | The dataset split for training, validation, or testing. | *required* |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/patch_camelyon.py
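A usage sketch for PatchCamelyon. The root path is a placeholder that should contain (or receive) the uncompressed h5 files, and `prepare_data()` is the assumed preparation hook from the download note.

```python
from eva.vision.data import datasets

dataset = datasets.PatchCamelyon(
    root="data/patch_camelyon",  # placeholder path for the h5 files and metadata
    split="val",
    download=True,
)
dataset.prepare_data()  # assumed preparation hook that triggers the download

image, target, metadata = dataset[0]
print(len(dataset), dataset.classes)
```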
eva.vision.data.datasets.UniToPatho
Bases: VisionDataset[Image, Tensor]
Dataset class for UniToPatho images and corresponding targets.
The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. | *required* |
| `split` | `Literal['train', 'val'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `transforms` | `Callable \| None` | A function/transform which returns a transformed version of the raw data samples. | `None` |
Source code in src/eva/vision/data/datasets/classification/unitopatho.py
eva.vision.data.datasets.WsiClassificationDataset
Bases: `MultiWsiDataset`, `VisionDataset[Image, Tensor]`
A general dataset class for whole-slide image classification using manifest files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `manifest_file` | `str` | The path to the manifest file, relative to the root directory. | *required* |
| `width` | `int` | Width of the patches to be extracted, in pixels. | *required* |
| `height` | `int` | Height of the patches to be extracted, in pixels. | *required* |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | *required* |
| `sampler` | `Sampler` | The sampler to use for sampling patch coordinates. | *required* |
| `backend` | `str` | The backend to use for reading the whole-slide images. | `'openslide'` |
| `split` | `Literal['train', 'val', 'test'] \| None` | The split of the dataset to load. | `None` |
| `image_transforms` | `Callable \| None` | Transforms to apply to the extracted image patches. | `None` |
| `column_mapping` | `Dict[str, str]` | Mapping of the columns in the manifest file. | `default_column_mapping` |
| `coords_path` | `str \| None` | File path to save the patch coordinates as .csv. | `None` |
Source code in src/eva/vision/data/datasets/classification/wsi.py
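A sketch of driving WsiClassificationDataset from a manifest. The manifest name, the sampler class and the column names passed to `column_mapping` are illustrative assumptions; only the parameter names themselves come from the table above.

```python
from eva.vision.data import datasets
from eva.vision.data.wsi.patching import samplers  # assumed import path

dataset = datasets.WsiClassificationDataset(
    root="data/my_wsi_cohort",     # placeholder path
    manifest_file="manifest.csv",  # relative to the root directory
    width=224,
    height=224,
    target_mpp=0.5,
    sampler=samplers.GridSampler(),  # assumed sampler class
    split="train",
    # Illustrative column names; map them to whatever your manifest actually uses.
    column_mapping={"path": "slide_path", "target": "label", "split": "split"},
)
```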
Segmentation datasets
eva.vision.data.datasets.BCSS
Bases: `MultiWsiDataset`, `VisionDataset[Image, Mask]`
Dataset class for BCSS semantic segmentation task.
Source: https://github.com/PathologyDataScience/BCSS
We apply the class grouping proposed by the challenge baseline: https://bcsegmentation.grand-challenge.org/Baseline/

- outside_roi: outside_roi
- tumor: angioinvasion, dcis
- stroma: stroma
- inflammatory: lymphocytic_infiltrate, plasma_cells, other_immune_infiltrate
- necrosis: necrosis_or_debris
- other: remaining
Be aware that outside_roi should be assigned zero-weight during model training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `sampler` | `Sampler` | The sampler to use for sampling patch coordinates. | *required* |
| `split` | `Literal['train', 'val', 'trainval', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `width` | `int` | Width of the patches to be extracted, in pixels. | `224` |
| `height` | `int` | Height of the patches to be extracted, in pixels. | `224` |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | `0.5` |
| `transforms` | `Callable \| None` | Transforms to apply to the extracted image & mask patches. | `None` |
Source code in src/eva/vision/data/datasets/segmentation/bcss.py
eva.vision.data.datasets.BTCV
Bases: VisionDataset[Volume, Mask]
Beyond the Cranial Vault (BTCV) Abdomen dataset.
The BTCV dataset comprises abdominal CT acquired at the Vanderbilt University Medical Center from metastatic liver cancer patients or post-operative ventral hernia patients. The dataset contains one background class and thirteen organ classes.
More info
- Multi-organ Abdominal CT Reference Standard Segmentations https://zenodo.org/records/1169361
- Dataset Split https://github.com/Luffy03/Large-Scale-Medical/blob/main/Downstream/monai/BTCV/dataset/dataset_0.json
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the dataset root directory. | *required* |
| `split` | `Literal['train', 'val'] \| None` | Dataset split to use ('train' or 'val'). If None, it uses the full dataset. | `None` |
| `download` | `bool` | Whether to download the dataset. | `False` |
| `transforms` | `Callable \| None` | A callable object for applying data transformations. If None, no transformations are applied. | `None` |
Source code in src/eva/vision/data/datasets/segmentation/btcv.py
load_data
Loads the CT volume for a given sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the desired sample. | *required* |
Returns:
| Type | Description |
|---|---|
| `Volume` | Tensor representing the CT volume. |
Source code in src/eva/vision/data/datasets/segmentation/btcv.py
load_target
Loads the segmentation mask for a given sample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | The index of the desired sample. | *required* |
Returns:
| Type | Description |
|---|---|
| `Mask` | Tensor representing the segmentation mask. |
Source code in src/eva/vision/data/datasets/segmentation/btcv.py
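The two loaders above can also be called directly, for example to inspect a single CT volume and its mask. The root path below is a placeholder and assumes the data is already on disk.

```python
from eva.vision.data import datasets

dataset = datasets.BTCV(root="data/btcv", split="val", download=False)

volume = dataset.load_data(0)   # CT volume tensor
mask = dataset.load_target(0)   # segmentation mask (background + 13 organ classes)
print(volume.shape, mask.shape)
```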
eva.vision.data.datasets.CoNSeP
Bases: `MultiWsiDataset`, `VisionDataset[Image, Mask]`
Dataset class for CoNSeP semantic segmentation task.
As in [1], we combine classes 3 (healthy epithelial) & 4 (dysplastic/malignant epithelial) into the epithelial class and 5 (fibroblast), 6 (muscle) & 7 (endothelial) into the spindle-shaped class.
[1] Graham, Simon, et al. "Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images." https://arxiv.org/abs/1802.04712
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `sampler` | `Sampler \| None` | The sampler to use for sampling patch coordinates. | `None` |
| `split` | `Literal['train', 'val'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | `None` |
| `width` | `int` | Width of the patches to be extracted, in pixels. | `250` |
| `height` | `int` | Height of the patches to be extracted, in pixels. | `250` |
| `target_mpp` | `float` | Target microns per pixel (mpp) for the patches. | `0.25` |
| `transforms` | `Callable \| None` | Transforms to apply to the extracted image & mask patches. | `None` |
Source code in src/eva/vision/data/datasets/segmentation/consep.py
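A minimal CoNSeP usage sketch with the documented patch defaults; the root path is a placeholder and the returned tuple follows the VisionDataset interface.

```python
from eva.vision.data import datasets

dataset = datasets.CoNSeP(
    root="data/consep",  # placeholder path
    split="train",
    width=250,
    height=250,
    target_mpp=0.25,
)

image, mask, metadata = dataset[0]  # patch and its semantic mask (merged classes as described above)
```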
eva.vision.data.datasets.LiTS
Bases: VisionDataset[Image, Mask]
LiTS - Liver Tumor Segmentation Challenge.
Webpage: https://competitions.codalab.org/competitions/17094
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. | `None` |
| `transforms` | `Callable \| None` | A function/transforms that takes in an image and a target mask and returns the transformed versions of both. | `None` |
| `seed` | `int` | Seed used for generating the dataset splits. | `8` |
Source code in src/eva/vision/data/datasets/segmentation/lits.py
eva.vision.data.datasets.LiTSBalanced
Bases: LiTS
Balanced version of the LiTS - Liver Tumor Segmentation Challenge dataset.
For each volume in the dataset, we sample the same number of slices where only the liver and where both liver and tumor are present.
Webpage: https://competitions.codalab.org/competitions/17094
For the splits we follow: https://arxiv.org/pdf/2010.01663v2
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. | `None` |
| `transforms` | `Callable \| None` | A function/transforms that takes in an image and a target mask and returns the transformed versions of both. | `None` |
| `seed` | `int` | Seed used for generating the dataset splits and sampling of the slices. | `8` |
Source code in src/eva/vision/data/datasets/segmentation/lits_balanced.py
eva.vision.data.datasets.MoNuSAC
Bases: VisionDataset[Image, Mask]
MoNuSAC2020: A Multi-organ Nuclei Segmentation and Classification Challenge.
Webpage: https://monusac-2020.grand-challenge.org/
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'test']` | Dataset split to use. | *required* |
| `export_masks` | `bool` | Whether to export, save and use the semantic label masks from disk. | `True` |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `transforms` | `Callable \| None` | A function/transforms that takes in an image and a target mask and returns the transformed versions of both. | `None` |
Source code in src/eva/vision/data/datasets/segmentation/monusac.py
eva.vision.data.datasets.TotalSegmentator2D
Bases: VisionDataset[Image, Mask]
TotalSegmentator 2D segmentation dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | Dataset split to use. If `None`, the entire dataset is used. | *required* |
| `version` | `Literal['small', 'full'] \| None` | The version of the dataset to initialize. | `'full'` |
| `download` | `bool` | Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the `prepare_data` method. | `False` |
| `classes` | `List[str] \| None` | Whether to configure the dataset with a subset of classes. If `None`, all classes are used. | `None` |
| `class_mappings` | `Dict[str, str] \| None` | A dictionary that maps the original class names to a reduced set of classes. | `reduced_class_mappings` |
| `optimize_mask_loading` | `bool` | Whether to pre-process the segmentation masks in order to optimize the loading time. | `True` |
| `decompress` | `bool` | Whether to decompress the ct.nii.gz files when preparing the data. The label masks won't be decompressed, but when enabling optimize_mask_loading it will export the semantic label masks to a single file in uncompressed .nii format. | `True` |
| `num_workers` | `int` | The number of workers to use for optimizing the masks & decompressing the .gz files. | `10` |
| `transforms` | `Callable \| None` | A function/transforms that takes in an image and a target mask and returns the transformed versions of both. | `None` |
Source code in src/eva/vision/data/datasets/segmentation/total_segmentator_2d.py
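A usage sketch for TotalSegmentator2D restricted to a subset of classes. The root path is a placeholder, the class names are illustrative examples from the TotalSegmentator label set, and `prepare_data()` is the assumed preparation hook from the download note.

```python
from eva.vision.data import datasets

dataset = datasets.TotalSegmentator2D(
    root="data/total_segmentator",  # placeholder path
    split="train",
    version="small",                # or "full"
    download=True,
    classes=["liver", "spleen"],    # illustrative subset; must match the dataset's label names
    optimize_mask_loading=True,
    num_workers=4,
)
dataset.prepare_data()  # assumed hook that downloads, decompresses and exports the masks
```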
eva.vision.data.datasets.EmbeddingsSegmentationDataset
Bases: EmbeddingsDataset[Mask]
Embeddings segmentation dataset.
Expects a manifest file listing the paths of .pt files that contain tensor embeddings of shape [embedding_dim] or [1, embedding_dim].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `root` | `str` | Root directory of the dataset. | *required* |
| `manifest_file` | `str` | The path to the manifest file, relative to the root directory. | *required* |
| `split` | `Literal['train', 'val', 'test'] \| None` | The dataset split to use. | `None` |
| `column_mapping` | `Dict[str, str]` | Defines the map between the variables and the manifest columns. It will overwrite the default column mapping for the keys provided. | `default_column_mapping` |
| `embeddings_transforms` | `Callable \| None` | A function/transform that transforms the embedding. | `None` |
| `target_transforms` | `Callable \| None` | A function/transform that transforms the target. | `None` |
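A sketch of loading precomputed segmentation embeddings from a manifest. The manifest name and the column names in `column_mapping` are illustrative assumptions.

```python
from eva.vision.data import datasets

dataset = datasets.EmbeddingsSegmentationDataset(
    root="data/embeddings/my_segmentation_task",  # placeholder path
    manifest_file="manifest.csv",                 # relative to the root; lists the .pt embedding files
    split="train",
    # Illustrative column names; map them to whatever your manifest actually uses.
    column_mapping={"path": "embeddings", "target": "mask", "split": "split"},
)

embeddings, mask, metadata = dataset[0]  # embedding tensor of shape [embedding_dim] or [1, embedding_dim]
```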