
Datasets

VisionDataset

eva.vision.data.datasets.VisionDataset

Bases: Dataset, ABC, Generic[DataSample]

Base dataset class for vision tasks.

filename abstractmethod

Returns the filename of the index'th data sample.

Note that this is the relative file path to the root.

Parameters:

Name Type Description Default
index int

The index of the data-sample to select.

required

Returns:

Type Description
str

The filename of the index'th data sample.

Source code in src/eva/vision/data/datasets/vision.py
@abc.abstractmethod
def filename(self, index: int) -> str:
    """Returns the filename of the `index`'th data sample.

    Note that this is the relative file path to the root.

    Args:
        index: The index of the data-sample to select.

    Returns:
        The filename of the `index`'th data sample.
    """

Classification datasets

eva.vision.data.datasets.BACH

Bases: ImageClassification

Dataset class for BACH images and corresponding targets.

The dataset is split into train and validation by taking into account the patient IDs to avoid any data leakage.
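The idea behind a patient-aware split can be sketched as follows. This is not eva's implementation, which is not shown on this page. The split_by_patient helper, the (filename, patient_id) pairs and the ratio are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def split_by_patient(
    samples: List[Tuple[str, str]], train_ratio: float = 0.7
) -> Dict[str, List[str]]:
    """Assigns every sample of a patient to exactly one split.

    `samples` holds hypothetical (filename, patient_id) pairs.
    """
    # Split on the level of patients, not on the level of images.
    patient_ids = sorted({patient_id for _, patient_id in samples})
    n_train = int(len(patient_ids) * train_ratio)
    train_patients = set(patient_ids[:n_train])

    splits: Dict[str, List[str]] = {"train": [], "val": []}
    for filename, patient_id in samples:
        split = "train" if patient_id in train_patients else "val"
        splits[split].append(filename)
    return splits


samples = [
    ("img_0.tif", "p1"), ("img_1.tif", "p1"),
    ("img_2.tif", "p2"), ("img_3.tif", "p3"),
    ("img_4.tif", "p3"), ("img_5.tif", "p4"),
]
splits = split_by_patient(samples, train_ratio=0.5)
```

Because the assignment is made per patient, images of the same patient never end up in both splits, which is what prevents the leakage.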

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.

required
split Literal['train', 'val'] | None

Dataset split to use. If None, the entire dataset is used.

None
download bool

Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the prepare_data method and if the data does not yet exist on disk.

False
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/classification/bach.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val"] | None = None,
    download: bool = False,
    transforms: Callable | None = None,
) -> None:
    """Initialize the dataset.

    The dataset is split into train and validation by taking into account
    the patient IDs to avoid any data leakage.

    Args:
        root: Path to the root directory of the dataset. The dataset will
            be downloaded and extracted here, if it does not already exist.
        split: Dataset split to use. If `None`, the entire dataset is used.
        download: Whether to download the data for the specified split.
            Note that the download will be executed only by additionally
            calling the :meth:`prepare_data` method and if the data does
            not yet exist on disk.
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._download = download

    self._samples: List[Tuple[str, int]] = []
    self._indices: List[int] = []
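The constructor above only stores its arguments, and the download parameter takes effect later. A minimal sketch of this deferred-download pattern is shown below. LazyDownloadDataset and its _download_dataset helper are hypothetical and write a placeholder file instead of fetching anything. How eva's prepare_data actually checks for existing data is not shown on this page.

```python
import os
import tempfile


class LazyDownloadDataset:
    """Hypothetical dataset illustrating the deferred-download pattern."""

    def __init__(self, root: str, download: bool = False) -> None:
        # The constructor only stores the arguments; nothing is fetched here.
        self._root = root
        self._download = download

    def prepare_data(self) -> None:
        # Download only if requested and the data is not on disk yet.
        if self._download and not os.path.isdir(self._root):
            self._download_dataset()

    def _download_dataset(self) -> None:
        # Placeholder for a real download: create the root with a marker file.
        os.makedirs(self._root)
        with open(os.path.join(self._root, "data.txt"), "w") as file:
            file.write("placeholder")


with tempfile.TemporaryDirectory() as tmp_dir:
    root = os.path.join(tmp_dir, "dataset")
    dataset = LazyDownloadDataset(root, download=True)
    exists_after_init = os.path.isdir(root)
    dataset.prepare_data()
    exists_after_prepare = os.path.isdir(root)
```

Keeping the constructor free of side effects makes it cheap to instantiate the dataset, for example from a configuration file, and leaves the expensive step to an explicit call.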

eva.vision.data.datasets.PatchCamelyon

Bases: ImageClassification

Dataset class for PatchCamelyon images and corresponding targets.

Parameters:

Name Type Description Default
root str

The path to the dataset root. This path should contain the uncompressed h5 files and the metadata.

required
split Literal['train', 'val', 'test']

The dataset split for training, validation, or testing.

required
download bool

Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the prepare_data method.

False
transforms Callable | None

A function/transform which returns a transformed version of the raw data samples.

None
Source code in src/eva/vision/data/datasets/classification/patch_camelyon.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val", "test"],
    download: bool = False,
    transforms: Callable | None = None,
) -> None:
    """Initializes the dataset.

    Args:
        root: The path to the dataset root. This path should contain
            the uncompressed h5 files and the metadata.
        split: The dataset split for training, validation, or testing.
        download: Whether to download the data for the specified split.
            Note that the download will be executed only by additionally
            calling the :meth:`prepare_data` method.
        transforms: A function/transform which returns a transformed
            version of the raw data samples.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._download = download

Segmentation datasets

eva.vision.data.datasets.ImageSegmentation

Bases: VisionDataset[Tuple[Image, Mask]], ABC

Image segmentation abstract dataset.

Parameters:

Name Type Description Default
transforms Callable | None

A function/transforms that takes in an image and a label and returns the transformed versions of both.

None
Source code in src/eva/vision/data/datasets/segmentation/base.py
def __init__(self, transforms: Callable | None = None) -> None:
    """Initializes the image segmentation base class.

    Args:
        transforms: A function/transforms that takes in an
            image and a label and returns the transformed versions of both.
    """
    super().__init__()

    self._transforms = transforms

classes: List[str] | None property

Returns the list with the names of the dataset classes.

class_to_idx: Dict[str, int] | None property

Returns a mapping of the class name to its target index.
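The relation between the two properties can be sketched as below. The class names are made up, and deriving the index from the position in the list is an assumption; a concrete dataset may define the mapping differently.

```python
from typing import Dict, List

# Hypothetical class names, standing in for the value of `classes`.
classes: List[str] = ["background", "liver", "spleen"]

# Assumed convention: the target index is the position in the list.
class_to_idx: Dict[str, int] = {name: index for index, name in enumerate(classes)}
```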

load_image abstractmethod

Loads and returns the index'th image sample.

Parameters:

Name Type Description Default
index int

The index of the data sample to load.

required

Returns:

Type Description
Image

An image torchvision tensor (channels, height, width).

Source code in src/eva/vision/data/datasets/segmentation/base.py
@abc.abstractmethod
def load_image(self, index: int) -> tv_tensors.Image:
    """Loads and returns the `index`'th image sample.

    Args:
        index: The index of the data sample to load.

    Returns:
        An image torchvision tensor (channels, height, width).
    """

load_mask abstractmethod

Returns the index'th target masks sample.

Parameters:

Name Type Description Default
index int

The index of the data sample target masks to load.

required

Returns:

Type Description
Mask

The semantic mask as a (H x W) shaped tensor with integer values which represent the pixel class id.

Source code in src/eva/vision/data/datasets/segmentation/base.py
@abc.abstractmethod
def load_mask(self, index: int) -> tv_tensors.Mask:
    """Returns the `index`'th target masks sample.

    Args:
        index: The index of the data sample target masks to load.

    Returns:
        The semantic mask as a (H x W) shaped tensor with integer
        values which represent the pixel class id.
    """

load_metadata

Returns the dataset metadata.

Parameters:

Name Type Description Default
index int

The index of the data sample to return the metadata of. If None, it will return the metadata of the current dataset.

required

Returns:

Type Description
Dict[str, Any] | None

The sample metadata.

Source code in src/eva/vision/data/datasets/segmentation/base.py
def load_metadata(self, index: int) -> Dict[str, Any] | None:
    """Returns the dataset metadata.

    Args:
        index: The index of the data sample to return the metadata of.
            If `None`, it will return the metadata of the current dataset.

    Returns:
        The sample metadata.
    """

eva.vision.data.datasets.TotalSegmentator2D

Bases: ImageSegmentation

TotalSegmentator 2D segmentation dataset.

Parameters:

Name Type Description Default
root str

Path to the root directory of the dataset. The dataset will be downloaded and extracted here, if it does not already exist.

required
split Literal['train', 'val', 'test'] | None

Dataset split to use. If None, the entire dataset is used.

required
version Literal['small', 'full'] | None

The version of the dataset to initialize. If None, it will use the files located at root as is and won't perform any checks.

'full'
download bool

Whether to download the data for the specified split. Note that the download will be executed only by additionally calling the prepare_data method and if the data does not exist yet on disk.

False
classes List[str] | None

Whether to configure the dataset with a subset of classes. If None, it will use all of them.

None
optimize_mask_loading bool

Whether to pre-process the segmentation masks in order to optimize the loading time. In the setup method, it will reformat the binary one-hot masks to a semantic mask and store it on disk.

True
transforms Callable | None

A function/transforms that takes in an image and a target mask and returns the transformed versions of both.

None
Source code in src/eva/vision/data/datasets/segmentation/total_segmentator_2d.py
def __init__(
    self,
    root: str,
    split: Literal["train", "val", "test"] | None,
    version: Literal["small", "full"] | None = "full",
    download: bool = False,
    classes: List[str] | None = None,
    optimize_mask_loading: bool = True,
    transforms: Callable | None = None,
) -> None:
    """Initialize dataset.

    Args:
        root: Path to the root directory of the dataset. The dataset will
            be downloaded and extracted here, if it does not already exist.
        split: Dataset split to use. If `None`, the entire dataset is used.
        version: The version of the dataset to initialize. If `None`, it will
            use the files located at root as is and won't perform any checks.
        download: Whether to download the data for the specified split.
            Note that the download will be executed only by additionally
            calling the :meth:`prepare_data` method and if the data does not
            exist yet on disk.
        classes: Whether to configure the dataset with a subset of classes.
            If `None`, it will use all of them.
        optimize_mask_loading: Whether to pre-process the segmentation masks
            in order to optimize the loading time. In the `setup` method, it
            will reformat the binary one-hot masks to a semantic mask and store
            it on disk.
        transforms: A function/transforms that takes in an image and a target
            mask and returns the transformed versions of both.
    """
    super().__init__(transforms=transforms)

    self._root = root
    self._split = split
    self._version = version
    self._download = download
    self._classes = classes
    self._optimize_mask_loading = optimize_mask_loading

    if self._optimize_mask_loading and self._classes is not None:
        raise ValueError(
            "To use customize classes please set the optimize_mask_loading to `False`."
        )

    self._samples_dirs: List[str] = []
    self._indices: List[Tuple[int, int]] = []
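The reformatting that optimize_mask_loading refers to, turning binary one-hot masks into a single semantic mask, can be sketched with NumPy as follows. This is an illustration and not the code eva runs in its setup method. The to_semantic_mask helper, the (C, H, W) layout, and the convention that 0 marks background are assumptions.

```python
import numpy as np


def to_semantic_mask(binary_masks: np.ndarray) -> np.ndarray:
    """Collapses per-class binary masks (C, H, W) into one (H, W) mask.

    Assumes class ids start at 1 and that 0 marks background pixels
    (pixels not covered by any class). Where masks overlap, the class
    with the lowest index wins, because argmax returns the first maximum.
    """
    # Prepend an all-zero background channel so that uncovered pixels map to 0.
    background = np.zeros((1, *binary_masks.shape[1:]), dtype=binary_masks.dtype)
    stacked = np.concatenate([background, binary_masks], axis=0)
    return np.argmax(stacked, axis=0).astype(np.uint8)


# Two hypothetical classes on a 2 x 2 image.
binary_masks = np.array(
    [
        [[1, 0], [0, 0]],  # class id 1
        [[0, 1], [0, 0]],  # class id 2
    ],
    dtype=np.uint8,
)
semantic_mask = to_semantic_mask(binary_masks)
```

Storing one semantic mask per sample means a single array is read at load time instead of one binary mask per class. A mask precomputed this way covers a fixed set of classes, which fits with the constructor rejecting a custom classes subset when optimize_mask_loading is enabled.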