Datasets
Reference information for the Dataset base class.
eva.core.data.Dataset
Bases: TorchDataset
Base dataset class.
prepare_data
Encapsulates all disk-related tasks.
This method is the preferred place for downloading and preparing the data, for example generating manifest files. If implemented, it is called via eva.core.data.datamodules.DataModule, which ensures that it runs only within a single process, making it multi-process safe.
Source code in src/eva/core/data/datasets/base.py
setup
Sets up the dataset.
This method is the preferred place for creating datasets or performing train/val/test splits. If implemented, it is called via eva.core.data.datamodules.DataModule at the beginning of fit (train + validate), validate, test, or predict, and it is called from every process (i.e. GPU) across all the nodes in DDP.
Source code in src/eva/core/data/datasets/base.py
configure
Configures the dataset.
This method is the preferred place to configure the dataset: assign values to attributes, perform splits, etc. It is meant to be called from the setup method, before the validate method is called.
Source code in src/eva/core/data/datasets/base.py
validate
Validates the dataset.
This method checks the integrity of the dataset and verifies that it is configured properly. It is meant to be called from the setup method, after the configure method has been called.
Source code in src/eva/core/data/datasets/base.py
teardown
Cleans up the data artifacts.
Used to clean up when the run is finished. If implemented, it is called via eva.core.data.datamodules.DataModule at the end of fit (train + validate), validate, test, or predict, and it is called from every process (i.e. GPU) across all the nodes in DDP.
Source code in src/eva/core/data/datasets/base.py
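The hook ordering described above can be sketched with a plain Python stand-in. This is not eva's Dataset class: SketchDataset and run_lifecycle are hypothetical names, and the driver function only plays the role that the data module has in the real library. The one behaviour taken from the reference text is that setup calls configure first and validate second.

```python
from __future__ import annotations


class SketchDataset:
    """Stand-in that mirrors the documented hook names (not eva's class)."""

    def __init__(self) -> None:
        self.calls: list[str] = []
        self.samples: list[int] = []

    def prepare_data(self) -> None:
        # Disk-related work (downloads, manifest generation) would go here.
        self.calls.append("prepare_data")

    def configure(self) -> None:
        # Assign attributes and perform splits.
        self.samples = [0, 1, 2, 3]
        self.calls.append("configure")

    def validate(self) -> None:
        # Check integrity once the dataset has been configured.
        if not self.samples:
            raise ValueError("Dataset is empty after configure().")
        self.calls.append("validate")

    def setup(self) -> None:
        # Documented ordering: configure first, then validate.
        self.configure()
        self.validate()
        self.calls.append("setup")

    def teardown(self) -> None:
        # Release whatever setup created.
        self.samples = []
        self.calls.append("teardown")


def run_lifecycle(dataset: SketchDataset) -> list[str]:
    """Hypothetical driver standing in for the data module."""
    dataset.prepare_data()
    dataset.setup()
    dataset.teardown()
    return dataset.calls
```

In a real subclass, the same method names would be overridden on the eva base class, and the data module would decide when each hook runs.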
Embeddings datasets
eva.core.data.datasets.EmbeddingsClassificationDataset
Bases: EmbeddingsDataset[Tensor]
Embeddings dataset class for classification tasks.
Expects a manifest file listing the paths of .pt files that contain tensor embeddings of shape [embedding_dim] or [1, embedding_dim].
Parameters:
Name | Type | Description | Default |
---|---|---|---|
root | str | Root directory of the dataset. | required |
manifest_file | str | The path to the manifest file, which is relative to the | required |
split | Literal['train', 'val', 'test'] \| None | The dataset split to use. The | None |
column_mapping | Dict[str, str] | Defines the map between the variables and the manifest columns. It will overwrite the | default_column_mapping |
embeddings_transforms | Callable \| None | A function/transform that transforms the embedding. | None |
target_transforms | Callable \| None | A function/transform that transforms the target. | None |
Source code in src/eva/core/data/datasets/embeddings.py
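As a rough illustration of how a manifest, a split filter, and a column mapping could fit together, here is a minimal standard-library sketch. It does not use eva or load any .pt files. The default column names, the key-by-key override of the defaults, and the resolution of paths against root are all assumptions made for this example, and read_manifest and demo are hypothetical helpers.

```python
from __future__ import annotations

import csv
import os
import tempfile

# Hypothetical defaults for illustration; the library's actual mapping may differ.
DEFAULT_COLUMN_MAPPING = {"path": "path", "target": "target", "split": "split"}


def read_manifest(root, manifest_file, split=None, column_mapping=None):
    """Return (embedding_path, target) pairs for the requested split."""
    # Assumption: user-provided keys override the defaults key by key.
    columns = {**DEFAULT_COLUMN_MAPPING, **(column_mapping or {})}
    # Assumption: the manifest and the embedding paths are resolved against root.
    with open(os.path.join(root, manifest_file), newline="") as handle:
        rows = list(csv.DictReader(handle))
    if split is not None:
        rows = [row for row in rows if row[columns["split"]] == split]
    return [
        (os.path.join(root, row[columns["path"]]), int(row[columns["target"]]))
        for row in rows
    ]


def demo():
    """Write a tiny manifest to a temporary root and read back the train split."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "manifest.csv"), "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["embeddings", "target", "split"])
            writer.writerow(["a.pt", 0, "train"])
            writer.writerow(["b.pt", 1, "val"])
            writer.writerow(["c.pt", 1, "train"])
        samples = read_manifest(
            root, "manifest.csv", split="train", column_mapping={"path": "embeddings"}
        )
        return [(os.path.basename(path), target) for path, target in samples]
```

Here only the path key is overridden, so the target and split columns still come from the assumed defaults.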
eva.core.data.datasets.MultiEmbeddingsClassificationDataset
Bases: EmbeddingsDataset[Tensor]
Dataset class for cases where a sample corresponds to multiple embeddings.
Example use case: Slide level dataset where each slide has multiple patch embeddings.
Expects a manifest file listing the paths of .pt files containing tensor embeddings.
The manifest must have a column_mapping["multi_id"] column that contains the unique identifier of each group of embeddings. For oncology datasets, this would usually be the slide id. Each row in the manifest file points to a .pt file that can contain one or multiple embeddings (either as a list or as stacked tensors). There can also be multiple rows for the same multi_id, in which case the embeddings from the different .pt files corresponding to that multi_id are stacked along the first dimension.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
root | str | Root directory of the dataset. | required |
manifest_file | str | The path to the manifest file, which is relative to the | required |
split | Literal['train', 'val', 'test'] | The dataset split to use. The | required |
column_mapping | Dict[str, str] | Defines the map between the variables and the manifest columns. It will overwrite the | default_column_mapping |
embeddings_transforms | Callable \| None | A function/transform that transforms the embedding. | None |
target_transforms | Callable \| None | A function/transform that transforms the target. | None |
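The grouping by multi_id described for this class can be sketched without eva or torch. In this stand-in, nested Python lists take the place of tensors, the loaded dictionary takes the place of reading .pt files, and group_by_multi_id is a hypothetical helper. Extending a list here corresponds to stacking along the first dimension.

```python
from __future__ import annotations

from collections import defaultdict


def group_by_multi_id(rows, loaded):
    """Collect all embeddings that share a multi_id into one bag.

    rows:   (multi_id, path) pairs, one per manifest row.
    loaded: maps each path to the list of embeddings stored in that file.
    """
    grouped = defaultdict(list)
    for multi_id, path in rows:
        # Several rows may share a multi_id; their embeddings are appended
        # in manifest order, which mirrors stacking along the first dimension.
        grouped[multi_id].extend(loaded[path])
    return dict(grouped)


# Made-up example data: two files for slide_1, one file for slide_2.
rows = [("slide_1", "a.pt"), ("slide_1", "b.pt"), ("slide_2", "c.pt")]
loaded = {
    "a.pt": [[0.1, 0.2], [0.3, 0.4]],  # a file holding two embeddings
    "b.pt": [[0.5, 0.6]],              # a file holding one embedding
    "c.pt": [[0.7, 0.8]],
}
bags = group_by_multi_id(rows, loaded)
```

With real tensors, the per-file embeddings would be concatenated into a single tensor of shape [n_embeddings, embedding_dim] per multi_id.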