Datasets
Reference information for the Dataset base class.
eva.data.Dataset
Bases: TorchDataset
Base dataset class.
prepare_data
Encapsulates all disk-related tasks.
This method is preferred for downloading and preparing the data, for example, generating manifest files. If implemented, it will be called via eva.core.data.datamodules.DataModule, which ensures that it is called only within a single process, making it multi-process safe.
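A minimal sketch of the kind of disk work that fits in prepare_data: scanning root for .pt files and writing a manifest once. It is plain Python rather than a real subclass of eva.data.Dataset; the class name ToyEmbeddingsDataset, the manifest.csv filename, and the labeling and split rules are hypothetical. The header uses the default column names of EmbeddingsClassificationDataset.

```python
import csv
import os
from typing import List


class ToyEmbeddingsDataset:
    """Hypothetical stand-in for a subclass of eva.data.Dataset."""

    def __init__(self, root: str, manifest_file: str = "manifest.csv") -> None:
        self._root = root
        self._manifest_file = manifest_file

    def prepare_data(self) -> None:
        """Writes a manifest of all .pt files under root (disk work only)."""
        manifest_path = os.path.join(self._root, self._manifest_file)
        if os.path.exists(manifest_path):
            return  # Already prepared; keep the hook idempotent.
        # Collect .pt paths relative to root, sorted for a stable order.
        files: List[str] = sorted(
            os.path.relpath(os.path.join(dirpath, name), self._root)
            for dirpath, _, names in os.walk(self._root)
            for name in names
            if name.endswith(".pt")
        )
        with open(manifest_path, "w", newline="") as f:
            writer = csv.writer(f)
            # Header matches the default column names (embeddings, target, split).
            writer.writerow(["embeddings", "target", "split"])
            for i, path in enumerate(files):
                # Hypothetical labeling rule: the parent folder name is the target.
                target = os.path.basename(os.path.dirname(path))
                # Hypothetical split rule: every fifth file goes to "val".
                split = "val" if i % 5 == 4 else "train"
                writer.writerow([path, target, split])
```

Because the DataModule runs this hook in a single process, nothing here needs to guard against concurrent writers.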
Source code in src/eva/core/data/datasets/base.py
setup
Sets up the dataset.
This method is preferred for creating datasets or performing train/val/test splits. If implemented, it will be called via eva.core.data.datamodules.DataModule at the beginning of fit (train + validate), validate, test, or predict, and it will be called from every process (i.e., every GPU) across all nodes in DDP.
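Because setup runs in every process, a random split performed there must come out identical in each process, otherwise different GPUs would train and validate on different subsets. A minimal sketch, assuming a hypothetical split_indices helper with a fixed seed:

```python
import random
from typing import List, Tuple


def split_indices(
    n: int, val_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Deterministically splits range(n) into train and val indices.

    A local random.Random(seed) makes the shuffle identical in every
    process, independent of the global random state.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = int(n * val_fraction)
    # Return sorted index lists: the first n_val shuffled indices form val.
    return sorted(indices[n_val:]), sorted(indices[:n_val])
```

Using a local random.Random(seed) rather than the module-level random functions keeps the result independent of whatever else has consumed the global random state in each process.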
configure
Configures the dataset.
This method is the preferred place to configure the dataset: assign values to attributes, perform splits, etc. It is called from the setup method, before validate is called.
validate
Validates the dataset.
This method aims to check the integrity of the dataset and verify that it is configured properly. It is called from the setup method, after configure has been called.
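A minimal sketch of how setup can delegate to configure and then validate. ToySplitDataset, its in-memory manifest string, and the particular checks in validate are hypothetical; only the call order is taken from the descriptions above.

```python
import csv
import io
from typing import Dict, List, Optional


class ToySplitDataset:
    """Hypothetical dataset showing the setup -> configure -> validate order."""

    def __init__(self, manifest_text: str, split: Optional[str] = None) -> None:
        self._manifest_text = manifest_text
        self._split = split
        self._samples: List[Dict[str, str]] = []

    def setup(self) -> None:
        # Configure first, then check the configured state.
        self.configure()
        self.validate()

    def configure(self) -> None:
        # Assign attributes and apply the split filter.
        rows = list(csv.DictReader(io.StringIO(self._manifest_text)))
        if self._split is not None:
            rows = [row for row in rows if row["split"] == self._split]
        self._samples = rows

    def validate(self) -> None:
        # Fail early if configuration produced an unusable dataset.
        if not self._samples:
            raise ValueError(f"No samples found for split={self._split!r}.")
        missing = [row for row in self._samples if not row["embeddings"]]
        if missing:
            raise ValueError(f"{len(missing)} rows have no embeddings path.")

    def __len__(self) -> int:
        return len(self._samples)
```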
teardown
Cleans up the data artifacts.
Used to clean up when the run is finished. If implemented, it will be called via eva.core.data.datamodules.DataModule at the end of fit (train + validate), validate, test, or predict, and it will be called from every process (i.e., every GPU) across all nodes in DDP.
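To make the calling pattern of prepare_data, setup, and teardown concrete, here is a sequential simulation of one stage. RecordingDataset and run_stage are hypothetical; rank 0 stands in for the single process that runs prepare_data, and real DDP processes would run concurrently rather than in a loop.

```python
from typing import List


class RecordingDataset:
    """Hypothetical dataset that records which hook ran in which process."""

    def __init__(self, log: List[str], rank: int) -> None:
        self._log = log
        self._rank = rank

    def prepare_data(self) -> None:
        self._log.append(f"prepare_data@{self._rank}")

    def setup(self) -> None:
        self._log.append(f"setup@{self._rank}")

    def teardown(self) -> None:
        self._log.append(f"teardown@{self._rank}")


def run_stage(world_size: int) -> List[str]:
    """Simulates one stage (e.g. fit) across world_size processes, in order."""
    log: List[str] = []
    datasets = [RecordingDataset(log, rank) for rank in range(world_size)]
    # Assumption: rank 0 is the single process that runs prepare_data.
    datasets[0].prepare_data()
    for ds in datasets:  # Beginning of the stage: every process.
        ds.setup()
    # ... training, validation, testing, or prediction would run here ...
    for ds in datasets:  # End of the stage: every process.
        ds.teardown()
    return log
```

For two processes, prepare_data appears once in the log, while setup and teardown each appear twice.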
Embeddings datasets
eva.core.data.datasets.EmbeddingsClassificationDataset
Bases: Dataset
Embeddings classification dataset.
Expects a manifest file listing the paths of .pt files that contain tensor embeddings of shape [embedding_dim] or [1, embedding_dim].
Parameters:
Name | Type | Description | Default |
---|---|---|---|
root | str | Root directory of the dataset. | required |
manifest_file | str | The path to the manifest file, which is relative to the | required |
split | str \| None | The dataset split to use. The | None |
column_mapping | Dict[str, str] | Defines the map between the variables and the manifest columns. It will overwrite the | default_column_mapping |
embeddings_transforms | Callable \| None | A function/transform that transforms the embedding. | None |
target_transforms | Callable \| None | A function/transform that transforms the target. | None |
Source code in src/eva/core/data/datasets/classification/embeddings.py
default_column_mapping: Dict[str, str] = {'data': 'embeddings', 'target': 'target', 'split': 'split'} (class-attribute, instance-attribute)
The default column mapping of the variables to the manifest columns.
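A minimal sketch of how a column_mapping override could be combined with default_column_mapping and then used to read the relative filenames from a manifest. The helper names resolve_column_mapping and read_filenames are hypothetical, and the merge semantics (provided entries replace the matching defaults, all other defaults are kept) are an assumption.

```python
import csv
import io
from typing import Dict, List, Optional

# Copy of the documented default mapping: variable name -> manifest column.
DEFAULT_COLUMN_MAPPING: Dict[str, str] = {
    "data": "embeddings",
    "target": "target",
    "split": "split",
}


def resolve_column_mapping(
    column_mapping: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Overlays user entries on the defaults (assumed merge semantics)."""
    return {**DEFAULT_COLUMN_MAPPING, **(column_mapping or {})}


def read_filenames(
    manifest_text: str,
    column_mapping: Optional[Dict[str, str]] = None,
    split: Optional[str] = None,
) -> List[str]:
    """Returns the root-relative embedding paths, optionally for one split."""
    mapping = resolve_column_mapping(column_mapping)
    rows = csv.DictReader(io.StringIO(manifest_text))
    return [
        row[mapping["data"]]
        for row in rows
        if split is None or row[mapping["split"]] == split
    ]
```

The returned paths stay relative to root; joining each one with os.path.join(root, ...) gives the location of the corresponding .pt file.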
filename
Returns the filename of the index'th data sample.
Note that this is the file path relative to the root.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the data sample to select. | required |
Returns:
Type | Description |
---|---|
str | The filename of the |