Callbacks

Writers

eva.core.callbacks.writers.EmbeddingsWriter

Bases: BasePredictionWriter

Callback for writing generated embeddings to disk.

This callback writes the embedding files in a separate process to avoid blocking the main process where the model forward pass is executed.
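The hand-off to a separate process can be pictured with a minimal sketch like the one below. It is not eva's implementation: `write_worker`, `BackgroundWriter`, the `None` stop sentinel and the pickle file format are all assumptions made for illustration. Only the general idea of a queue feeding a writer process comes from the description above.

```python
import multiprocessing
import os
import pickle
from typing import Any

_STOP = None  # assumed sentinel: tells the worker that no more items will arrive


def write_worker(queue: Any, output_dir: str) -> None:
    """Consumes (name, payload) items from the queue and saves each one to disk."""
    os.makedirs(output_dir, exist_ok=True)
    while True:
        item = queue.get()  # blocks until the producer sends something
        if item is _STOP:
            break
        name, payload = item
        with open(os.path.join(output_dir, f"{name}.pkl"), "wb") as file:
            pickle.dump(payload, file)


class BackgroundWriter:
    """Hypothetical helper that hands items to a writer process through a queue."""

    def __init__(self, output_dir: str) -> None:
        self._queue = multiprocessing.Queue()
        self._process = multiprocessing.Process(
            target=write_worker, args=(self._queue, output_dir)
        )

    def start(self) -> None:
        self._process.start()

    def submit(self, name: str, payload: Any) -> None:
        # Only enqueues the item; the disk write happens in the other process.
        self._queue.put((name, payload))

    def close(self) -> None:
        # Ask the worker to stop, then wait until it has drained the queue.
        self._queue.put(_STOP)
        self._process.join()
```

In a script, such a helper would typically be created and started under an `if __name__ == "__main__":` guard, with `submit` called once per batch and `close` called at the end, so that the loop running the forward pass never waits on file I/O.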

Parameters:

output_dir (str, required): The directory where the embeddings will be saved.

backbone (Module | None, default: None): A model to be used as feature extractor. If None, the input batch is expected to contain the features directly.

dataloader_idx_map (Dict[int, str] | None, default: None): A dictionary mapping dataloader indices to their respective names (e.g. train, val, test).

group_key (str | None, default: None): The metadata key to group the embeddings by. If specified, the embedding files will be saved in subdirectories named after the group_key, and the key must be present in the metadata of the input batch.

overwrite (bool, default: True): Whether to overwrite the output directory.
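To make the roles of dataloader_idx_map and group_key more concrete, here is a small sketch of how such parameters could be interpreted. The helpers `split_name` and `embedding_path` are hypothetical, and the layout they produce (one subdirectory per metadata value found under group_key) is an assumed reading of the description above, not necessarily what eva does on disk.

```python
import os
from typing import Any, Dict, Optional


def split_name(
    dataloader_idx: int, dataloader_idx_map: Optional[Dict[int, str]]
) -> Optional[str]:
    """Looks up the name for a dataloader index, or None if it is unmapped."""
    # A missing map is treated as an empty one.
    return (dataloader_idx_map or {}).get(dataloader_idx)


def embedding_path(
    output_dir: str,
    filename: str,
    metadata: Optional[Dict[str, Any]],
    group_key: Optional[str] = None,
) -> str:
    """Builds the save path, optionally nested under the sample's group value."""
    if group_key is None:
        return os.path.join(output_dir, filename)
    if metadata is None or group_key not in metadata:
        # Mirrors the documented requirement that the key exists in the metadata.
        raise KeyError(f"group_key '{group_key}' not found in the batch metadata")
    # Assumption: the subdirectory is named after the metadata value.
    return os.path.join(output_dir, str(metadata[group_key]), filename)
```

With a map such as `{0: "train", 1: "val"}`, `split_name(1, ...)` gives `"val"`, while a sample whose metadata lacks the configured group_key fails early with a `KeyError` instead of being written to an unexpected location.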
Source code in src/eva/core/callbacks/writers/embeddings.py
def __init__(
    self,
    output_dir: str,
    backbone: nn.Module | None = None,
    dataloader_idx_map: Dict[int, str] | None = None,
    group_key: str | None = None,
    overwrite: bool = True,
) -> None:
    """Initializes a new EmbeddingsWriter instance.

    This callback writes the embedding files in a separate process to avoid blocking the
    main process where the model forward pass is executed.

    Args:
        output_dir: The directory where the embeddings will be saved.
        backbone: A model to be used as feature extractor. If `None`,
            it will be expected that the input batch returns the features directly.
        dataloader_idx_map: A dictionary mapping dataloader indices to their respective
            names (e.g. train, val, test).
        group_key: The metadata key to group the embeddings by. If specified, the
            embedding files will be saved in subdirectories named after the group_key.
            If specified, the key must be present in the metadata of the input batch.
        overwrite: Whether to overwrite the output directory. Defaults to True.
    """
    super().__init__(write_interval="batch")

    self._output_dir = output_dir
    self._backbone = backbone
    self._dataloader_idx_map = dataloader_idx_map or {}
    self._group_key = group_key
    self._overwrite = overwrite

    self._write_queue: multiprocessing.Queue
    self._write_process: eva_multiprocessing.Process