Callbacks

Writers

eva.core.callbacks.writers.ClassificationEmbeddingsWriter

Bases: EmbeddingsWriter

Callback for writing generated embeddings to disk for classification tasks.

This callback writes the embedding files in a separate process to avoid blocking the main process where the model forward pass is executed.
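The pattern described above, a writer process fed through a queue, can be sketched with the standard library alone. This is a minimal illustration and not the eva implementation: `writer_loop`, the `_STOP` sentinel, and the JSON-lines output format are assumptions made for the example.

```python
import json
import multiprocessing
import os
import tempfile
from typing import Any

_STOP = None  # hypothetical sentinel telling the writer loop to exit


def writer_loop(queue: Any, output_path: str) -> None:
    """Drains items from the queue and appends them to a file until _STOP arrives."""
    with open(output_path, "a", encoding="utf-8") as handle:
        while True:
            item = queue.get()  # blocks until the producer sends something
            if item is _STOP:
                break
            handle.write(json.dumps(item) + "\n")


def main() -> None:
    output_path = os.path.join(tempfile.mkdtemp(), "embeddings.jsonl")
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=writer_loop, args=(queue, output_path))
    process.start()

    # Stand-in for the forward pass: the main process only enqueues results,
    # so disk I/O never stalls the loop that produces them.
    for step in range(3):
        queue.put({"step": step, "embedding": [0.1 * step, 0.2 * step]})

    queue.put(_STOP)  # signal the writer to finish
    process.join()    # wait until everything has been written
    print(f"wrote embeddings to {output_path}")


if __name__ == "__main__":
    main()
```

The main process only pays the cost of `queue.put`, while file writes happen in the child process. The sentinel and the final `join` ensure that all queued items reach disk before the program exits.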

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| output_dir | str | The directory where the embeddings will be saved. | required |
| backbone | Module \| None | A model to be used as feature extractor. If None, the input batch is expected to contain the features directly. | None |
| dataloader_idx_map | Dict[int, str] \| None | A dictionary mapping dataloader indices to their respective names (e.g. train, val, test). | None |
| metadata_keys | List[str] \| None | An optional list of keys to extract from the batch metadata and store as additional columns in the manifest file. | None |
| overwrite | bool | Whether to overwrite embeddings that already exist in the specified output directory. If False, an error is raised when embeddings are already present (recommended). | False |
| save_every_n | int | Number of iterations between writes to disk. Between writes, the embeddings are accumulated in memory. | 100 |
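To make the interplay of `overwrite` and `save_every_n` concrete, here is a small self-contained sketch. `BufferedWriter` is a hypothetical helper written for this example, not a class from eva: it refuses to touch an existing output unless `overwrite` is set, and it flushes its in-memory buffer every `save_every_n` items.

```python
import os
import tempfile
from typing import List


class BufferedWriter:
    """Hypothetical helper: buffers items and flushes every `save_every_n` additions."""

    def __init__(
        self, output_path: str, overwrite: bool = False, save_every_n: int = 100
    ) -> None:
        if os.path.exists(output_path):
            if not overwrite:
                raise FileExistsError(
                    f"{output_path} already exists; pass overwrite=True to replace it."
                )
            os.remove(output_path)  # start from a clean file when overwriting
        self._output_path = output_path
        self._save_every_n = save_every_n
        self._buffer: List[str] = []
        self.flush_count = 0  # number of writes that actually hit the disk

    def add(self, item: str) -> None:
        """Stores an item in memory and flushes once the buffer is full."""
        self._buffer.append(item)
        if len(self._buffer) >= self._save_every_n:
            self.flush()

    def flush(self) -> None:
        """Appends all buffered items to the output file and clears the buffer."""
        if not self._buffer:
            return
        with open(self._output_path, "a", encoding="utf-8") as handle:
            handle.write("\n".join(self._buffer) + "\n")
        self._buffer.clear()
        self.flush_count += 1


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "embeddings.txt")
    writer = BufferedWriter(path, save_every_n=2)
    for index in range(5):
        writer.add(f"item-{index}")
    writer.flush()  # write the final, partially filled buffer
    print(writer.flush_count)
```

A larger `save_every_n` means fewer, larger writes at the cost of holding more embeddings in memory. The final explicit `flush` matters because the last buffer is usually only partially filled.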
Source code in src/eva/core/callbacks/writers/embeddings/base.py
def __init__(
    self,
    output_dir: str,
    backbone: nn.Module | None = None,
    dataloader_idx_map: Dict[int, str] | None = None,
    metadata_keys: List[str] | None = None,
    overwrite: bool = False,
    save_every_n: int = 100,
) -> None:
    """Initializes a new EmbeddingsWriter instance.

    This callback writes the embedding files in a separate process to avoid blocking the
    main process where the model forward pass is executed.

    Args:
        output_dir: The directory where the embeddings will be saved.
        backbone: A model to be used as feature extractor. If `None`,
            it will be expected that the input batch returns the features directly.
        dataloader_idx_map: A dictionary mapping dataloader indices to their respective
            names (e.g. train, val, test).
        metadata_keys: An optional list of keys to extract from the batch metadata and store
            as additional columns in the manifest file.
        overwrite: Whether to overwrite if embeddings are already present in the specified
            output directory. If set to `False`, an error will be raised if embeddings are
            already present (recommended).
        save_every_n: Interval for number of iterations to save the embeddings to disk.
            During this interval, the embeddings are accumulated in memory.
    """
    super().__init__(write_interval="batch")

    self._output_dir = output_dir
    self._backbone = backbone
    self._dataloader_idx_map = dataloader_idx_map or {}
    self._overwrite = overwrite
    self._save_every_n = save_every_n
    self._metadata_keys = metadata_keys or []

    self._write_queue: multiprocessing.Queue
    self._write_process: eva_multiprocessing.Process
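
The constructor stores `dataloader_idx_map` and `metadata_keys` with empty defaults, so later lookups need no `None` checks. As a rough illustration of how such values could feed a manifest row, consider the sketch below. `build_manifest_row` and the column names `embeddings` and `split` are assumptions made for this example and are not taken from the eva source.

```python
from typing import Any, Dict, List


def build_manifest_row(
    dataloader_idx: int,
    save_name: str,
    metadata: Dict[str, Any],
    dataloader_idx_map: Dict[int, str],
    metadata_keys: List[str],
) -> Dict[str, Any]:
    """Hypothetical helper: builds one manifest row for a saved embedding file."""
    row: Dict[str, Any] = {
        "embeddings": save_name,
        # An unmapped dataloader index yields None instead of raising.
        "split": dataloader_idx_map.get(dataloader_idx),
    }
    # Only the requested metadata keys become extra columns.
    for key in metadata_keys:
        row[key] = metadata.get(key)
    return row


if __name__ == "__main__":
    row = build_manifest_row(
        dataloader_idx=1,
        save_name="sample_0.pt",
        metadata={"patient_id": "p-01", "scanner": "unused"},
        dataloader_idx_map={0: "train", 1: "val"},
        metadata_keys=["patient_id"],
    )
    print(row)
```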