Callbacks

Writers

`eva.core.callbacks.writers.ClassificationEmbeddingsWriter`

Bases: EmbeddingsWriter

Callback for writing generated embeddings to disk for classification tasks.

This callback writes the embedding files in a separate process to avoid blocking the main process where the model forward pass is executed.
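The idea of handing writes to a separate process can be sketched with the standard library alone. This is a minimal illustration, not eva's implementation: the names `writer_loop` and `write_in_background`, the `None` stop sentinel, and the JSON-lines output format are all assumptions made for the example.

```python
import json
import multiprocessing


def writer_loop(queue, path):
    """Drain (name, embedding) items from the queue into a JSON-lines file.

    Stops when it receives the None sentinel (an assumption of this sketch)
    and returns the number of items written.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as handle:
        while True:
            item = queue.get()
            if item is None:
                break
            name, embedding = item
            handle.write(json.dumps({"name": name, "embedding": embedding}) + "\n")
            written += 1
    return written


def write_in_background(items, path):
    """Feed items to a writer process so the disk I/O happens outside the caller."""
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=writer_loop, args=(queue, path))
    process.start()
    for item in items:
        queue.put(item)  # returns quickly; the file write happens in the child
    queue.put(None)  # tell the writer to stop
    process.join()
```

To try it, call `write_in_background([("a", [1.0, 2.0])], "embeddings.jsonl")` from under an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that start child processes by spawning.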

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `output_dir` | `str` | The directory where the embeddings will be saved. | required |
| `backbone` | `Module \| None` | A model to be used as feature extractor. If `None`, the input batch is expected to contain the features directly. | `None` |
| `dataloader_idx_map` | `Dict[int, str] \| None` | A dictionary mapping dataloader indices to their respective names (e.g. train, val, test). | `None` |
| `metadata_keys` | `List[str] \| None` | An optional list of keys to extract from the batch metadata and store as additional columns in the manifest file. | `None` |
| `overwrite` | `bool` | Whether to overwrite embeddings that are already present in the specified output directory. If set to `False` (recommended), an error is raised when embeddings are already present. | `False` |
| `save_every_n` | `int` | Number of iterations between writes to disk. Between writes, the embeddings are accumulated in memory. | `100` |
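The `save_every_n` behaviour (accumulate in memory, then save at a fixed interval) can be pictured with a small buffer. This is a hedged sketch rather than the library's code: `EmbeddingBuffer` and its `flush` callable are hypothetical names, and plain integers stand in for embeddings.

```python
from typing import Callable, List


class EmbeddingBuffer:
    """Accumulates items in memory and flushes them every `save_every_n` additions."""

    def __init__(self, flush: Callable[[List[object]], None], save_every_n: int = 100) -> None:
        if save_every_n < 1:
            raise ValueError("save_every_n must be a positive integer")
        self._flush = flush
        self._save_every_n = save_every_n
        self._items: List[object] = []

    def add(self, item: object) -> None:
        self._items.append(item)
        if len(self._items) >= self._save_every_n:
            self.flush()

    def flush(self) -> None:
        """Hand the pending items to the flush callable, if there are any."""
        if self._items:
            self._flush(self._items)
            self._items = []


flushed: List[List[object]] = []
buffer = EmbeddingBuffer(flush=flushed.append, save_every_n=3)
for step in range(7):
    buffer.add(step)
buffer.flush()  # write the remainder that did not fill a full interval
print(flushed)  # [[0, 1, 2], [3, 4, 5], [6]]
```

A larger interval means fewer, bigger writes at the cost of holding more embeddings in memory; a final flush is needed so the last partial interval is not lost.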

Source code in `src/eva/core/callbacks/writers/embeddings/base.py`

def __init__(
    self,
    output_dir: str,
    backbone: nn.Module | None = None,
    dataloader_idx_map: Dict[int, str] | None = None,
    metadata_keys: List[str] | None = None,
    overwrite: bool = False,
    save_every_n: int = 100,
) -> None:
    """Initializes a new EmbeddingsWriter instance.

    This callback writes the embedding files in a separate process to avoid blocking the
    main process where the model forward pass is executed.

    Args:
        output_dir: The directory where the embeddings will be saved.
        backbone: A model to be used as feature extractor. If `None`,
            the input batch is expected to contain the features directly.
        dataloader_idx_map: A dictionary mapping dataloader indices to their respective
            names (e.g. train, val, test).
        metadata_keys: An optional list of keys to extract from the batch metadata and store
            as additional columns in the manifest file.
        overwrite: Whether to overwrite if embeddings are already present in the specified
            output directory. If set to `False`, an error will be raised if embeddings are
            already present (recommended).
        save_every_n: Number of iterations between writes to disk.
            Between writes, the embeddings are accumulated in memory.
    """
    super().__init__(write_interval="batch")

    self._output_dir = output_dir
    self._backbone = backbone
    self._dataloader_idx_map = dataloader_idx_map or {}
    self._overwrite = overwrite
    self._save_every_n = save_every_n
    self._metadata_keys = metadata_keys or []

    self._write_queue: multiprocessing.Queue
    self._write_process: eva_multiprocessing.Process
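The excerpt above only stores the `overwrite` flag; how existing embeddings are detected is not shown. As an illustration of the documented behaviour (raise an error unless `overwrite` is set), the sketch below treats any non-empty output directory as "embeddings already present". That criterion, the helper name `check_output_dir`, and the `embedding.pt` file name are assumptions, not eva's actual logic.

```python
import os
import tempfile


def check_output_dir(output_dir: str, overwrite: bool = False) -> None:
    """Create the directory, refusing to reuse a non-empty one unless overwrite is set."""
    os.makedirs(output_dir, exist_ok=True)
    if os.listdir(output_dir) and not overwrite:
        raise FileExistsError(
            f"'{output_dir}' already contains files; pass overwrite=True to replace them."
        )


with tempfile.TemporaryDirectory() as root:
    check_output_dir(root)  # empty directory: accepted
    open(os.path.join(root, "embedding.pt"), "wb").close()  # simulate earlier output
    try:
        check_output_dir(root)
        refused = False
    except FileExistsError:
        refused = True
    check_output_dir(root, overwrite=True)  # explicit opt-in: accepted
print(refused)  # True
```

Failing by default protects embeddings from an earlier run from being silently mixed with or replaced by new ones.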