Skip to content

How to use eva

Before starting to use eva, it's important to get familiar with the different workflows, subcommands and configurations.

eva subcommands

To run an evaluation, we call:

eva <subcommand> --config <path-to-config-file>

The eva interface supports the subcommands: predict, fit and predict_fit.

  • fit: is used to train a decoder for a specific task and subsequently evaluate the performance. This can be done online or offline *
  • predict: is used to compute embeddings for input images with a provided FM-checkpoint. This is the first step of the offline workflow
  • predict_fit: runs predict and fit sequentially. Like the fit-online run, it runs a complete evaluation with images as input.

* online vs. offline workflows

We distinguish between the online and offline workflow:

  • online: This mode uses raw images as input and generates the embeddings using a frozen FM backbone on the fly to train a downstream head network.
  • offline: In this mode, embeddings are pre-computed and stored locally in a first step, and loaded in a 2nd step from disk to train the downstream head network.

The online workflow can be used to quickly run a complete evaluation without saving and tracking embeddings. The offline workflow runs faster (only one FM-backbone forward pass) and is ideal to experiment with different decoders on the same FM-backbone.

Run configurations

Config files

The setup for an eva run is provided in a .yaml config file which is defined with the --config flag.

A config file specifies the setup for the trainer (including callback for the model backbone), the model (setup of the trainable decoder) and data module.

The config files for the datasets and models that eva supports out of the box, you can find on GitHub. We recommend that you inspect some of them to get a better understanding of their structure and content.

Environment variables

To customize runs, without the need of creating custom config-files, you can overwrite the config-parameters listed below by setting them as environment variables.

Type Description
MODEL_NAME str The name of the backbone model to load from the model registry. (e.g. pathology/kaiko_vitb8) facebookresearch/dino FM is evaluated
OUT_INDICES int | tuple[int] | None The indices of the feature maps to select. E.g. 1 outputs last feature map of the backbone, 3 outputs the last three feature maps, and (-2, -4) returns the penultimate and the forth before the last maps. Currently this is only used for segmentation tasks.
DATA_ROOT str The location of where the datasets will be downloaded to / loaded from during evaluation.
DOWNLOAD bool Whether to automatically download the dataset (make sure to review the license of the dataset first and note that not all datasets support this) .
OUTPUT_ROOT str The directory to store logging outputs and evaluation results
EMBEDDINGS_ROOT str The directory to store the computed embeddings during eva predict.
IN_FEATURES int The input feature dimension (embedding)
N_RUNS int Number of fit runs to perform in a session, defaults to 5
MAX_STEPS int Maximum number of training steps (if early stopping is not triggered)
BATCH_SIZE int Batch size for a training step
PREDICT_BATCH_SIZE int Batch size for a predict step
LR_VALUE float Learning rate for training the decoder
MONITOR_METRIC str The metric to monitor for early stopping and final model checkpoint loading
MONITOR_METRIC_MODE str "min" or "max", depending on the MONITOR_METRIC used
REPO_OR_DIR str GitHub repo with format containing model implementation, e.g. "facebookresearch/dino:main"