How to use eva
Before starting to use eva, it's important to get familiar with the different workflows, subcommands and configurations.
eva subcommands
To run an evaluation, we call eva with a subcommand and a config file, which is passed with the --config flag.
The eva interface supports the subcommands: predict, fit and predict_fit.
- fit: is used to train a decoder for a specific task and subsequently evaluate the performance. This can be done online or offline *
- predict: is used to compute embeddings for input images with a provided FM-checkpoint. This is the first step of the offline workflow
- predict_fit: runs predict and fit sequentially. Like the fit-online run, it runs a complete evaluation with images as input.
* online vs. offline workflows
We distinguish between the online and offline workflow:
- online: This mode uses raw images as input and generates the embeddings using a frozen FM backbone on the fly to train a downstream head network.
- offline: In this mode, embeddings are pre-computed and stored locally in a first step, then loaded from disk in a second step to train the downstream head network.
The online workflow can be used to quickly run a complete evaluation without saving and tracking embeddings. The offline workflow runs faster because each image needs only one forward pass through the FM backbone, and it is ideal for experimenting with different decoders on the same FM backbone.
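The difference in backbone work between the two workflows can be illustrated with a toy sketch. This is not eva code: CountingBackbone, train_head_step, run_online, run_offline and the epochs parameter are hypothetical names invented for this illustration, and the embeddings are kept in memory instead of being written to disk.

```python
from typing import List, Sequence


class CountingBackbone:
    """Toy stand-in for a frozen FM backbone that counts its forward passes."""

    def __init__(self) -> None:
        self.forward_passes = 0

    def __call__(self, image: Sequence[float]) -> List[float]:
        self.forward_passes += 1
        # Fake "embedding": the mean and the maximum of the pixel values.
        return [sum(image) / len(image), max(image)]


def train_head_step(embedding: List[float]) -> None:
    """Placeholder for one training step of the downstream head."""


def run_online(backbone: CountingBackbone, images, epochs: int) -> None:
    # Online: embeddings are generated on the fly in every pass over the data.
    for _ in range(epochs):
        for image in images:
            train_head_step(backbone(image))


def run_offline(backbone: CountingBackbone, images, epochs: int) -> None:
    # Step 1 (predict): compute every embedding once and store it.
    stored = [backbone(image) for image in images]
    # Step 2 (fit): train the head from the stored embeddings only.
    for _ in range(epochs):
        for embedding in stored:
            train_head_step(embedding)


images = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]

online_backbone = CountingBackbone()
run_online(online_backbone, images, epochs=4)
print(online_backbone.forward_passes)  # 12 (3 images x 4 epochs)

offline_backbone = CountingBackbone()
run_offline(offline_backbone, images, epochs=4)
print(offline_backbone.forward_passes)  # 3 (one pass per image)
```

In this sketch the stored embeddings could also be reused to train a second, different head without calling the backbone again, which is the case where the offline workflow pays off most.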
Run configurations
Config files
The setup for an eva run is provided in a .yaml config file which is defined with the --config flag.
A config file specifies the setup for the trainer (including the callback for the model backbone), the model (setup of the trainable decoder) and the data module.
You can find the config files for the datasets and models that eva supports out of the box on GitHub. We recommend that you inspect some of them to get a better understanding of their structure and content.
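As a minimal sketch of how a subcommand and a config file fit together, the helper below assembles a command line from the pieces named on this page. It assumes the argument order "eva, subcommand, --config, path"; build_eva_command and the config path are hypothetical and only serve the illustration. The command is printed, not executed.

```python
import shlex
from typing import List

# The three subcommands listed on this page.
SUBCOMMANDS = ("predict", "fit", "predict_fit")


def build_eva_command(subcommand: str, config_path: str) -> List[str]:
    """Return the argument list for an eva run (assumed argument order)."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(
            f"unknown subcommand {subcommand!r}, expected one of {SUBCOMMANDS}"
        )
    return ["eva", subcommand, "--config", config_path]


# "configs/my_task.yaml" is a placeholder path, not a file shipped with eva.
command = build_eva_command("fit", "configs/my_task.yaml")
print(shlex.join(command))  # eva fit --config configs/my_task.yaml
```

The returned list could be handed to subprocess.run(command, check=True) once eva is installed and the config path points to a real file.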
Environment variables
To customize runs without creating custom config files, you can override the config parameters listed below by setting them as environment variables.
Variable | Type | Description |
---|---|---|
OUTPUT_ROOT | str | The directory to store logging outputs and evaluation results |
EMBEDDINGS_ROOT | str | The directory to store the computed embeddings |
CHECKPOINT_PATH | str | Path to the FM-checkpoint to be evaluated |
IN_FEATURES | int | The input feature dimension (embedding) |
NUM_CLASSES | int | Number of classes for classification tasks |
N_RUNS | int | Number of fit runs to perform in a session, defaults to 5 |
MAX_STEPS | int | Maximum number of training steps (if early stopping is not triggered) |
BATCH_SIZE | int | Batch size for a training step |
PREDICT_BATCH_SIZE | int | Batch size for a predict step |
LR_VALUE | float | Learning rate for training the decoder |
MONITOR_METRIC | str | The metric to monitor for early stopping and final model checkpoint loading |
MONITOR_METRIC_MODE | str | "min" or "max", depending on the MONITOR_METRIC used |
REPO_OR_DIR | str | GitHub repo containing the model implementation, in the format of the example "facebookresearch/dino:main" |
DINO_BACKBONE | str | Backbone model architecture if a facebookresearch/dino FM is evaluated |
FORCE_RELOAD | bool | Whether to force a fresh download of the GitHub repo unconditionally |
PRETRAINED | bool | Whether to load FM-backbone weights from a pretrained model |
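One way to set these variables for a single run is to build a dedicated environment and pass it to the process that launches eva. The sketch below is an illustration under assumptions: build_run_env is a hypothetical helper, the values are examples only, and how eva parses the strings (in particular for the bool variables) is not covered here.

```python
import os
from typing import Dict, Mapping, Optional, Union

# Variable names copied from the table above.
KNOWN_VARIABLES = frozenset({
    "OUTPUT_ROOT", "EMBEDDINGS_ROOT", "CHECKPOINT_PATH", "IN_FEATURES",
    "NUM_CLASSES", "N_RUNS", "MAX_STEPS", "BATCH_SIZE", "PREDICT_BATCH_SIZE",
    "LR_VALUE", "MONITOR_METRIC", "MONITOR_METRIC_MODE", "REPO_OR_DIR",
    "DINO_BACKBONE", "FORCE_RELOAD", "PRETRAINED",
})


def build_run_env(
    overrides: Mapping[str, Union[str, int, float]],
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the base environment with the overrides applied."""
    unknown = sorted(set(overrides) - KNOWN_VARIABLES)
    if unknown:
        # Catch typos early; a misspelled name would otherwise have no effect.
        raise ValueError(f"not a listed variable: {', '.join(unknown)}")
    env = dict(os.environ if base is None else base)
    # Environment variables are always strings, so numbers are converted.
    env.update({name: str(value) for name, value in overrides.items()})
    return env


# Example values only; pick what fits your task.
env = build_run_env({"N_RUNS": 1, "BATCH_SIZE": 256, "LR_VALUE": 0.001})
print(env["N_RUNS"], env["BATCH_SIZE"], env["LR_VALUE"])  # 1 256 0.001
```

The resulting dictionary can be passed as the env argument of subprocess.run when starting eva, so the overrides apply to that run only and the shell environment stays unchanged.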