
Oncology FM Evaluation Framework by kaiko.ai

eva currently supports performance evaluation of vision foundation models ("FMs") and supervised machine learning models on whole-slide image (WSI) tasks at patch and slide level, as well as on radiology image tasks, covering both classification and segmentation.

With eva we provide the open-source community with an easy-to-use framework that follows industry best practices to deliver a robust, reproducible and fair evaluation benchmark across FMs of different sizes and architectures.

Support for additional modalities and tasks will be added soon.

Use cases

1. Evaluate your own FMs on public benchmark datasets

Given an FM as input, you can run eva on several publicly available datasets and tasks. A single evaluation run downloads the relevant data (where the dataset supports automatic download), preprocesses it, computes embeddings, fits and evaluates a downstream head, and reports the mean and standard deviation of the relevant performance metrics.
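The steps above can be pictured with a small, self-contained sketch. It is not eva's implementation or API: the embeddings are synthetic stand-ins for the output of a frozen FM, the head is a plain logistic regression, and every name (`fit_linear_head`, `evaluate`, `n_runs`) is hypothetical. It only illustrates the idea of fitting a head on fixed embeddings several times and summarising the metric as mean and standard deviation.

```python
import numpy as np


def fit_linear_head(x, y, seed, epochs=200, lr=0.5):
    """Fit a logistic-regression head on fixed embeddings with gradient descent."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=x.shape[1])  # the seed only affects this init
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probabilities
        grad = p - y                             # log-loss gradient w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def accuracy(x, y, w, b):
    """Fraction of samples whose predicted class matches the label."""
    return float((((x @ w + b) > 0).astype(int) == y).mean())


def evaluate(train, test, n_runs=5):
    """Repeat fit + evaluate with different seeds; return mean and std of accuracy."""
    scores = []
    for seed in range(n_runs):
        w, b = fit_linear_head(*train, seed=seed)
        scores.append(accuracy(*test, w, b))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic stand-in for embeddings computed once by a frozen FM (assumption).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)
embeddings = rng.normal(size=(400, 16)) + (2.0 * labels[:, None] - 1.0)
train = (embeddings[:300], labels[:300])
test = (embeddings[300:], labels[300:])

mean_acc, std_acc = evaluate(train, test)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Because the FM stays frozen, the embeddings are computed once and reused across runs; only the lightweight head is refitted, which keeps repeated runs cheap.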

Supported datasets & tasks include:

WSI patch-level pathology datasets

  • Patch Camelyon: binary breast cancer classification
  • BACH: multiclass breast cancer classification
  • CRC: multiclass colorectal cancer classification
  • MHIST: binary colorectal polyp cancer classification
  • MoNuSAC: multi-organ nuclei segmentation
  • CoNSeP: segmentation of colorectal nuclei and phenotypes

WSI slide-level pathology datasets

  • Camelyon16: binary breast cancer classification
  • PANDA: multiclass prostate cancer classification

Radiology datasets

  • TotalSegmentator: segmentation of anatomical structures in CT scans
  • LiTS: segmentation of liver and liver tumors in CT scans

To evaluate FMs, eva supports several model formats, including models trained with PyTorch, models available on HuggingFace, and ONNX models. For other formats, custom wrappers can be implemented.
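As a rough illustration of what a custom wrapper does, the sketch below adapts an arbitrary model format to a single "batch in, embeddings out" call. The class and method names (`ModelWrapper`, `load_model`, `forward`) are hypothetical and are not taken from eva; consult the User Guide for the actual base class and its signatures.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence


class ModelWrapper(ABC):
    """Hypothetical adapter that exposes any model format as batch -> embeddings."""

    def __init__(self) -> None:
        # Load once at construction so repeated calls reuse the same model.
        self._model = self.load_model()

    @abstractmethod
    def load_model(self) -> Any:
        """Load the underlying model from its native format."""

    @abstractmethod
    def forward(self, batch: Sequence[Any]) -> List[Any]:
        """Run the loaded model on a batch and return one embedding per item."""

    def __call__(self, batch: Sequence[Any]) -> List[Any]:
        return self.forward(batch)


class MeanPixelModel(ModelWrapper):
    """Toy stand-in for a custom format: embeds an image as its mean pixel value."""

    def load_model(self) -> Any:
        return lambda image: [sum(image) / len(image)]

    def forward(self, batch: Sequence[Any]) -> List[Any]:
        return [self._model(image) for image in batch]


model = MeanPixelModel()
embeddings = model([[0.0, 1.0], [2.0, 4.0]])
print(embeddings)
```

The point of the adapter is that the evaluation code only ever sees the common call interface, so a new format requires changes in one place: the wrapper's loading and forward logic.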

2. Evaluate ML models on your own dataset & task

If you have your own labeled dataset, all you need to do is implement a dataset class tailored to your source data. Start from one of the dataset classes provided out of the box, adapt it to your data, and run eva to see how different FMs perform on your task.
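A dataset class of this kind usually boils down to two operations: report how many samples there are, and return one (image, target) pair by index. The sketch below shows that shape under stated assumptions: the CSV manifest layout (`path`, `label` columns), the class name `PatchClassificationDataset`, and the injected `load_image` callable are all hypothetical, and the class does not inherit from any eva base class.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Callable, List, Tuple


class PatchClassificationDataset:
    """Hypothetical map-style dataset reading (image path, label) rows from a CSV."""

    def __init__(
        self,
        manifest: Path,
        classes: List[str],
        load_image: Callable[[Path], Any],
    ) -> None:
        self._classes = classes
        self._load_image = load_image
        with open(manifest, newline="") as f:
            self._rows = [(Path(r["path"]), r["label"]) for r in csv.DictReader(f)]

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        path, label = self._rows[index]
        # Images are loaded lazily; the target is the index of the class name.
        return self._load_image(path), self._classes.index(label)


# Demo with a throwaway manifest; `load_image` is stubbed to return the file name.
with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "manifest.csv"
    manifest.write_text("path,label\na.png,benign\nb.png,malignant\n")
    dataset = PatchClassificationDataset(
        manifest, classes=["benign", "malignant"], load_image=lambda p: p.name
    )
    n_samples = len(dataset)
    second = dataset[1]

print(n_samples, second)
```

In practice `load_image` would decode the file into an array or tensor, and the class would extend whichever provided dataset class is closest to your data.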

Evaluation results

Check out our Leaderboards to inspect evaluation results of publicly available FMs.

License

eva is distributed under the terms of the Apache-2.0 license.

Next steps

Check out the User Guide to get started with eva.