Networks

Reference information for the vision model Networks API.
eva.vision.models.networks.ABMIL
Bases: Module
ABMIL network for multiple instance learning classification tasks.
Takes an array of patch-level embeddings per slide as input. This implementation supports batched inputs of shape `(batch_size, n_instances, input_size)`. For slides with fewer than `n_instances` patches, you can apply padding and provide a mask tensor to the forward pass.
The original implementation from [1] was used as a reference: https://github.com/AMLab-Amsterdam/AttentionDeepMIL/blob/master/model.py
Notes
- use_bias: The paper does not use a bias term in its formalism, but the published example code inadvertently does.
- For large input embedding dimensionalities (>128), concentration of measure can make the dot-product similarities nearly equal; to prevent this, we added the option to project the input embeddings to a lower dimensionality.

[1] Maximilian Ilse, Jakub M. Tomczak, Max Welling, "Attention-based Deep Multiple Instance Learning", 2018. https://arxiv.org/abs/1802.04712
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`input_size` | `int` | Input embedding dimension. | required |
`output_size` | `int` | Number of classes. | required |
`projected_input_size` | `int \| None` | Size of the projected input. If `None`, no projection is performed. | required |
`hidden_size_attention` | `int` | Hidden dimension in the attention network. | `128` |
`hidden_sizes_mlp` | `tuple` | Dimensions for the hidden layers in the last MLP. | `(128, 64)` |
`use_bias` | `bool` | Whether to use bias in the attention network. | `True` |
`dropout_input_embeddings` | `float` | Dropout rate for the input embeddings. | `0.0` |
`dropout_attention` | `float` | Dropout rate for the attention network and classifier. | `0.0` |
`dropout_mlp` | `float` | Dropout rate for the final MLP network. | `0.0` |
`pad_value` | `int \| float \| None` | Value indicating padding in the input tensor. If specified, entries with this value will be masked. If set to `None`, no masking is applied. | `float('-inf')` |
Source code in src/eva/vision/models/networks/abmil.py
forward
Forward pass.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`input_tensor` | `Tensor` | Tensor with expected shape of (batch_size, n_instances, input_size). | required |
Source code in src/eva/vision/models/networks/abmil.py
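A minimal usage sketch, assuming the constructor and `forward` signatures documented above; the batch, instance, embedding, and class sizes are hypothetical:

```python
import torch

from eva.vision.models.networks import ABMIL

batch_size, n_instances, input_size = 2, 100, 768  # hypothetical sizes

model = ABMIL(
    input_size=input_size,
    output_size=4,             # e.g. four slide-level classes
    projected_input_size=128,  # project the embeddings to a lower dimensionality
)

# One embedding per patch, stacked per slide. Slides with fewer than
# `n_instances` patches would be padded with `pad_value` entries so that
# they can be masked.
embeddings = torch.randn(batch_size, n_instances, input_size)
logits = model(embeddings)  # expected shape: (batch_size, output_size)
```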
eva.vision.models.networks.decoders.Decoder
Bases: Module
Semantic segmentation decoder base class.
eva.vision.models.networks.decoders.segmentation.ConvDecoder
Bases: Decoder
Convolutional segmentation decoder.
Here the input nn layers will be directly applied to the features of shape `(batch_size, hidden_size, n_patches_height, n_patches_width)`, where `n_patches` is `image_size / patch_size`. Note that `n_patches` is also known as `grid_size`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`layers` | `Module` | The convolutional layers to be used as the decoder head. | required |
Source code in src/eva/vision/models/networks/decoders/segmentation/conv2d.py
forward
Maps the patch embeddings to a segmentation mask of the image size.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`features` | `List[Tensor]` | List of multi-level image features of shape (batch_size, hidden_size, n_patches_height, n_patches_width). | required |
`image_size` | `Tuple[int, int]` | The target image size (height, width). | required |

Returns:

Type | Description |
---|---|
`Tensor` | Tensor containing scores for all of the classes with shape (batch_size, n_classes, image_height, image_width). |
Source code in src/eva/vision/models/networks/decoders/segmentation/conv2d.py
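A minimal sketch, assuming the `layers` argument and the `forward` signature documented above; the feature dimensions and class count are hypothetical:

```python
import torch
from torch import nn

from eva.vision.models.networks.decoders.segmentation import ConvDecoder

hidden_size, n_patches, n_classes = 384, 14, 5  # hypothetical dimensions

# A single 1x1 convolution as the decoder head.
decoder = ConvDecoder(layers=nn.Conv2d(hidden_size, n_classes, kernel_size=1))

# One feature level of shape (batch_size, hidden_size, n_patches_height, n_patches_width).
features = [torch.randn(2, hidden_size, n_patches, n_patches)]
masks = decoder(features, image_size=(224, 224))
# expected shape: (2, n_classes, 224, 224)
```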
eva.vision.models.networks.decoders.segmentation.ConvDecoder1x1
Bases: ConvDecoder
A convolutional decoder with a single 1x1 convolutional layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`in_features` | `int` | The hidden dimension size of the embeddings. | required |
`num_classes` | `int` | Number of output classes as channels. | required |
Source code in src/eva/vision/models/networks/decoders/segmentation/common.py
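Since the layers are fixed, usage reduces to providing the embedding size and class count; a sketch with hypothetical dimensions, assuming the same `forward` signature as `ConvDecoder`:

```python
import torch

from eva.vision.models.networks.decoders.segmentation import ConvDecoder1x1

decoder = ConvDecoder1x1(in_features=384, num_classes=5)  # hypothetical sizes
features = [torch.randn(2, 384, 14, 14)]
masks = decoder(features, image_size=(224, 224))  # expected: (2, 5, 224, 224)
```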
eva.vision.models.networks.decoders.segmentation.ConvDecoderMS
Bases: ConvDecoder
A multi-stage convolutional decoder with upsampling and convolutional layers.
This decoder applies a series of upsampling and convolutional layers to transform the input features into output predictions with the desired spatial resolution.
This decoder is based on the `+ms` segmentation decoder from DINOv2 (https://arxiv.org/pdf/2304.07193).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`in_features` | `int` | The hidden dimension size of the embeddings. | required |
`num_classes` | `int` | Number of output classes as channels. | required |
Source code in src/eva/vision/models/networks/decoders/segmentation/common.py
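A sketch with hypothetical dimensions, assuming the same `forward` signature as `ConvDecoder`:

```python
import torch

from eva.vision.models.networks.decoders.segmentation import ConvDecoderMS

decoder = ConvDecoderMS(in_features=384, num_classes=5)  # hypothetical sizes
features = [torch.randn(2, 384, 14, 14)]
masks = decoder(features, image_size=(224, 224))  # expected: (2, 5, 224, 224)
```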
eva.vision.models.networks.decoders.segmentation.LinearDecoder
Bases: Decoder
Linear decoder.
Here the input nn layers will be applied to the features reshaped from the input `(batch_size, hidden_size, height, width)` to `(batch_size, patch_embeddings, hidden_size)`, and the output is then reshaped back to `(batch_size, n_classes, height, width)`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`layers` | `Module` | The linear layers to be used as the decoder head. | required |
Source code in src/eva/vision/models/networks/decoders/segmentation/linear.py
forward
Maps the patch embeddings to a segmentation mask of the image size.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`features` | `List[Tensor]` | List of multi-level image features of shape (batch_size, hidden_size, n_patches_height, n_patches_width). | required |
`image_size` | `Tuple[int, int]` | The target image size (height, width). | required |

Returns:

Type | Description |
---|---|
`Tensor` | Tensor containing scores for all of the classes with shape (batch_size, n_classes, image_height, image_width). |
Source code in src/eva/vision/models/networks/decoders/segmentation/linear.py
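A minimal sketch, assuming the `layers` argument and `forward` signature documented above; the dimensions are hypothetical. Unlike `ConvDecoder`, the head here is a plain `nn.Linear` acting on the hidden dimension of each patch embedding:

```python
import torch
from torch import nn

from eva.vision.models.networks.decoders.segmentation import LinearDecoder

hidden_size, n_classes = 384, 5  # hypothetical dimensions

decoder = LinearDecoder(layers=nn.Linear(hidden_size, n_classes))

features = [torch.randn(2, hidden_size, 14, 14)]
masks = decoder(features, image_size=(224, 224))
# expected shape: (2, n_classes, 224, 224)
```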
eva.vision.models.networks.decoders.segmentation.SingleLinearDecoder
Bases: LinearDecoder
A simple linear decoder with a single fully connected layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`in_features` | `int` | The hidden dimension size of the embeddings. | required |
`num_classes` | `int` | Number of output classes as channels. | required |
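A sketch with hypothetical dimensions, assuming the same `forward` signature as `LinearDecoder`:

```python
import torch

from eva.vision.models.networks.decoders.segmentation import SingleLinearDecoder

decoder = SingleLinearDecoder(in_features=384, num_classes=5)  # hypothetical sizes
features = [torch.randn(2, 384, 14, 14)]
masks = decoder(features, image_size=(224, 224))  # expected: (2, 5, 224, 224)
```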