Networks
Reference information for the vision model Networks API.
eva.vision.models.networks.ABMIL
Bases: Module
ABMIL network for multiple instance learning classification tasks.
Takes an array of patch-level embeddings per slide as input. This implementation supports batched inputs of shape (batch_size, n_instances, input_size). For slides with fewer than n_instances patches, you can apply padding and provide a mask tensor to the forward pass.
The original implementation from [1] was used as a reference: https://github.com/AMLab-Amsterdam/AttentionDeepMIL/blob/master/model.py
Notes
- use_bias: The paper does not use a bias term in its formalism, but the authors' published example code inadvertently does.
- To prevent near-equal dot-product similarities caused by concentration of measure when the input embedding dimensionality is large (>128), we added the option to project the input embeddings to a lower dimensionality.
[1] Maximilian Ilse, Jakub M. Tomczak, Max Welling, "Attention-based Deep Multiple Instance Learning", 2018 https://arxiv.org/abs/1802.04712
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_size | int | Input embedding dimension. | required |
output_size | int | Number of classes. | required |
projected_input_size | int or None | Size of the projected input. If None, no projection is applied. | required |
hidden_size_attention | int | Hidden dimension in the attention network. | 128 |
hidden_sizes_mlp | tuple | Dimensions for the hidden layers in the last MLP. | (128, 64) |
use_bias | bool | Whether to use bias in the attention network. | True |
dropout_input_embeddings | float | Dropout rate for the input embeddings. | 0.0 |
dropout_attention | float | Dropout rate for the attention network and classifier. | 0.0 |
dropout_mlp | float | Dropout rate for the final MLP network. | 0.0 |
pad_value | int, float or None | Value indicating padding in the input tensor. If specified, entries with this value will be masked; if None, no masking is applied. | float('-inf') |
Source code in src/eva/vision/models/networks/abmil.py
forward
Forward pass.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_tensor | Tensor | Tensor with expected shape of (batch_size, n_instances, input_size). | required |
Source code in src/eva/vision/models/networks/abmil.py
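A minimal usage sketch (not part of the library documentation), assuming the constructor defaults and the forward signature listed above; the embedding dimension and slide sizes are illustrative only:

```python
import torch

from eva.vision.models.networks import ABMIL

# Binary slide-level classifier over 768-dimensional patch embeddings.
model = ABMIL(input_size=768, output_size=2)

# Batch of 4 slides, each padded to 1000 instances; padded entries use the
# default pad_value of float("-inf") so they can be masked internally.
embeddings = torch.randn(4, 1000, 768)
embeddings[:, 900:, :] = float("-inf")  # last 100 instances are padding

logits = model(embeddings)  # expected shape: (4, 2)
```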
eva.vision.models.networks.decoders.Decoder
Bases: Module, ABC
Abstract base class for segmentation decoders.
eva.vision.models.networks.decoders.segmentation.Decoder2D
Bases: Decoder
Segmentation decoder for 2D applications.
Here the input nn layers are applied directly to the features of shape (batch_size, hidden_size, n_patches_height, n_patches_width), where n_patches is image_size / patch_size. Note that n_patches is also known as grid_size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
layers | Module | The layers to be used as the decoder head. | required |
combine_features | bool | Whether to combine the features from different feature levels into one tensor before applying the decoder head. | True |
Source code in src/eva/vision/models/networks/decoders/segmentation/decoder2d.py
forward
Maps the patch embeddings to a segmentation mask of the image size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
decoder_inputs | DecoderInputs | Inputs required by the decoder. | required |
Returns:
Type | Description |
---|---|
Tensor | Tensor containing scores for all of the classes with shape (batch_size, n_classes, image_height, image_width). |
Source code in src/eva/vision/models/networks/decoders/segmentation/decoder2d.py
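As an illustration only (the layer sizes and upsampling factors below are assumptions, not library defaults), a Decoder2D can be built from any nn.Module head that operates on the (batch_size, hidden_size, n_patches_height, n_patches_width) feature grid:

```python
import torch.nn as nn

from eva.vision.models.networks.decoders.segmentation import Decoder2D

# A small convolutional head that upsamples the patch grid twice and maps
# 768 hidden channels down to 8 class channels.
decoder = Decoder2D(
    layers=nn.Sequential(
        nn.Upsample(scale_factor=2),
        nn.Conv2d(768, 64, kernel_size=3, padding=1),
        nn.Upsample(scale_factor=2),
        nn.Conv2d(64, 8, kernel_size=3, padding=1),
    ),
    combine_features=True,
)
```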
eva.vision.models.networks.decoders.segmentation.ConvDecoder1x1
Bases: Decoder2D
A convolutional decoder with a single 1x1 convolutional layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_features | int | The hidden dimension size of the embeddings. | required |
num_classes | int | Number of output classes as channels. | required |
Source code in src/eva/vision/models/networks/decoders/segmentation/semantic/common.py
eva.vision.models.networks.decoders.segmentation.ConvDecoderMS
Bases: Decoder2D
A multi-stage convolutional decoder with upsampling and convolutional layers.
This decoder applies a series of upsampling and convolutional layers to transform the input features into output predictions with the desired spatial resolution.
This decoder is based on the +ms segmentation decoder from DINOv2 (https://arxiv.org/pdf/2304.07193).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_features | int | The hidden dimension size of the embeddings. | required |
num_classes | int | Number of output classes as channels. | required |
Source code in src/eva/vision/models/networks/decoders/segmentation/semantic/common.py
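ConvDecoder1x1 and ConvDecoderMS share the same two constructor arguments; a brief illustrative sketch (the dimension values are placeholders, not library defaults):

```python
from eva.vision.models.networks.decoders.segmentation import (
    ConvDecoder1x1,
    ConvDecoderMS,
)

# Single 1x1 convolution from 768 embedding channels to 5 class channels.
simple_decoder = ConvDecoder1x1(in_features=768, num_classes=5)

# Multi-stage decoder with upsampling and convolutional layers, following
# DINOv2's +ms segmentation head.
ms_decoder = ConvDecoderMS(in_features=768, num_classes=5)
```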
eva.vision.models.networks.decoders.segmentation.LinearDecoder
Bases: Decoder
Linear decoder.
Here the input nn layers are applied to the features reshaped to (batch_size, patch_embeddings, hidden_size) from the input of shape (batch_size, hidden_size, height, width), and the output is then reshaped back to (batch_size, n_classes, height, width).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
layers | Module | The linear layers to be used as the decoder head. | required |
Source code in src/eva/vision/models/networks/decoders/segmentation/linear.py
forward
Maps the patch embeddings to a segmentation mask of the image size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
features | List[Tensor] | List of multi-level image features of shape (batch_size, hidden_size, n_patches_height, n_patches_width). | required |
image_size | Tuple[int, int] | The target image size (height, width). | required |
Returns:
Type | Description |
---|---|
Tensor | Tensor containing scores for all of the classes with shape (batch_size, n_classes, image_height, image_width). |
Source code in src/eva/vision/models/networks/decoders/segmentation/linear.py
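A hypothetical sketch based on the constructor and forward signatures above; the hidden size, grid size and image size are illustrative:

```python
import torch
import torch.nn as nn

from eva.vision.models.networks.decoders.segmentation import LinearDecoder

# Per-patch linear classifier from 768-dim embeddings to 3 classes.
decoder = LinearDecoder(layers=nn.Linear(768, 3))

# One feature level of shape (batch_size, hidden_size, n_patches_h, n_patches_w).
features = [torch.randn(2, 768, 14, 14)]
logits = decoder(features, image_size=(224, 224))  # expected: (2, 3, 224, 224)
```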
eva.vision.models.networks.decoders.segmentation.SingleLinearDecoder
Bases: LinearDecoder
A simple linear decoder with a single fully connected layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_features | int | The hidden dimension size of the embeddings. | required |
num_classes | int | Number of output classes as channels. | required |
Source code in src/eva/vision/models/networks/decoders/segmentation/semantic/common.py
eva.vision.models.networks.decoders.segmentation.ConvDecoderWithImage
Bases: Decoder2D
A convolutional decoder that, in addition to the encoded features, also takes the input image as input.
In the first stage, the input features are upsampled and passed through a convolutional layer. In the second stage, the input image channels are concatenated with the upsampled features and passed through additional convolutional blocks to combine the image prior with the encoded features. Lastly, a 1x1 convolution reduces the number of channels to the number of classes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_features | int | The hidden dimension size of the embeddings. | required |
num_classes | int | Number of output classes as channels. | required |
greyscale | bool | Whether to convert input images to greyscale. | False |
hidden_dims | List[int] or None | List of hidden dimensions for the convolutional layers. | None |
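An illustrative construction sketch, assuming only the constructor arguments listed above; the hidden_dims values are placeholders, not library defaults:

```python
from eva.vision.models.networks.decoders.segmentation import ConvDecoderWithImage

decoder = ConvDecoderWithImage(
    in_features=768,           # embedding dimension of the encoder features
    num_classes=5,             # output channels / segmentation classes
    greyscale=False,           # keep all input image channels
    hidden_dims=[64, 32, 32],  # placeholder hidden dimensions
)
```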