Networks

Reference information for the vision model Networks API.

eva.vision.models.networks.ABMIL

Bases: Module

ABMIL network for multiple instance learning classification tasks.

Takes an array of patch-level embeddings per slide as input. This implementation supports batched inputs of shape (batch_size, n_instances, input_size). For slides with fewer than n_instances patches, you can apply padding and provide a mask tensor to the forward pass.

The original implementation from [1] was used as a reference: https://github.com/AMLab-Amsterdam/AttentionDeepMIL/blob/master/model.py

Notes
  • use_bias: The paper does not use a bias term in its formalism, but the authors' published example code inadvertently does.
  • For large input embedding dimensionalities (>128), concentration of measure can make the dot-product similarities nearly equal; to mitigate this, we added the option to project the input embeddings to a lower dimensionality.

[1] Maximilian Ilse, Jakub M. Tomczak, Max Welling, "Attention-based Deep Multiple Instance Learning", 2018 https://arxiv.org/abs/1802.04712

Parameters:

  • input_size (int): Input embedding dimension. Required.
  • output_size (int): Number of classes. Required.
  • projected_input_size (int | None): Size of the projected input. If None, no projection is performed. Required.
  • hidden_size_attention (int): Hidden dimension of the attention network. Default: 128.
  • hidden_sizes_mlp (tuple): Dimensions of the hidden layers in the final MLP. Default: (128, 64).
  • use_bias (bool): Whether to use bias in the attention network. Default: True.
  • dropout_input_embeddings (float): Dropout rate for the input embeddings. Default: 0.0.
  • dropout_attention (float): Dropout rate for the attention network and classifier. Default: 0.0.
  • dropout_mlp (float): Dropout rate for the final MLP network. Default: 0.0.
  • pad_value (int | float | None): Value indicating padding in the input tensor. If specified, entries with this value in the input tensor will be masked. If None, no masking is applied. Default: float('-inf').

Source code in src/eva/vision/models/networks/abmil.py
def __init__(
    self,
    input_size: int,
    output_size: int,
    projected_input_size: int | None,
    hidden_size_attention: int = 128,
    hidden_sizes_mlp: tuple = (128, 64),
    use_bias: bool = True,
    dropout_input_embeddings: float = 0.0,
    dropout_attention: float = 0.0,
    dropout_mlp: float = 0.0,
    pad_value: int | float | None = float("-inf"),
) -> None:
    """Initializes the ABMIL network.

    Args:
        input_size: input embedding dimension
        output_size: number of classes
        projected_input_size: size of the projected input. if `None`, no projection is
            performed.
        hidden_size_attention: hidden dimension in attention network
        hidden_sizes_mlp: dimensions for hidden layers in last mlp
        use_bias: whether to use bias in the attention network
        dropout_input_embeddings: dropout rate for the input embeddings
        dropout_attention: dropout rate for the attention network and classifier
        dropout_mlp: dropout rate for the final MLP network
        pad_value: Value indicating padding in the input tensor. If specified, entries with
            this value in the input tensor will be masked. If set to `None`, no masking is applied.
    """
    super().__init__()

    self._pad_value = pad_value

    if projected_input_size:
        self.projector = nn.Sequential(
            nn.Linear(input_size, projected_input_size, bias=True),
            nn.Dropout(p=dropout_input_embeddings),
        )
        input_size = projected_input_size
    else:
        self.projector = nn.Dropout(p=dropout_input_embeddings)

    self.gated_attention = GatedAttention(
        input_dim=input_size,
        hidden_dim=hidden_size_attention,
        dropout=dropout_attention,
        n_classes=1,
        use_bias=use_bias,
    )

    self.classifier = MLP(
        input_size=input_size,
        output_size=output_size,
        hidden_layer_sizes=hidden_sizes_mlp,
        dropout=dropout_mlp,
        hidden_activation_fn=nn.ReLU,
    )
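
A minimal instantiation sketch with illustrative sizes (not taken from the library): 768-dimensional patch embeddings, two output classes, and the optional projection to 128 dimensions enabled, following the note on large input dimensionalities above.

from eva.vision.models.networks import ABMIL

# Illustrative sizes: 768-d patch embeddings, 2 output classes, and the
# optional projection of the inputs down to 128 dimensions.
model = ABMIL(
    input_size=768,
    output_size=2,
    projected_input_size=128,
)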

forward

Forward pass.

Parameters:

  • input_tensor (Tensor): Tensor with expected shape of (batch_size, n_instances, input_size). Required.

Source code in src/eva/vision/models/networks/abmil.py
def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:
    """Forward pass.

    Args:
        input_tensor: Tensor with expected shape of (batch_size, n_instances, input_size).
    """
    input_tensor, mask = self._mask_values(input_tensor, self._pad_value)

    # (batch_size, n_instances, input_size) -> (batch_size, n_instances, projected_input_size)
    input_tensor = self.projector(input_tensor)

    attention_logits = self.gated_attention(input_tensor)  # (batch_size, n_instances, 1)
    if mask is not None:
        # fill masked values with -inf, which will yield 0s after softmax
        attention_logits = attention_logits.masked_fill(mask, float("-inf"))

    attention_weights = nn.functional.softmax(attention_logits, dim=1)
    # (batch_size, n_instances, 1)

    attention_result = torch.matmul(torch.transpose(attention_weights, 1, 2), input_tensor)
    # (batch_size, 1, hidden_size_attention)

    attention_result = torch.squeeze(attention_result, 1)  # (batch_size, hidden_size_attention)

    return self.classifier(attention_result)  # (batch_size, output_size)
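
Continuing the instantiation sketch above, a hedged forward-pass example: slides with fewer patches than n_instances are padded with the default pad_value of float("-inf"), which the network masks out internally before the attention softmax, per the class description.

import torch

batch_size, n_instances, input_size = 4, 1000, 768
embeddings = torch.randn(batch_size, n_instances, input_size)

# Assume the first slide has only 800 patches: fill the remaining rows with
# the default pad value so they are masked in the attention softmax.
embeddings[0, 800:] = float("-inf")

logits = model(embeddings)  # (4, 2), i.e. (batch_size, output_size)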

eva.vision.models.networks.decoders.Decoder

Bases: Module

Semantic segmentation decoder base class.

eva.vision.models.networks.decoders.segmentation.ConvDecoder

Bases: Decoder

Convolutional segmentation decoder.

Here the input nn layers are applied directly to the features of shape (batch_size, hidden_size, n_patches_height, n_patches_width), where n_patches is image_size / patch_size. Note that n_patches is also known as grid_size.

Parameters:

  • layers (Module): The convolutional layers to be used as the decoder head. Required.

Source code in src/eva/vision/models/networks/decoders/segmentation/conv2d.py
def __init__(self, layers: nn.Module) -> None:
    """Initializes the convolutional based decoder head.

    Here the input nn layers will be directly applied to the
    features of shape (batch_size, hidden_size, n_patches_height,
    n_patches_width), where n_patches is image_size / patch_size.
    Note the n_patches is also known as grid_size.

    Args:
        layers: The convolutional layers to be used as the decoder head.
    """
    super().__init__()

    self._layers = layers

forward

Maps the patch embeddings to a segmentation mask of the image size.

Parameters:

  • features (List[Tensor]): List of multi-level image features of shape (batch_size, hidden_size, n_patches_height, n_patches_width). Required.
  • image_size (Tuple[int, int]): The target image size (height, width). Required.

Returns:

  • Tensor: Tensor containing scores for all of the classes with shape (batch_size, n_classes, image_height, image_width).

Source code in src/eva/vision/models/networks/decoders/segmentation/conv2d.py
def forward(
    self,
    features: List[torch.Tensor],
    image_size: Tuple[int, int],
) -> torch.Tensor:
    """Maps the patch embeddings to a segmentation mask of the image size.

    Args:
        features: List of multi-level image features of shape (batch_size,
            hidden_size, n_patches_height, n_patches_width).
        image_size: The target image size (height, width).

    Returns:
        Tensor containing scores for all of the classes with shape
        (batch_size, n_classes, image_height, image_width).
    """
    patch_embeddings = self._forward_features(features)
    logits = self._forward_head(patch_embeddings)
    return self._cls_seg(logits, image_size)
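
A hedged usage sketch, assuming a single-level feature map from a ViT-like encoder with a hidden size of 768 and a 14x14 patch grid (all sizes are illustrative):

import torch
from torch import nn
from eva.vision.models.networks.decoders.segmentation import ConvDecoder

# A 1x1 convolution that maps 768-d patch embeddings to 5 class channels.
decoder = ConvDecoder(layers=nn.Conv2d(768, 5, kernel_size=(1, 1)))

features = [torch.randn(2, 768, 14, 14)]  # (batch_size, hidden_size, 14, 14)
masks = decoder(features, image_size=(224, 224))  # (2, 5, 224, 224)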

eva.vision.models.networks.decoders.segmentation.ConvDecoder1x1

Bases: ConvDecoder

A convolutional decoder with a single 1x1 convolutional layer.

Parameters:

  • in_features (int): The hidden dimension size of the embeddings. Required.
  • num_classes (int): Number of output classes as channels. Required.

Source code in src/eva/vision/models/networks/decoders/segmentation/common.py
def __init__(self, in_features: int, num_classes: int) -> None:
    """Initializes the decoder.

    Args:
        in_features: The hidden dimension size of the embeddings.
        num_classes: Number of output classes as channels.
    """
    super().__init__(
        layers=nn.Conv2d(
            in_channels=in_features,
            out_channels=num_classes,
            kernel_size=(1, 1),
        ),
    )
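
As the source above shows, this class is a thin shorthand for a ConvDecoder with a single 1x1 convolution; a minimal sketch with illustrative sizes:

from eva.vision.models.networks.decoders.segmentation import ConvDecoder1x1

# Maps 768-d patch embeddings directly to 5 class channels per grid cell.
decoder = ConvDecoder1x1(in_features=768, num_classes=5)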

eva.vision.models.networks.decoders.segmentation.ConvDecoderMS

Bases: ConvDecoder

A multi-stage convolutional decoder with upsampling and convolutional layers.

This decoder applies a series of upsampling and convolutional layers to transform the input features into output predictions with the desired spatial resolution.

This decoder is based on the +ms segmentation decoder from DINOv2 (https://arxiv.org/pdf/2304.07193).

Parameters:

  • in_features (int): The hidden dimension size of the embeddings. Required.
  • num_classes (int): Number of output classes as channels. Required.

Source code in src/eva/vision/models/networks/decoders/segmentation/common.py
def __init__(self, in_features: int, num_classes: int) -> None:
    """Initializes the decoder.

    Args:
        in_features: The hidden dimension size of the embeddings.
        num_classes: Number of output classes as channels.
    """
    super().__init__(
        layers=nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv2d(in_features, 64, kernel_size=(3, 3), padding=(1, 1)),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(64, num_classes, kernel_size=(3, 3), padding=(1, 1)),
        ),
    )
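
Each nn.Upsample(scale_factor=2) above doubles the patch grid, so the convolutional head produces logits at four times the grid resolution before the decoder maps them to the target image size (per the forward return shape). A hedged sketch with illustrative sizes:

import torch
from eva.vision.models.networks.decoders.segmentation import ConvDecoderMS

# 768-d patch embeddings, 5 classes, a 14x14 patch grid for a 224x224 image.
decoder = ConvDecoderMS(in_features=768, num_classes=5)

features = [torch.randn(2, 768, 14, 14)]
masks = decoder(features, image_size=(224, 224))  # (2, 5, 224, 224)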

eva.vision.models.networks.decoders.segmentation.LinearDecoder

Bases: Decoder

Linear decoder.

Here the input nn layers are applied to the features reshaped to (batch_size, patch_embeddings, hidden_size) from the input of shape (batch_size, hidden_size, height, width), and the result is then reshaped back to (batch_size, n_classes, height, width).

Parameters:

  • layers (Module): The linear layers to be used as the decoder head. Required.

Source code in src/eva/vision/models/networks/decoders/segmentation/linear.py
def __init__(self, layers: nn.Module) -> None:
    """Initializes the linear based decoder head.

    Here the input nn layers will be applied to the reshaped
    features (batch_size, patch_embeddings, hidden_size) from
    the input (batch_size, hidden_size, height, width) and then
    unwrapped again to (batch_size, n_classes, height, width).

    Args:
        layers: The linear layers to be used as the decoder head.
    """
    super().__init__()

    self._layers = layers

forward

Maps the patch embeddings to a segmentation mask of the image size.

Parameters:

  • features (List[Tensor]): List of multi-level image features of shape (batch_size, hidden_size, n_patches_height, n_patches_width). Required.
  • image_size (Tuple[int, int]): The target image size (height, width). Required.

Returns:

  • Tensor: Tensor containing scores for all of the classes with shape (batch_size, n_classes, image_height, image_width).

Source code in src/eva/vision/models/networks/decoders/segmentation/linear.py
def forward(
    self,
    features: List[torch.Tensor],
    image_size: Tuple[int, int],
) -> torch.Tensor:
    """Maps the patch embeddings to a segmentation mask of the image size.

    Args:
        features: List of multi-level image features of shape (batch_size,
            hidden_size, n_patches_height, n_patches_width).
        image_size: The target image size (height, width).

    Returns:
        Tensor containing scores for all of the classes with shape
        (batch_size, n_classes, image_height, image_width).
    """
    patch_embeddings = self._forward_features(features)
    logits = self._forward_head(patch_embeddings)
    return self._cls_seg(logits, image_size)
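
A hedged usage sketch, assuming 768-d patch embeddings, 5 classes, and a 14x14 patch grid (illustrative sizes). The linear layer acts on the hidden dimension of each patch embedding:

import torch
from torch import nn
from eva.vision.models.networks.decoders.segmentation import LinearDecoder

decoder = LinearDecoder(layers=nn.Linear(in_features=768, out_features=5))

features = [torch.randn(2, 768, 14, 14)]  # (batch_size, hidden_size, 14, 14)
masks = decoder(features, image_size=(224, 224))  # (2, 5, 224, 224)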

eva.vision.models.networks.decoders.segmentation.SingleLinearDecoder

Bases: LinearDecoder

A simple linear decoder with a single fully connected layer.

Parameters:

  • in_features (int): The hidden dimension size of the embeddings. Required.
  • num_classes (int): Number of output classes as channels. Required.

Source code in src/eva/vision/models/networks/decoders/segmentation/common.py
def __init__(self, in_features: int, num_classes: int) -> None:
    """Initializes the decoder.

    Args:
        in_features: The hidden dimension size of the embeddings.
        num_classes: Number of output classes as channels.
    """
    super().__init__(
        layers=nn.Linear(
            in_features=in_features,
            out_features=num_classes,
        ),
    )
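
As the source above shows, this is a shorthand for a LinearDecoder with a single nn.Linear layer; a minimal sketch with illustrative sizes:

from eva.vision.models.networks.decoders.segmentation import SingleLinearDecoder

# Maps 768-d patch embeddings to 5 class scores per patch.
decoder = SingleLinearDecoder(in_features=768, num_classes=5)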