Networks

Reference information for the vision model Networks API.

eva.vision.models.networks.ABMIL

Bases: Module

ABMIL network for multiple instance learning classification tasks.

Takes an array of patch-level embeddings per slide as input. This implementation supports batched inputs of shape (batch_size, n_instances, input_size). For slides with fewer than n_instances patches, you can pad the input with pad_value so that the padded entries are masked during the forward pass.

The original implementation from [1] was used as a reference: https://github.com/AMLab-Amsterdam/AttentionDeepMIL/blob/master/model.py

Notes
  • use_bias: The paper didn't use bias in their formalism, but their published example code inadvertently does.
  • To prevent dot-product similarities from becoming nearly equal due to concentration of measure at large input embedding dimensionalities (>128), we added the option to project the input embeddings to a lower dimensionality.

[1] Maximilian Ilse, Jakub M. Tomczak, Max Welling, "Attention-based Deep Multiple Instance Learning", 2018 https://arxiv.org/abs/1802.04712

Parameters:

  • input_size (int, required): Input embedding dimension.
  • output_size (int, required): Number of classes.
  • projected_input_size (int | None, required): Size of the projected input. If None, no projection is performed.
  • hidden_size_attention (int, default 128): Hidden dimension of the attention network.
  • hidden_sizes_mlp (tuple, default (128, 64)): Dimensions of the hidden layers in the final MLP.
  • use_bias (bool, default True): Whether to use bias in the attention network.
  • dropout_input_embeddings (float, default 0.0): Dropout rate for the input embeddings.
  • dropout_attention (float, default 0.0): Dropout rate for the attention network and classifier.
  • dropout_mlp (float, default 0.0): Dropout rate for the final MLP network.
  • pad_value (int | float | None, default float('-inf')): Value indicating padding in the input tensor. If specified, entries with this value in the input tensor will be masked. If set to None, no masking is applied.
Source code in src/eva/vision/models/networks/abmil.py
def __init__(
    self,
    input_size: int,
    output_size: int,
    projected_input_size: int | None,
    hidden_size_attention: int = 128,
    hidden_sizes_mlp: tuple = (128, 64),
    use_bias: bool = True,
    dropout_input_embeddings: float = 0.0,
    dropout_attention: float = 0.0,
    dropout_mlp: float = 0.0,
    pad_value: int | float | None = float("-inf"),
) -> None:
    """Initializes the ABMIL network.

    Args:
        input_size: input embedding dimension
        output_size: number of classes
        projected_input_size: size of the projected input. if `None`, no projection is
            performed.
        hidden_size_attention: hidden dimension in attention network
        hidden_sizes_mlp: dimensions for hidden layers in last mlp
        use_bias: whether to use bias in the attention network
        dropout_input_embeddings: dropout rate for the input embeddings
        dropout_attention: dropout rate for the attention network and classifier
        dropout_mlp: dropout rate for the final MLP network
        pad_value: Value indicating padding in the input tensor. If specified, entries with
            this value in the input tensor will be masked. If set to `None`, no masking is applied.
    """
    super().__init__()

    self._pad_value = pad_value

    if projected_input_size:
        self.projector = nn.Sequential(
            nn.Linear(input_size, projected_input_size, bias=True),
            nn.Dropout(p=dropout_input_embeddings),
        )
        input_size = projected_input_size
    else:
        self.projector = nn.Dropout(p=dropout_input_embeddings)

    self.gated_attention = GatedAttention(
        input_dim=input_size,
        hidden_dim=hidden_size_attention,
        dropout=dropout_attention,
        n_classes=1,
        use_bias=use_bias,
    )

    self.classifier = MLP(
        input_size=input_size,
        output_size=output_size,
        hidden_layer_sizes=hidden_sizes_mlp,
        dropout=dropout_mlp,
        hidden_activation_fn=nn.ReLU,
    )

forward

Forward pass.

Parameters:

  • input_tensor (Tensor, required): Tensor with expected shape of (batch_size, n_instances, input_size).
Source code in src/eva/vision/models/networks/abmil.py
def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:
    """Forward pass.

    Args:
        input_tensor: Tensor with expected shape of (batch_size, n_instances, input_size).
    """
    input_tensor, mask = self._mask_values(input_tensor, self._pad_value)

    # (batch_size, n_instances, input_size) -> (batch_size, n_instances, projected_input_size)
    input_tensor = self.projector(input_tensor)

    attention_logits = self.gated_attention(input_tensor)  # (batch_size, n_instances, 1)
    if mask is not None:
        # fill masked values with -inf, which will yield 0s after softmax
        attention_logits = attention_logits.masked_fill(mask, float("-inf"))

    attention_weights = nn.functional.softmax(attention_logits, dim=1)
    # (batch_size, n_instances, 1)

    attention_result = torch.matmul(torch.transpose(attention_weights, 1, 2), input_tensor)
    # (batch_size, 1, hidden_size_attention)

    attention_result = torch.squeeze(attention_result, 1)  # (batch_size, hidden_size_attention)

    return self.classifier(attention_result)  # (batch_size, output_size)
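
A minimal usage sketch (not part of the library documentation): the embedding dimension, class count, and bag sizes below are illustrative, and the padding relies on the documented pad_value behaviour, i.e. entries equal to pad_value are masked before the attention softmax.

import torch

from eva.vision.models.networks import ABMIL

# Two slides with different numbers of patch embeddings (768-dim here, illustrative).
bags = [torch.randn(500, 768), torch.randn(350, 768)]

# Pad both bags to the same length with the default pad value, float("-inf"),
# so that the padded instances are masked out by the network.
n_instances = max(bag.shape[0] for bag in bags)
batch = torch.full((len(bags), n_instances, 768), float("-inf"))
for i, bag in enumerate(bags):
    batch[i, : bag.shape[0]] = bag

model = ABMIL(input_size=768, output_size=2, projected_input_size=128)
logits = model(batch)  # (batch_size, output_size) == (2, 2)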

eva.vision.models.networks.decoders.Decoder

Bases: Module, ABC

Abstract base class for segmentation decoders.

forward abstractmethod

Forward pass of the decoder.

Source code in src/eva/vision/models/networks/decoders/segmentation/base.py
@abc.abstractmethod
def forward(self, decoder_inputs: DecoderInputs) -> torch.Tensor:
    """Forward pass of the decoder."""

eva.vision.models.networks.decoders.segmentation.Decoder2D

Bases: Decoder

Segmentation decoder for 2D applications.

Here the input nn layers will be directly applied to the features of shape (batch_size, hidden_size, n_patches_height, n_patches_width), where n_patches is image_size / patch_size. Note that n_patches is also known as grid_size.

Parameters:

  • layers (Module, required): The layers to be used as the decoder head.
  • combine_features (bool, default True): Whether to combine the features from different feature levels into one tensor before applying the decoder head.
Source code in src/eva/vision/models/networks/decoders/segmentation/decoder2d.py
def __init__(self, layers: nn.Module, combine_features: bool = True) -> None:
    """Initializes the based decoder head.

    Here the input nn layers will be directly applied to the
    features of shape (batch_size, hidden_size, n_patches_height,
    n_patches_width), where n_patches is image_size / patch_size.
    Note that n_patches is also known as grid_size.

    Args:
        layers: The layers to be used as the decoder head.
        combine_features: Whether to combine the features from different
            feature levels into one tensor before applying the decoder head.
    """
    super().__init__()

    self._layers = layers
    self._combine_features = combine_features

forward

Maps the patch embeddings to a segmentation mask of the image size.

Parameters:

  • decoder_inputs (DecoderInputs, required): Inputs required by the decoder.

Returns:

  • Tensor: Tensor containing scores for all of the classes with shape (batch_size, n_classes, image_height, image_width).

Source code in src/eva/vision/models/networks/decoders/segmentation/decoder2d.py
def forward(self, decoder_inputs: DecoderInputs) -> torch.Tensor:
    """Maps the patch embeddings to a segmentation mask of the image size.

    Args:
        decoder_inputs: Inputs required by the decoder.

    Returns:
        Tensor containing scores for all of the classes with shape
        (batch_size, n_classes, image_height, image_width).
    """
    features, image_size, _ = DecoderInputs(*decoder_inputs)
    if self._combine_features:
        features = self._forward_features(features)
    logits = self._forward_head(features)
    return self._upscale(logits, image_size)
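
A minimal construction sketch; the hidden size, grid size, and class count are illustrative. The head is applied to features of shape (batch_size, hidden_size, n_patches_height, n_patches_width), and Decoder2D.forward then upscales the resulting logits to the target image size.

import torch
from torch import nn

from eva.vision.models.networks.decoders.segmentation import Decoder2D

hidden_size, grid_size, n_classes = 768, 14, 8  # illustrative sizes

head = nn.Sequential(
    nn.Upsample(scale_factor=2),
    nn.Conv2d(hidden_size, n_classes, kernel_size=1),
)
decoder = Decoder2D(layers=head)

# Shape behaviour of the head alone on a dummy patch-feature map:
features = torch.randn(2, hidden_size, grid_size, grid_size)
print(head(features).shape)  # torch.Size([2, 8, 28, 28])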

eva.vision.models.networks.decoders.segmentation.ConvDecoder1x1

Bases: Decoder2D

A convolutional decoder with a single 1x1 convolutional layer.

Parameters:

  • in_features (int, required): The hidden dimension size of the embeddings.
  • num_classes (int, required): Number of output classes as channels.
Source code in src/eva/vision/models/networks/decoders/segmentation/semantic/common.py
def __init__(self, in_features: int, num_classes: int) -> None:
    """Initializes the decoder.

    Args:
        in_features: The hidden dimension size of the embeddings.
        num_classes: Number of output classes as channels.
    """
    super().__init__(
        layers=nn.Conv2d(
            in_channels=in_features,
            out_channels=num_classes,
            kernel_size=(1, 1),
        ),
    )
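
A one-line instantiation sketch (the sizes are illustrative). The 1x1 head keeps the patch-grid resolution, so upscaling to the image size is handled by the inherited Decoder2D forward pass.

from eva.vision.models.networks.decoders.segmentation import ConvDecoder1x1

decoder = ConvDecoder1x1(in_features=768, num_classes=8)  # illustrative sizes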

eva.vision.models.networks.decoders.segmentation.ConvDecoderMS

Bases: Decoder2D

A multi-stage convolutional decoder with upsampling and convolutional layers.

This decoder applies a series of upsampling and convolutional layers to transform the input features into output predictions with the desired spatial resolution.

This decoder is based on the +ms segmentation decoder from DINOv2 (https://arxiv.org/pdf/2304.07193).

Parameters:

  • in_features (int, required): The hidden dimension size of the embeddings.
  • num_classes (int, required): Number of output classes as channels.
Source code in src/eva/vision/models/networks/decoders/segmentation/semantic/common.py
def __init__(self, in_features: int, num_classes: int) -> None:
    """Initializes the decoder.

    Args:
        in_features: The hidden dimension size of the embeddings.
        num_classes: Number of output classes as channels.
    """
    super().__init__(
        layers=nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv2d(in_features, 64, kernel_size=(3, 3), padding=(1, 1)),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(64, num_classes, kernel_size=(3, 3), padding=(1, 1)),
        ),
    )
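
A shape walk-through of the head defined above, using an illustrative 14x14 patch grid: each Upsample doubles the spatial resolution, so the head alone maps (batch_size, in_features, 14, 14) to (batch_size, num_classes, 56, 56) before the inherited upscaling to the image size.

import torch
from torch import nn

# The same stack of layers that ConvDecoderMS(in_features=768, num_classes=8)
# builds in the source above, applied to a dummy feature map.
head = nn.Sequential(
    nn.Upsample(scale_factor=2),
    nn.Conv2d(768, 64, kernel_size=(3, 3), padding=(1, 1)),
    nn.Upsample(scale_factor=2),
    nn.Conv2d(64, 8, kernel_size=(3, 3), padding=(1, 1)),
)

features = torch.randn(2, 768, 14, 14)  # illustrative 14x14 patch grid
print(head(features).shape)  # torch.Size([2, 8, 56, 56])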

eva.vision.models.networks.decoders.segmentation.LinearDecoder

Bases: Decoder

Linear decoder.

Here the input nn layers will be applied to the features reshaped to (batch_size, patch_embeddings, hidden_size) from the input of shape (batch_size, hidden_size, height, width), and the output is then reshaped back to (batch_size, n_classes, height, width).

Parameters:

  • layers (Module, required): The linear layers to be used as the decoder head.
Source code in src/eva/vision/models/networks/decoders/segmentation/linear.py
def __init__(self, layers: nn.Module) -> None:
    """Initializes the linear based decoder head.

    Here the input nn layers will be applied to the reshaped
    features (batch_size, patch_embeddings, hidden_size) from
    the input (batch_size, hidden_size, height, width) and then
    unwrapped again to (batch_size, n_classes, height, width).

    Args:
        layers: The linear layers to be used as the decoder head.
    """
    super().__init__()

    self._layers = layers

forward

Maps the patch embeddings to a segmentation mask of the image size.

Parameters:

  • features (List[Tensor], required): List of multi-level image features of shape (batch_size, hidden_size, n_patches_height, n_patches_width).
  • image_size (Tuple[int, int], required): The target image size (height, width).

Returns:

  • Tensor: Tensor containing scores for all of the classes with shape (batch_size, n_classes, image_height, image_width).

Source code in src/eva/vision/models/networks/decoders/segmentation/linear.py
def forward(
    self,
    features: List[torch.Tensor],
    image_size: Tuple[int, int],
) -> torch.Tensor:
    """Maps the patch embeddings to a segmentation mask of the image size.

    Args:
        features: List of multi-level image features of shape (batch_size,
            hidden_size, n_patches_height, n_patches_width).
        image_size: The target image size (height, width).

    Returns:
        Tensor containing scores for all of the classes with shape
        (batch_size, n_classes, image_height, image_width).
    """
    patch_embeddings = self._forward_features(features)
    logits = self._forward_head(patch_embeddings)
    return self._cls_seg(logits, image_size)
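
A minimal usage sketch, assuming a single feature level is accepted; the sizes are illustrative and the expected output shape follows the documented return contract.

import torch
from torch import nn

from eva.vision.models.networks.decoders.segmentation import LinearDecoder

hidden_size, grid_size, n_classes = 768, 14, 8  # illustrative sizes
decoder = LinearDecoder(layers=nn.Linear(hidden_size, n_classes))

features = [torch.randn(2, hidden_size, grid_size, grid_size)]
masks = decoder(features, image_size=(224, 224))
print(masks.shape)  # expected: torch.Size([2, 8, 224, 224])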

eva.vision.models.networks.decoders.segmentation.SingleLinearDecoder

Bases: LinearDecoder

A simple linear decoder with a single fully connected layer.

Parameters:

  • in_features (int, required): The hidden dimension size of the embeddings.
  • num_classes (int, required): Number of output classes as channels.
Source code in src/eva/vision/models/networks/decoders/segmentation/semantic/common.py
def __init__(self, in_features: int, num_classes: int) -> None:
    """Initializes the decoder.

    Args:
        in_features: The hidden dimension size of the embeddings.
        num_classes: Number of output classes as channels.
    """
    super().__init__(
        layers=nn.Linear(
            in_features=in_features,
            out_features=num_classes,
        ),
    )
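
An instantiation sketch with illustrative sizes; per the source above, this is a LinearDecoder whose head is a single nn.Linear(in_features, num_classes).

from eva.vision.models.networks.decoders.segmentation import SingleLinearDecoder

decoder = SingleLinearDecoder(in_features=768, num_classes=8)  # illustrative sizes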

eva.vision.models.networks.decoders.segmentation.ConvDecoderWithImage

Bases: Decoder2D

A convolutional decoder that, in addition to the encoded features, also takes the input image as input.

In the first stage, the input features are upsampled and passed through a convolutional layer; in the second stage, the input image channels are concatenated with the upsampled features and passed through additional convolutional blocks in order to combine the image prior information with the encoded features. Lastly, a 1x1 convolution reduces the number of channels to the number of classes.

Parameters:

  • in_features (int, required): The hidden dimension size of the embeddings.
  • num_classes (int, required): Number of output classes as channels.
  • greyscale (bool, default False): Whether to convert input images to greyscale.
  • hidden_dims (List[int] | None, default None): List of hidden dimensions for the convolutional layers.
Source code in src/eva/vision/models/networks/decoders/segmentation/semantic/with_image.py
def __init__(
    self,
    in_features: int,
    num_classes: int,
    greyscale: bool = False,
    hidden_dims: List[int] | None = None,
) -> None:
    """Initializes the decoder.

    Args:
        in_features: The hidden dimension size of the embeddings.
        num_classes: Number of output classes as channels.
        greyscale: Whether to convert input images to greyscale.
        hidden_dims: List of hidden dimensions for the convolutional layers.
    """
    hidden_dims = hidden_dims or self._default_hidden_dims
    if len(hidden_dims) != 3:
        raise ValueError("Hidden dims must have 3 elements.")

    super().__init__(
        layers=nn.Sequential(
            nn.Upsample(scale_factor=2),
            Conv2dBnReLU(in_features, hidden_dims[0]),
        )
    )
    self.greyscale = greyscale

    additional_hidden_dims = 1 if greyscale else 3
    self.image_block = nn.Sequential(
        Conv2dBnReLU(hidden_dims[0] + additional_hidden_dims, hidden_dims[1]),
        Conv2dBnReLU(hidden_dims[1], hidden_dims[2]),
    )

    self.classifier = nn.Conv2d(hidden_dims[2], num_classes, kernel_size=1)
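
An instantiation sketch with illustrative values; per the check in the source above, hidden_dims must contain exactly three elements, and with greyscale=True the image contributes a single extra channel to the second-stage blocks.

from eva.vision.models.networks.decoders.segmentation import ConvDecoderWithImage

decoder = ConvDecoderWithImage(
    in_features=768,            # illustrative embedding size
    num_classes=8,              # illustrative number of classes
    greyscale=True,             # the image is concatenated as a single channel
    hidden_dims=[64, 32, 32],   # must have exactly three elements
)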