Networks

Reference information for the vision model Networks API.

eva.vision.models.networks.ABMIL

Bases: Module

ABMIL network for multiple instance learning classification tasks.

Takes an array of patch-level embeddings per slide as input. This implementation supports batched inputs of shape (batch_size, n_instances, input_size). For slides with fewer than n_instances patches, you can apply padding and provide a mask tensor to the forward pass.

The original implementation from [1] was used as a reference: https://github.com/AMLab-Amsterdam/AttentionDeepMIL/blob/master/model.py

Notes
  • use_bias: The paper does not use a bias term in its formalism, but the authors' published example code inadvertently does.
  • For large input embedding dimensionalities (>128), concentration of measure can make the dot-product similarities nearly equal; to mitigate this, we added the option to project the input embeddings to a lower dimensionality.

[1] Maximilian Ilse, Jakub M. Tomczak, Max Welling, "Attention-based Deep Multiple Instance Learning", 2018 https://arxiv.org/abs/1802.04712

Parameters:

  • input_size (int): Input embedding dimension. Required.
  • output_size (int): Number of classes. Required.
  • projected_input_size (int | None): Size of the projected input. If None, no projection is performed. Required.
  • hidden_size_attention (int): Hidden dimension of the attention network. Default: 128.
  • hidden_sizes_mlp (tuple): Dimensions of the hidden layers in the final MLP. Default: (128, 64).
  • use_bias (bool): Whether to use bias in the attention network. Default: True.
  • dropout_input_embeddings (float): Dropout rate for the input embeddings. Default: 0.0.
  • dropout_attention (float): Dropout rate for the attention network and classifier. Default: 0.0.
  • dropout_mlp (float): Dropout rate for the final MLP network. Default: 0.0.
  • pad_value (int | float | None): Value indicating padding in the input tensor. If specified, entries with this value in the input tensor will be masked. If None, no masking is applied. Default: float('-inf').

Source code in src/eva/vision/models/networks/abmil.py
def __init__(
    self,
    input_size: int,
    output_size: int,
    projected_input_size: int | None,
    hidden_size_attention: int = 128,
    hidden_sizes_mlp: tuple = (128, 64),
    use_bias: bool = True,
    dropout_input_embeddings: float = 0.0,
    dropout_attention: float = 0.0,
    dropout_mlp: float = 0.0,
    pad_value: int | float | None = float("-inf"),
) -> None:
    """Initializes the ABMIL network.

    Args:
        input_size: input embedding dimension
        output_size: number of classes
        projected_input_size: size of the projected input. if `None`, no projection is
            performed.
        hidden_size_attention: hidden dimension in attention network
        hidden_sizes_mlp: dimensions for hidden layers in last mlp
        use_bias: whether to use bias in the attention network
        dropout_input_embeddings: dropout rate for the input embeddings
        dropout_attention: dropout rate for the attention network and classifier
        dropout_mlp: dropout rate for the final MLP network
        pad_value: Value indicating padding in the input tensor. If specified, entries with
            this value in the input tensor will be masked. If set to `None`, no masking is applied.
    """
    super().__init__()

    self._pad_value = pad_value

    if projected_input_size:
        self.projector = nn.Sequential(
            nn.Linear(input_size, projected_input_size, bias=True),
            nn.Dropout(p=dropout_input_embeddings),
        )
        input_size = projected_input_size
    else:
        self.projector = nn.Dropout(p=dropout_input_embeddings)

    self.gated_attention = GatedAttention(
        input_dim=input_size,
        hidden_dim=hidden_size_attention,
        dropout=dropout_attention,
        n_classes=1,
        use_bias=use_bias,
    )

    self.classifier = MLP(
        input_size=input_size,
        output_size=output_size,
        hidden_layer_sizes=hidden_sizes_mlp,
        dropout=dropout_mlp,
        hidden_activation_fn=nn.ReLU,
    )
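
A minimal instantiation sketch with illustrative sizes (not taken from the library): 768-dimensional patch embeddings, two output classes, and the optional projection to 128 dimensions enabled, following the note on large input dimensionalities above.

from eva.vision.models.networks import ABMIL

# Illustrative sizes: 768-d patch embeddings, 2 output classes, and the
# optional projection of the inputs down to 128 dimensions.
model = ABMIL(
    input_size=768,
    output_size=2,
    projected_input_size=128,
)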

forward

Forward pass.

Parameters:

  • input_tensor (Tensor): Tensor with expected shape of (batch_size, n_instances, input_size). Required.

Source code in src/eva/vision/models/networks/abmil.py
def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:
    """Forward pass.

    Args:
        input_tensor: Tensor with expected shape of (batch_size, n_instances, input_size).
    """
    input_tensor, mask = self._mask_values(input_tensor, self._pad_value)

    # (batch_size, n_instances, input_size) -> (batch_size, n_instances, projected_input_size)
    input_tensor = self.projector(input_tensor)

    attention_logits = self.gated_attention(input_tensor)  # (batch_size, n_instances, 1)
    if mask is not None:
        # fill masked values with -inf, which will yield 0s after softmax
        attention_logits = attention_logits.masked_fill(mask, float("-inf"))

    attention_weights = nn.functional.softmax(attention_logits, dim=1)
    # (batch_size, n_instances, 1)

    attention_result = torch.matmul(torch.transpose(attention_weights, 1, 2), input_tensor)
    # (batch_size, 1, hidden_size_attention)

    attention_result = torch.squeeze(attention_result, 1)  # (batch_size, hidden_size_attention)

    return self.classifier(attention_result)  # (batch_size, output_size)
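
Continuing the instantiation sketch above, a hedged forward-pass example: slides with fewer patches than n_instances are padded with the default pad_value of float("-inf"), which the network masks out internally before the attention softmax, per the class description.

import torch

batch_size, n_instances, input_size = 4, 1000, 768
embeddings = torch.randn(batch_size, n_instances, input_size)

# Assume the first slide has only 800 patches: fill the remaining rows with
# the default pad value so they are masked in the attention softmax.
embeddings[0, 800:] = float("-inf")

logits = model(embeddings)  # (4, 2), i.e. (batch_size, output_size)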

eva.vision.models.networks.decoders.Decoder

Bases: Module

Semantic segmentation decoder base class.

eva.vision.models.networks.decoders.segmentation.ConvDecoder

Bases: Decoder

Convolutional segmentation decoder.

Here the input nn layers are applied directly to the features of shape (batch_size, hidden_size, n_patches_height, n_patches_width), where n_patches is image_size / patch_size. Note that n_patches is also known as grid_size.

Parameters:

  • layers (Module): The convolutional layers to be used as the decoder head. Required.

Source code in src/eva/vision/models/networks/decoders/segmentation/conv2d.py
def __init__(self, layers: nn.Module) -> None:
    """Initializes the convolutional based decoder head.

    Here the input nn layers will be directly applied to the
    features of shape (batch_size, hidden_size, n_patches_height,
    n_patches_width), where n_patches is image_size / patch_size.
    Note the n_patches is also known as grid_size.

    Args:
        layers: The convolutional layers to be used as the decoder head.
    """
    super().__init__()

    self._layers = layers

forward

Maps the patch embeddings to a segmentation mask of the image size.

Parameters:

  • features (List[Tensor]): List of multi-level image features of shape (batch_size, hidden_size, n_patches_height, n_patches_width). Required.
  • image_size (Tuple[int, int]): The target image size (height, width). Required.

Returns:

  • Tensor: Tensor containing scores for all of the classes with shape (batch_size, n_classes, image_height, image_width).

Source code in src/eva/vision/models/networks/decoders/segmentation/conv2d.py
def forward(
    self,
    features: List[torch.Tensor],
    image_size: Tuple[int, int],
) -> torch.Tensor:
    """Maps the patch embeddings to a segmentation mask of the image size.

    Args:
        features: List of multi-level image features of shape (batch_size,
            hidden_size, n_patches_height, n_patches_width).
        image_size: The target image size (height, width).

    Returns:
        Tensor containing scores for all of the classes with shape
        (batch_size, n_classes, image_height, image_width).
    """
    patch_embeddings = self._forward_features(features)
    logits = self._forward_head(patch_embeddings)
    return self._cls_seg(logits, image_size)
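
A hedged usage sketch, assuming a single-level feature map from a ViT-like encoder with a hidden size of 768 and a 14x14 patch grid (all sizes are illustrative):

import torch
from torch import nn
from eva.vision.models.networks.decoders.segmentation import ConvDecoder

# A 1x1 convolution that maps 768-d patch embeddings to 5 class channels.
decoder = ConvDecoder(layers=nn.Conv2d(768, 5, kernel_size=(1, 1)))

features = [torch.randn(2, 768, 14, 14)]  # (batch_size, hidden_size, 14, 14)
masks = decoder(features, image_size=(224, 224))  # (2, 5, 224, 224)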

eva.vision.models.networks.decoders.segmentation.ConvDecoder1x1

Bases: ConvDecoder

A convolutional decoder with a single 1x1 convolutional layer.

Parameters:

  • in_features (int): The hidden dimension size of the embeddings. Required.
  • num_classes (int): Number of output classes as channels. Required.

Source code in src/eva/vision/models/networks/decoders/segmentation/common.py
def __init__(self, in_features: int, num_classes: int) -> None:
    """Initializes the decoder.

    Args:
        in_features: The hidden dimension size of the embeddings.
        num_classes: Number of output classes as channels.
    """
    super().__init__(
        layers=nn.Conv2d(
            in_channels=in_features,
            out_channels=num_classes,
            kernel_size=(1, 1),
        ),
    )
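
As the source above shows, this class is a thin shorthand for a ConvDecoder with a single 1x1 convolution; a minimal sketch with illustrative sizes:

from eva.vision.models.networks.decoders.segmentation import ConvDecoder1x1

# Maps 768-d patch embeddings directly to 5 class channels per grid cell.
decoder = ConvDecoder1x1(in_features=768, num_classes=5)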

eva.vision.models.networks.decoders.segmentation.ConvDecoderMS

Bases: ConvDecoder

A multi-stage convolutional decoder with upsampling and convolutional layers.

This decoder applies a series of upsampling and convolutional layers to transform the input features into output predictions with the desired spatial resolution.

This decoder is based on the +ms segmentation decoder from DINOv2 (https://arxiv.org/pdf/2304.07193).

Parameters:

  • in_features (int): The hidden dimension size of the embeddings. Required.
  • num_classes (int): Number of output classes as channels. Required.

Source code in src/eva/vision/models/networks/decoders/segmentation/common.py
def __init__(self, in_features: int, num_classes: int) -> None:
    """Initializes the decoder.

    Args:
        in_features: The hidden dimension size of the embeddings.
        num_classes: Number of output classes as channels.
    """
    super().__init__(
        layers=nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv2d(in_features, 64, kernel_size=(3, 3), padding=(1, 1)),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(64, num_classes, kernel_size=(3, 3), padding=(1, 1)),
        ),
    )
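
Each nn.Upsample(scale_factor=2) above doubles the patch grid, so the convolutional head produces logits at four times the grid resolution before the decoder maps them to the target image size (per the forward return shape). A hedged sketch with illustrative sizes:

import torch
from eva.vision.models.networks.decoders.segmentation import ConvDecoderMS

# 768-d patch embeddings, 5 classes, a 14x14 patch grid for a 224x224 image.
decoder = ConvDecoderMS(in_features=768, num_classes=5)

features = [torch.randn(2, 768, 14, 14)]
masks = decoder(features, image_size=(224, 224))  # (2, 5, 224, 224)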

eva.vision.models.networks.decoders.segmentation.LinearDecoder

Bases: Decoder

Linear decoder.

Here the input nn layers are applied to the features reshaped to (batch_size, patch_embeddings, hidden_size) from the input of shape (batch_size, hidden_size, height, width), and the result is then reshaped back to (batch_size, n_classes, height, width).

Parameters:

  • layers (Module): The linear layers to be used as the decoder head. Required.

Source code in src/eva/vision/models/networks/decoders/segmentation/linear.py
def __init__(self, layers: nn.Module) -> None:
    """Initializes the linear based decoder head.

    Here the input nn layers will be applied to the reshaped
    features (batch_size, patch_embeddings, hidden_size) from
    the input (batch_size, hidden_size, height, width) and then
    unwrapped again to (batch_size, n_classes, height, width).

    Args:
        layers: The linear layers to be used as the decoder head.
    """
    super().__init__()

    self._layers = layers

forward

Maps the patch embeddings to a segmentation mask of the image size.

Parameters:

  • features (List[Tensor]): List of multi-level image features of shape (batch_size, hidden_size, n_patches_height, n_patches_width). Required.
  • image_size (Tuple[int, int]): The target image size (height, width). Required.

Returns:

  • Tensor: Tensor containing scores for all of the classes with shape (batch_size, n_classes, image_height, image_width).

Source code in src/eva/vision/models/networks/decoders/segmentation/linear.py
def forward(
    self,
    features: List[torch.Tensor],
    image_size: Tuple[int, int],
) -> torch.Tensor:
    """Maps the patch embeddings to a segmentation mask of the image size.

    Args:
        features: List of multi-level image features of shape (batch_size,
            hidden_size, n_patches_height, n_patches_width).
        image_size: The target image size (height, width).

    Returns:
        Tensor containing scores for all of the classes with shape
        (batch_size, n_classes, image_height, image_width).
    """
    patch_embeddings = self._forward_features(features)
    logits = self._forward_head(patch_embeddings)
    return self._cls_seg(logits, image_size)
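
A hedged usage sketch, assuming 768-d patch embeddings, 5 classes, and a 14x14 patch grid (illustrative sizes). The linear layer acts on the hidden dimension of each patch embedding:

import torch
from torch import nn
from eva.vision.models.networks.decoders.segmentation import LinearDecoder

decoder = LinearDecoder(layers=nn.Linear(in_features=768, out_features=5))

features = [torch.randn(2, 768, 14, 14)]  # (batch_size, hidden_size, 14, 14)
masks = decoder(features, image_size=(224, 224))  # (2, 5, 224, 224)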

eva.vision.models.networks.decoders.segmentation.SingleLinearDecoder

Bases: LinearDecoder

A simple linear decoder with a single fully connected layer.

Parameters:

  • in_features (int): The hidden dimension size of the embeddings. Required.
  • num_classes (int): Number of output classes as channels. Required.

Source code in src/eva/vision/models/networks/decoders/segmentation/common.py
def __init__(self, in_features: int, num_classes: int) -> None:
    """Initializes the decoder.

    Args:
        in_features: The hidden dimension size of the embeddings.
        num_classes: Number of output classes as channels.
    """
    super().__init__(
        layers=nn.Linear(
            in_features=in_features,
            out_features=num_classes,
        ),
    )
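
As the source above shows, this is a shorthand for a LinearDecoder with a single nn.Linear layer; a minimal sketch with illustrative sizes:

from eva.vision.models.networks.decoders.segmentation import SingleLinearDecoder

# Maps 768-d patch embeddings to 5 class scores per patch.
decoder = SingleLinearDecoder(in_features=768, num_classes=5)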