flowvision.layers

Plug-and-play modules and functions specific to computer vision tasks.

flowvision.layers.blocks module

class flowvision.layers.blocks.FeaturePyramidNetwork(in_channels_list: List[int], out_channels: int, extra_blocks: Optional[flowvision.layers.blocks.feature_pyramid_network.ExtraFPNBlock] = None)[source]

Module that adds an FPN on top of a set of feature maps. This is based on “Feature Pyramid Networks for Object Detection”.

The feature maps are currently expected to be in increasing depth order.

The input to the model is expected to be an OrderedDict[Tensor], containing the feature maps on top of which the FPN will be added.

Parameters
  • in_channels_list (list[int]) – number of channels for each feature map that is passed to the module

  • out_channels (int) – number of channels of the FPN representation

  • extra_blocks (ExtraFPNBlock or None) – if provided, extra operations will be performed. It is expected to take the fpn features, the original features and the names of the original features as input, and returns a new list of feature maps and their corresponding names

forward(x: Dict[str, oneflow.Tensor]) → Dict[str, oneflow.Tensor][source]

Computes the FPN for a set of feature maps.

Parameters

x (OrderedDict[Tensor]) – feature maps for each feature level.

Returns

feature maps after FPN layers.

They are ordered with the highest resolution first.

Return type

results (OrderedDict[Tensor])

get_result_from_inner_blocks(x: oneflow.Tensor, idx: int) → oneflow.Tensor[source]

This is equivalent to self.inner_blocks[idx](x)

get_result_from_layer_blocks(x: oneflow.Tensor, idx: int) → oneflow.Tensor[source]

This is equivalent to self.layer_blocks[idx](x)
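The top-down pathway that this module implements can be sketched in plain Python. The snippet below is purely conceptual: feature maps are 1-D lists, the lateral 1×1 convolution is replaced by the identity, and upsampling is nearest-neighbor. The real module uses learnable lateral and output convolutions on tensors; only the merge order (coarsest level first, finer levels added one by one) is what this sketch illustrates.

```python
# Conceptual sketch of the FPN top-down pathway on 1-D "feature maps".
# The lateral conv is an identity and upsampling is nearest-neighbor,
# purely to show how levels are merged from coarsest to finest.

def upsample_nearest_2x(f):
    """Repeat each element twice (nearest-neighbor upsampling in 1-D)."""
    return [v for v in f for _ in range(2)]

def fpn_top_down(features):
    """features: list of levels ordered finest -> coarsest (increasing depth).
    Returns merged maps ordered with the highest resolution first."""
    results = []
    last = features[-1]                 # start from the coarsest level
    results.append(last)
    for f in reversed(features[:-1]):   # walk back toward finer levels
        up = upsample_nearest_2x(last)
        last = [a + b for a, b in zip(f, up)]
        results.insert(0, last)         # keep highest resolution first
    return results

levels = [[1.0] * 8, [2.0] * 4, [3.0] * 2]   # finest (len 8) to coarsest (len 2)
merged = fpn_top_down(levels)
print([len(m) for m in merged])  # [8, 4, 2]
```

Note how the output preserves one map per input level, ordered from the highest resolution down, matching the ordering contract of forward above.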

class flowvision.layers.blocks.MultiScaleRoIAlign(featmap_names: List[str], output_size: Union[int, Tuple[int], List[int]], sampling_ratio: int, *, canonical_scale: int = 224, canonical_level: int = 4)[source]

Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.

It infers the scale of the pooling via the heuristics specified in eq. 1 of the Feature Pyramid Network paper. The keyword-only parameters canonical_scale and canonical_level correspond to 224 and k0=4 in eq. 1, respectively, and have the following meaning: canonical_level is the target level of the pyramid from which to pool a region of interest with w x h = canonical_scale x canonical_scale.

Parameters
  • featmap_names (List[str]) – the names of the feature maps that will be used for the pooling.

  • output_size (List[Tuple[int, int]] or List[int]) – output size for the pooled region

  • sampling_ratio (int) – sampling ratio for ROIAlign

  • canonical_scale (int, optional) – canonical_scale for LevelMapper

  • canonical_level (int, optional) – canonical_level for LevelMapper
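The eq. 1 heuristic described above can be reproduced in a few lines. The sketch below is an illustrative reimplementation, not the library code: canonical_scale=224 and canonical_level=4 match the defaults, while the clamping bounds k_min and k_max (the finest and coarsest available pyramid levels) are assumed for the example.

```python
import math

# Illustrative reimplementation of the FPN level-assignment heuristic
# (eq. 1 of the Feature Pyramid Networks paper):
#   k = floor(k0 + log2(sqrt(w * h) / canonical_scale))
# k_min/k_max clamp the result to the available pyramid levels.

def map_roi_to_level(box, canonical_scale=224, canonical_level=4,
                     k_min=2, k_max=5):
    x1, y1, x2, y2 = box
    scale = math.sqrt((x2 - x1) * (y2 - y1))          # sqrt(w * h)
    k = math.floor(canonical_level + math.log2(scale / canonical_scale))
    return max(k_min, min(k_max, k))                   # clamp to pyramid range

print(map_roi_to_level((0, 0, 224, 224)))  # canonical box -> level 4
print(map_roi_to_level((0, 0, 112, 112)))  # half the scale -> level 3
```

Larger RoIs are thus pooled from coarser (higher) levels, smaller RoIs from finer (lower) ones.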

forward(x: Dict[str, oneflow.Tensor], boxes: List[oneflow.Tensor], image_shapes: List[Tuple[int, int]]) → oneflow.Tensor[source]
Parameters
  • x (OrderedDict[Tensor]) – feature maps for each level. They are assumed to have all the same number of channels, but they can have different sizes.

  • boxes (List[Tensor[N, 4]]) – boxes to be used to perform the pooling operation, in (x1, y1, x2, y2) format and in the image reference size, not the feature map reference. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2.

  • image_shapes (List[Tuple[height, width]]) – the size of each image before it was fed to a CNN to obtain the feature maps. This allows us to infer the scale factor for each level to be pooled.

Returns

result (Tensor)

flowvision.layers.blocks.batched_nms(boxes: oneflow.Tensor, scores: oneflow.Tensor, idxs: oneflow.Tensor, iou_threshold: float) → oneflow.Tensor[source]

Performs non-maximum suppression in a batched fashion.

Each index value corresponds to a category, and NMS will not be applied between elements of different categories.

Parameters
  • boxes (Tensor[N, 4]) – boxes where NMS will be performed. They are expected to be in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.

  • scores (Tensor[N]) – scores for each one of the boxes

  • idxs (Tensor[N]) – indices of the categories for each one of the boxes.

  • iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores

Return type

Tensor
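A common way to implement category-aware NMS in a single pass is to offset every box by its category index times a value larger than any coordinate, so that boxes from different categories can never overlap. The sketch below illustrates that trick in plain Python; it mirrors the standard implementation strategy, though the exact internals of flowvision's batched_nms may differ.

```python
# Illustrative sketch of the batched-NMS offset trick: shift each box by
# category_index * (max_coordinate + 1). Boxes of different categories
# then occupy disjoint regions, so one plain NMS pass over the shifted
# boxes is equivalent to running NMS per category.

def offset_boxes(boxes, idxs):
    max_coord = max(c for b in boxes for c in b)
    out = []
    for (x1, y1, x2, y2), k in zip(boxes, idxs):
        off = k * (max_coord + 1)
        out.append((x1 + off, y1 + off, x2 + off, y2 + off))
    return out

boxes = [(0, 0, 10, 10), (1, 1, 9, 9)]   # heavily overlapping boxes...
idxs = [0, 1]                             # ...but in different categories
shifted = offset_boxes(boxes, idxs)
print(shifted[1])  # (12, 12, 20, 20): moved clear of category 0
```

After the shift, both boxes would survive NMS regardless of the IoU threshold, which is exactly the per-category behavior described above.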

flowvision.layers.blocks.box_iou(boxes1: oneflow.Tensor, boxes2: oneflow.Tensor) → oneflow.Tensor[source]

Return intersection-over-union (Jaccard index) between two sets of boxes.

Both sets of boxes are expected to be in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.

Parameters
  • boxes1 (Tensor[N, 4]) – first set of boxes

  • boxes2 (Tensor[N, 4]) – second set of boxes

Returns

the NxM matrix containing the pairwise IoU values for every element in boxes1 and boxes2

Return type

Tensor[N, M]
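The quantity being computed is straightforward to write out by hand. The following pure-Python sketch computes the same NxM pairwise IoU matrix on lists of boxes; the library performs the equivalent computation on tensors.

```python
# Pure-Python sketch of pairwise IoU for boxes in (x1, y1, x2, y2) format.
# Returns an N x M nested list: out[i][j] = IoU(boxes1[i], boxes2[j]).

def pairwise_iou(boxes1, boxes2):
    out = []
    for (ax1, ay1, ax2, ay2) in boxes1:
        area_a = (ax2 - ax1) * (ay2 - ay1)
        row = []
        for (bx1, by1, bx2, by2) in boxes2:
            area_b = (bx2 - bx1) * (by2 - by1)
            iw = max(0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
            ih = max(0, min(ay2, by2) - max(ay1, by1))  # intersection height
            inter = iw * ih
            row.append(inter / (area_a + area_b - inter))
        out.append(row)
    return out

m = pairwise_iou([(0, 0, 2, 2)], [(0, 0, 2, 2), (1, 1, 3, 3)])
print(m)  # [[1.0, 0.14285714285714285]]  (identical box, then IoU = 1/7)
```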

flowvision.layers.blocks.nms(boxes: oneflow.Tensor, scores: oneflow.Tensor, iou_threshold: float) → oneflow.Tensor[source]

Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

NMS iteratively removes lower scoring boxes which have an IoU greater than iou_threshold with another (higher scoring) box.

Parameters
  • boxes (Tensor[N, 4]) – boxes to perform NMS on. They are expected to be in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.

  • scores (Tensor[N]) – scores for each one of the boxes

  • iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores

Return type

Tensor
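The greedy procedure described above can be sketched in a few lines. This is an illustrative reimplementation of the classic algorithm on Python lists, not the library's tensor kernel: keep the highest-scoring box, discard every remaining box whose IoU with it exceeds the threshold, and repeat.

```python
# Minimal greedy NMS sketch: returns kept indices, sorted by decreasing score.

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union

def greedy_nms(boxes, scores, iou_threshold):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)            # highest-scoring box still in play
        keep.append(i)
        # drop every remaining box that overlaps it too much
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, 0.5))  # [0, 2]: box 1 is suppressed by box 0
```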

flowvision.layers.attention module

class flowvision.layers.attention.SEModule(channels: int, reduction: int = 16, rd_channels: Optional[int] = None, act_layer: Optional[oneflow.nn.modules.activation.ReLU] = <class 'oneflow.nn.modules.activation.ReLU'>, gate_layer: Optional[oneflow.nn.modules.activation.Sigmoid] = <class 'oneflow.nn.modules.activation.Sigmoid'>, mlp_bias=True)[source]

“Squeeze-and-Excitation” block adaptively recalibrates channel-wise feature responses. This is based on “Squeeze-and-Excitation Networks”. This unit is designed to improve the representational capacity of a network by enabling it to perform dynamic channel-wise feature recalibration.

Parameters
  • channels (int) – The input channel size

  • reduction (int) – Ratio that allows us to vary the capacity and computational cost of the SE Module. Default: 16

  • rd_channels (int or None) – Number of reduced channels. If None, uses reduction to calculate

  • act_layer (Optional[ReLU]) – An activation layer used after the first FC layer. Default: flow.nn.ReLU

  • gate_layer (Optional[Sigmoid]) – An activation layer used after the second FC layer. Default: flow.nn.Sigmoid

  • mlp_bias (bool) – If True, add learnable bias to the linear layers. Default: True
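The squeeze-excite-scale flow can be illustrated without any framework. In the sketch below, a C x H x W feature map is stored as nested lists and, as a simplifying assumption, the two FC layers are collapsed to identity weights; the real module uses learnable layers with a channel reduction of reduction between them.

```python
import math

# Conceptual squeeze-and-excitation on a single C x H x W feature map.
# The "MLP" between squeeze and gate is collapsed to an identity here
# (an assumption, purely for illustration).

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def se_recalibrate(x):
    # Squeeze: global average pool each channel to one scalar.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in x]
    # Excite: produce a per-channel gate in (0, 1).
    gates = [sigmoid(s) for s in squeezed]
    # Scale: reweight every spatial position channel-wise.
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(x, gates)]

x = [[[1.0, 1.0], [1.0, 1.0]],     # channel 0: mean activation 1.0
     [[0.0, 0.0], [0.0, 0.0]]]     # channel 1: mean activation 0.0
y = se_recalibrate(x)
```

Channels with stronger average responses receive gates closer to 1 and are preserved, while weaker channels are attenuated, which is the dynamic channel-wise recalibration described above.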