flowvision.transforms

Utils for Image Transforms

class flowvision.transforms.CenterCrop(size)[source]

Crops the given image at the center. If the image is oneflow Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions. If image size is smaller than output size along any edge, image is padded with 0 and then center cropped.

Parameters

size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be cropped.

Returns

Cropped image.

Return type

PIL Image or Tensor
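
Example

A minimal usage sketch, assuming the flowvision API otherwise mirrors torchvision; the sizes are arbitrary illustration values:

>>> from PIL import Image
>>> from flowvision import transforms
>>> img = Image.new("RGB", (256, 320))     # width x height
>>> out = transforms.CenterCrop(224)(img)  # crops the central 224 x 224 region
>>> out.size                               # PIL reports (width, height): (224, 224)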

class flowvision.transforms.Compose(transforms)[source]

Composes several transforms together. Please see the note below.

Parameters

transforms (list of Transform objects) – list of transforms to compose.

Example

>>> transforms.Compose([
>>>     transforms.CenterCrop(10),
>>>     transforms.ToTensor(),
>>> ])

Note

In order to script the transformations, please use flow.nn.Sequential as below:

>>> transforms = flow.nn.Sequential(
>>>     transforms.CenterCrop(10),
>>>     transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
>>> )

Make sure to use only scriptable transformations, i.e. transformations that work with flow.Tensor and do not require lambda functions or PIL.Image.

class flowvision.transforms.ConvertImageDtype(dtype: oneflow._oneflow_internal.dtype)[source]

Convert a tensor image to the given dtype and scale the values accordingly. This function does not support PIL Image.

Parameters

dtype (flow.dtype) – Desired data type of the output

Note

When converting from a smaller to a larger integer dtype the maximum values are not mapped exactly. If converted back and forth, this mismatch has no effect.

Raises

RuntimeError – When trying to cast flow.float32 to flow.int32 or flow.int64 as well as for trying to cast flow.float64 to flow.int64. These conversions might lead to overflow errors since the floating point dtype cannot store consecutive integers over the whole range of the integer dtype.
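
Example

A short sketch of the scaling behavior, assuming the same integer-to-float convention as torchvision (integer values are divided by the dtype's maximum):

>>> import oneflow as flow
>>> from flowvision import transforms
>>> img = flow.tensor([[[0, 128, 255]]], dtype=flow.uint8)  # 1 x 1 x 3 tensor image
>>> out = transforms.ConvertImageDtype(flow.float32)(img)   # 255 -> 1.0, 128 -> ~0.502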

class flowvision.transforms.FiveCrop(size)[source]

Crop the given image into four corners and the central crop. If the image is flow Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

Note

This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.

Parameters

size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop of size (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

Example

>>> transform = Compose([
>>>    FiveCrop(size), # this is a list of PIL Images
>>>    Lambda(lambda crops: flow.stack([ToTensor()(crop) for crop in crops])) # returns a 4D tensor
>>> ])
>>> #In your test loop you can do the following:
>>> input, target = batch # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w)) # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1) # avg over crops
forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be cropped.

Returns

tuple of 5 images. Image can be PIL Image or Tensor

class flowvision.transforms.GaussianBlur(kernel_size, sigma=(0.1, 2.0))[source]

Blurs image with randomly chosen Gaussian blur. If the image is oneflow Tensor, it is expected to have […, C, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters
  • kernel_size (int or sequence) – Size of the Gaussian kernel.

  • sigma (float or tuple of float (min, max)) – Standard deviation to be used for creating kernel to perform blurring. If float, sigma is fixed. If it is tuple of float (min, max), sigma is chosen uniformly at random to lie in the given range.

Returns

Gaussian blurred version of the input image.

Return type

PIL Image or Tensor

forward(img: oneflow.Tensor) → oneflow.Tensor[source]
Parameters

img (PIL Image or Tensor) – image to be blurred.

Returns

Gaussian blurred image

Return type

PIL Image or Tensor

static get_params(sigma_min: float, sigma_max: float) → float[source]

Choose sigma for random gaussian blurring.

Parameters
  • sigma_min (float) – Minimum standard deviation that can be chosen for blurring kernel.

  • sigma_max (float) – Maximum standard deviation that can be chosen for blurring kernel.

Returns

Standard deviation to be passed to calculate kernel for gaussian blurring.

Return type

float
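
Example

A usage sketch on a random tensor image; the kernel size and sigma range are illustration values:

>>> import oneflow as flow
>>> from flowvision import transforms
>>> blur = transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0))
>>> img = flow.rand(3, 64, 64)  # C x H x W image with values in [0, 1]
>>> out = blur(img)             # same shape; sigma is re-sampled on every call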

class flowvision.transforms.Grayscale(num_output_channels=1)[source]

Convert image to grayscale. If the image is oneflow Tensor, it is expected to have […, 3, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters

num_output_channels (int) – (1 or 3) number of channels desired for output image

Returns

Grayscale version of the input.

  • If num_output_channels == 1: returned image is single channel

  • If num_output_channels == 3: returned image is 3 channel with r == g == b

Return type

PIL Image

forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be converted to grayscale.

Returns

Grayscaled image.

Return type

PIL Image or Tensor
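
Example

A sketch of the channel behavior, assuming the usual ITU-R 601 luma conversion used by PIL:

>>> from PIL import Image
>>> from flowvision import transforms
>>> rgb = Image.new("RGB", (32, 32), color=(255, 0, 0))
>>> gray = transforms.Grayscale(num_output_channels=3)(rgb)
>>> gray.getpixel((0, 0))  # three equal values: r == g == b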

class flowvision.transforms.InterpolationMode(value)[source]

Interpolation modes

class flowvision.transforms.Lambda(lambd)[source]

Apply a user-defined lambda as a transform.

Parameters

lambd (function) – Lambda/function to be used for transform.
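
Example

A minimal sketch; any callable that maps an image to an image works, but note that lambdas are not scriptable (see the note under Compose):

>>> from flowvision import transforms
>>> invert = transforms.Lambda(lambda t: 1.0 - t)  # hypothetical op on a [0, 1] float tensor
>>> # invert(img) now applies the function like any other transform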

class flowvision.transforms.Normalize(mean, std, inplace=False)[source]

Normalize a tensor image with mean and standard deviation. This transform does not support PIL Image. Given mean: (mean[1],...,mean[n]) and std: (std[1],...,std[n]) for n channels, this transform will normalize each channel of the input flow.*Tensor, i.e., output[channel] = (input[channel] - mean[channel]) / std[channel].

Note

This transform acts out of place, i.e., it does not mutate the input tensor.

Parameters
  • mean (sequence) – Sequence of means for each channel.

  • std (sequence) – Sequence of standard deviations for each channel.

  • inplace (bool,optional) – Bool to make this operation in-place.

forward(tensor: oneflow.Tensor) → oneflow.Tensor[source]
Parameters

tensor (Tensor) – Tensor image to be normalized.

Returns

Normalized Tensor image.

Return type

Tensor
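
Example

A worked sketch of the formula above on a constant image (illustration values only):

>>> import oneflow as flow
>>> from flowvision import transforms
>>> t = flow.full((3, 1, 1), 0.5)  # 3-channel image where every value is 0.5
>>> norm = transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.25, 0.25, 0.25))
>>> out = norm(t)                  # (0.5 - 0.5) / 0.25 == 0.0 in every channel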

class flowvision.transforms.PILToTensor[source]

Convert a PIL Image to a tensor of the same type.

Converts a PIL Image (H x W x C) to a Tensor of shape (C x H x W).
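
Example

A sketch contrasting this with ToTensor (which additionally rescales; see ToTensor below), assuming flowvision keeps the uint8 range here as torchvision does:

>>> from PIL import Image
>>> from flowvision import transforms
>>> img = Image.new("RGB", (4, 4), color=(255, 255, 255))
>>> t = transforms.PILToTensor()(img)  # shape (3, 4, 4), dtype uint8, values still 0-255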

class flowvision.transforms.Pad(padding, fill=0, padding_mode='constant')[source]

Pad the given image on all sides with the given “pad” value. If the image is oneflow Tensor, it is expected to have […, H, W] shape, where … means at most 2 leading dimensions for modes reflect and symmetric, at most 3 leading dimensions for mode edge, and an arbitrary number of leading dimensions for mode constant.

Parameters
  • padding (int or sequence) – Padding on each border. If a single int is provided this is used to pad all borders. If sequence of length 2 is provided this is the padding on left/right and top/bottom respectively. If a sequence of length 4 is provided this is the padding for the left, top, right and bottom borders respectively.

  • fill (number or str or tuple) – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant. Only number is supported for oneflow Tensor. Only int or str or tuple value is supported for PIL Image.

  • padding_mode (str) –

    Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.

    • constant: pads with a constant value, this value is specified with fill

    • edge: pads with the last value at the edge of the image. If input a 5D oneflow Tensor, the last 3 dimensions will be padded instead of the last 2

    • reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2]

    • symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]

forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be padded.

Returns

Padded image.

Return type

PIL Image or Tensor
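
Example

A sketch reproducing the reflect example from the mode list above on a 1 x 1 x 4 tensor image, assuming tensor inputs are accepted for reflect mode as in torchvision:

>>> import oneflow as flow
>>> from flowvision import transforms
>>> row = flow.tensor([[[1., 2., 3., 4.]]])  # 1 x 1 x 4 image
>>> out = transforms.Pad(padding=(2, 0), padding_mode="reflect")(row)
>>> # the single row becomes [3, 2, 1, 2, 3, 4, 3, 2]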

class flowvision.transforms.RandomApply(transforms, p=0.5)[source]

Apply randomly a list of transformations with a given probability.

Note

In order to script the transformation, please use flow.nn.ModuleList as input instead of list/tuple of transforms as shown below:

>>> transforms = transforms.RandomApply(flow.nn.ModuleList([
>>>     transforms.ColorJitter(),
>>> ]), p=0.3)

Make sure to use only scriptable transformations, i.e. transformations that work with flow.Tensor and do not require lambda functions or PIL.Image.

Parameters
  • transforms (sequence or Module) – list of transformations

  • p (float) – probability

class flowvision.transforms.RandomChoice(transforms)[source]

Apply a single transformation randomly picked from a list.

class flowvision.transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')[source]

Crop the given image at a random location. If the image is oneflow Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions, but if non-constant padding is used, the input is expected to have at most 2 leading dimensions.

Parameters
  • size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

  • padding (int or sequence, optional) – Optional padding on each border of the image. Default is None. If a single int is provided this is used to pad all borders. If sequence of length 2 is provided this is the padding on left/right and top/bottom respectively. If a sequence of length 4 is provided this is the padding for the left, top, right and bottom borders respectively.

  • pad_if_needed (boolean) – It will pad the image if smaller than the desired size to avoid raising an exception. Since cropping is done after padding, the padding seems to be done at a random offset.

  • fill (number or str or tuple) – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant. Only number is supported for flow Tensor. Only int or str or tuple value is supported for PIL Image.

  • padding_mode (str) –

    Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.

    • constant: pads with a constant value, this value is specified with fill

    • edge: pads with the last value at the edge of the image. If input a 5D flow Tensor, the last 3 dimensions will be padded instead of the last 2

    • reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2]

    • symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]

forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be cropped.

Returns

Cropped image.

Return type

PIL Image or Tensor

static get_params(img: oneflow.Tensor, output_size: Tuple[int, int]) → Tuple[int, int, int, int][source]

Get parameters for crop for a random crop.

Parameters
  • img (PIL Image or Tensor) – Image to be cropped.

  • output_size (tuple) – Expected output size of the crop.

Returns

params (i, j, h, w) to be passed to crop for random crop.

Return type

tuple
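
Example

A usage sketch of the common pad-then-crop augmentation for small inputs; the sizes are illustration values:

>>> import oneflow as flow
>>> from flowvision import transforms
>>> cropper = transforms.RandomCrop(32, padding=4, padding_mode="reflect")
>>> img = flow.rand(3, 32, 32)
>>> out = cropper(img)  # padded to 40 x 40, then a random 32 x 32 window is taken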

class flowvision.transforms.RandomGrayscale(p=0.1)[source]

Randomly convert image to grayscale with a probability of p (default 0.1). If the image is flow Tensor, it is expected to have […, 3, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters

p (float) – probability that image should be converted to grayscale.

Returns

Grayscale version of the input image with probability p and unchanged with probability (1-p).

  • If input image is 1 channel: grayscale version is 1 channel

  • If input image is 3 channel: grayscale version is 3 channel with r == g == b

Return type

PIL Image or Tensor

forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be converted to grayscale.

Returns

Randomly grayscaled image.

Return type

PIL Image or Tensor

class flowvision.transforms.RandomHorizontalFlip(p=0.5)[source]

Horizontally flip the given image randomly with a given probability. If the image is flow Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters

p (float) – probability of the image being flipped. Default value is 0.5

forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be flipped.

Returns

Randomly flipped image.

Return type

PIL Image or Tensor

class flowvision.transforms.RandomOrder(transforms)[source]

Apply a list of transformations in a random order.

class flowvision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=<InterpolationMode.BILINEAR: 'bilinear'>)[source]

Crop a random portion of image and resize it to a given size.

If the image is flow Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

A crop of the original image is made: the crop has a random area (H * W) and a random aspect ratio. This crop is finally resized to the given size. This is popularly used to train the Inception networks.

Parameters
  • size (int or sequence) – expected output size of the crop, for each edge. If size is an int instead of sequence like (h, w), a square output size (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

  • scale (tuple of float) – Specifies the lower and upper bounds for the random area of the crop, before resizing. The scale is defined with respect to the area of the original image.

  • ratio (tuple of float) – lower and upper bounds for the random aspect ratio of the crop, before resizing.

  • interpolation (InterpolationMode) – Desired interpolation enum defined by flowvision.transforms.InterpolationMode. Default is InterpolationMode.BILINEAR. If input is Tensor, only InterpolationMode.NEAREST, InterpolationMode.BILINEAR and InterpolationMode.BICUBIC are supported. For backward compatibility integer values (e.g. PIL.Image.NEAREST) are still acceptable.

forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be cropped and resized.

Returns

Randomly cropped and resized image.

Return type

PIL Image or Tensor

static get_params(img: oneflow.Tensor, scale: List[float], ratio: List[float]) → Tuple[int, int, int, int][source]

Get parameters for crop for a random sized crop.

Parameters
  • img (PIL Image or Tensor) – Input image.

  • scale (list) – lower and upper bounds of the crop area, relative to the area of the original image

  • ratio (list) – lower and upper bounds for the aspect ratio of the crop

Returns

params (i, j, h, w) to be passed to crop for a random sized crop.

Return type

tuple
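
Example

A usage sketch; the output spatial size is fixed regardless of the sampled crop (224 with the default scale/ratio bounds is the common ImageNet-style setting):

>>> from PIL import Image
>>> from flowvision import transforms
>>> rrc = transforms.RandomResizedCrop(224)
>>> img = Image.new("RGB", (500, 300))
>>> out = rrc(img)  # always 224 x 224, whatever region was sampled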

class flowvision.transforms.RandomSizedCrop(*args, **kwargs)[source]

Note: This transform is deprecated in favor of RandomResizedCrop.

class flowvision.transforms.RandomTransforms(transforms)[source]

Base class for a list of transformations with randomness.

Parameters

transforms (sequence) – list of transformations

class flowvision.transforms.RandomVerticalFlip(p=0.5)[source]

Vertically flip the given image randomly with a given probability. If the image is flow Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters

p (float) – probability of the image being flipped. Default value is 0.5

forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be flipped.

Returns

Randomly flipped image.

Return type

PIL Image or Tensor

class flowvision.transforms.Resize(size, interpolation=<InterpolationMode.BILINEAR: 'bilinear'>)[source]

Resize the input image to the given size. If the image is oneflow Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters
  • size (sequence or int) – Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, smaller edge of the image will be matched to this number. i.e, if height > width, then image will be rescaled to (size * height / width, size).

  • interpolation (InterpolationMode) – Desired interpolation enum defined by flowvision.transforms.InterpolationMode. Default is InterpolationMode.BILINEAR. If input is Tensor, only InterpolationMode.NEAREST, InterpolationMode.BILINEAR and InterpolationMode.BICUBIC are supported. For backward compatibility integer values (e.g. PIL.Image.NEAREST) are still acceptable.

forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be scaled.

Returns

Rescaled image.

Return type

PIL Image or Tensor
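
Example

A sketch of the int-versus-sequence behavior described above (illustration sizes):

>>> from PIL import Image
>>> from flowvision import transforms
>>> img = Image.new("RGB", (400, 200))       # width x height
>>> transforms.Resize(100)(img).size         # shorter edge -> 100, aspect kept: (200, 100)
>>> transforms.Resize((100, 100))(img).size  # exact (h, w): (100, 100)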

class flowvision.transforms.Scale(*args, **kwargs)[source]

Note: This transform is deprecated in favor of Resize.

class flowvision.transforms.Solarization(p=0.1)[source]

Apply Solarization to the input PIL Image.

Parameters

p (float) – probability that solarization is applied to the image.

forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be solarized.

Returns

Solarized image.

Return type

PIL Image or Tensor
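
Example

A usage sketch with p=1.0 to force the operation; the inversion threshold is internal to the transform (commonly 128, as in PIL's ImageOps.solarize, though that is an assumption here):

>>> from PIL import Image
>>> from flowvision import transforms
>>> solarize = transforms.Solarization(p=1.0)
>>> img = Image.new("RGB", (32, 32), color=(200, 100, 50))
>>> out = solarize(img)  # values above the threshold are inverted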

class flowvision.transforms.TenCrop(size, vertical_flip=False)[source]

Crop the given image into four corners and the central crop plus the flipped version of these (horizontal flipping is used by default). If the image is flow Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

Note

This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.

Parameters
  • size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

  • vertical_flip (bool) – Use vertical flipping instead of horizontal

Example

>>> transform = Compose([
>>>    TenCrop(size), # this is a list of PIL Images
>>>    Lambda(lambda crops: flow.stack([ToTensor()(crop) for crop in crops])) # returns a 4D tensor
>>> ])
>>> #In your test loop you can do the following:
>>> input, target = batch # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w)) # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1) # avg over crops
forward(img)[source]
Parameters

img (PIL Image or Tensor) – Image to be cropped.

Returns

tuple of 10 images. Image can be PIL Image or Tensor

class flowvision.transforms.ToPILImage(mode=None)[source]

Convert a tensor or an ndarray to PIL Image.

Converts a flow.Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image while preserving the value range.

Parameters

mode (PIL.Image mode) – color space and pixel depth of input data (optional). If mode is None (default) there are some assumptions made about the input data:

  • If the input has 4 channels, the mode is assumed to be RGBA.

  • If the input has 3 channels, the mode is assumed to be RGB.

  • If the input has 2 channels, the mode is assumed to be LA.

  • If the input has 1 channel, the mode is determined by the data type (i.e. int, float, short).
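
Example

A round-trip sketch; the mode is inferred from the channel count per the list above:

>>> import oneflow as flow
>>> from flowvision import transforms
>>> t = flow.rand(3, 64, 64)          # C x H x W float tensor in [0, 1]
>>> pil = transforms.ToPILImage()(t)  # 3 channels -> mode assumed RGB
>>> pil.size                          # PIL reports (width, height): (64, 64)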

class flowvision.transforms.ToTensor[source]

Convert a PIL Image or numpy.ndarray to tensor.

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a flow.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8. In the other cases, tensors are returned without scaling.

Note

Because the input image is scaled to [0.0, 1.0], this transformation should not be used when transforming target image masks.
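
Example

A sketch of the scaling behavior on a uint8 array (illustration values):

>>> import numpy as np
>>> from flowvision import transforms
>>> arr = np.full((2, 2, 3), 255, dtype=np.uint8)  # H x W x C in [0, 255]
>>> t = transforms.ToTensor()(arr)                 # (3, 2, 2) float tensor, all values 1.0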