flow_transforms.py

Operations to perform image augmentations for optical flow.

Some operations are adapted from the following sources:

FlowNetPytorch: https://github.com/ClementPinard/FlowNetPytorch
RAFT: https://github.com/princeton-vl/RAFT/
flow-transforms-pytorch: https://github.com/hmorimitsu/flow-transforms-pytorch

class ptlflow.data.flow_transforms.ColorJitter(brightness: float | Tuple[float, float] = 0.0, contrast: float | Tuple[float, float] = 0.0, saturation: float | Tuple[float, float] = 0.0, hue: float | Tuple[float, float] = 0.0, asymmetric_prob: float = 0.0, use_keys: KeysView | Sequence[str] | None = ('images',), ignore_keys: KeysView | Sequence[str] | None = None)

Randomly apply color transformations only to the images.

If asymmetric_prob == 0, the same transform is applied to all the images. Otherwise, with probability asymmetric_prob, the transform parameters for each image are sampled independently.

This is essentially a wrapper around torchvision.transforms.ColorJitter.
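The symmetric/asymmetric behavior can be illustrated with a small stdlib-only sketch for the brightness factor. It is not ptlflow's code: the helper names are hypothetical, and it assumes that the float-or-range convention of torchvision.transforms.ColorJitter applies (a single float b means a factor drawn from [max(0, 1 - b), 1 + b]).

```python
import random
from typing import List, Sequence, Tuple, Union


def brightness_range(brightness: Union[float, Sequence[float]]) -> Tuple[float, float]:
    # torchvision convention: a single float b means the factor is drawn from
    # [max(0, 1 - b), 1 + b]; a (min, max) pair is used as given.
    if isinstance(brightness, (int, float)):
        return max(0.0, 1.0 - brightness), 1.0 + brightness
    return float(brightness[0]), float(brightness[1])


def sample_brightness_factors(
    num_images: int,
    brightness: Union[float, Sequence[float]],
    asymmetric_prob: float,
    rng: random.Random,
) -> List[float]:
    # Hypothetical helper, not part of ptlflow.
    lo, hi = brightness_range(brightness)
    if rng.random() < asymmetric_prob:
        # Asymmetric case: every image gets its own independently sampled factor.
        return [rng.uniform(lo, hi) for _ in range(num_images)]
    # Symmetric case: one factor is shared by all images in the sequence.
    factor = rng.uniform(lo, hi)
    return [factor] * num_images


rng = random.Random(0)
print(sample_brightness_factors(2, 0.4, 0.0, rng))  # the same factor twice
print(sample_brightness_factors(2, 0.4, 1.0, rng))  # two independent factors
```

With asymmetric_prob = 0 the comparison rng.random() < 0 is never true, so all images always share one factor.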

Initialize ColorJitter.

Parameters:

brightness (Union[float, Tuple[float, float]], default 0.0): The range from which the random brightness value is sampled.
contrast (Union[float, Tuple[float, float]], default 0.0): The range from which the random contrast value is sampled.
saturation (Union[float, Tuple[float, float]], default 0.0): The range from which the random saturation value is sampled.
hue (Union[float, Tuple[float, float]], default 0.0): The range from which the random hue value is sampled.
asymmetric_prob (float, default 0.0): Chance to apply an asymmetric transform, in which the parameters for transforming each image are sampled independently.
use_keys (Optional[Union[KeysView, Sequence[str]]], default [‘images’]): If it is not None, then only elements with these keys are transformed. Otherwise, all elements are transformed, except the keys listed in ignore_keys.
ignore_keys (Optional[Union[KeysView, Sequence[str]]], optional): If use_keys is None, then these keys are NOT transformed by this operation.
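The use_keys/ignore_keys rule, which recurs in several transforms below, can be summarized by a small sketch. The helper select_keys is hypothetical and only mirrors the behavior described in the parameter list; it is not ptlflow's internal function.

```python
from typing import Dict, Iterable, List, Optional


def select_keys(
    inputs: Dict[str, object],
    use_keys: Optional[Iterable[str]],
    ignore_keys: Optional[Iterable[str]],
) -> List[str]:
    # Hypothetical helper: returns the keys of inputs that a transform would touch.
    if use_keys is not None:
        allowed = set(use_keys)
        # Only keys listed in use_keys (and present in inputs) are transformed.
        return [k for k in inputs if k in allowed]
    ignored = set(ignore_keys) if ignore_keys is not None else set()
    # Otherwise, every key is transformed except those listed in ignore_keys.
    return [k for k in inputs if k not in ignored]


batch = {"images": None, "flows": None, "valids": None}
print(select_keys(batch, ("images",), None))  # only the images
print(select_keys(batch, None, ["flows"]))    # everything except the flows
```

Note that ignore_keys has no effect whenever use_keys is set.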

class ptlflow.data.flow_transforms.Compose(transforms_list: Sequence[object])

Similar to torchvision.transforms.Compose. Applies the transforms in the input list in sequence.

__init__(transforms_list: Sequence[object]) → None

Initialize Compose.

Parameters:

transforms_list (Sequence[object]): A sequence of transforms to be applied.
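The chaining behavior can be sketched as follows, assuming (as the other transforms here suggest) that each transform takes and returns a dict of inputs. ComposeSketch is a hypothetical stand-in, not the library class.

```python
from typing import Any, Callable, Dict, Sequence

Inputs = Dict[str, Any]


class ComposeSketch:
    """Hypothetical stand-in for Compose: applies each transform in order."""

    def __init__(self, transforms_list: Sequence[Callable[[Inputs], Inputs]]) -> None:
        self.transforms_list = list(transforms_list)

    def __call__(self, inputs: Inputs) -> Inputs:
        # The output of each transform becomes the input of the next one.
        for transform in self.transforms_list:
            inputs = transform(inputs)
        return inputs


def add_one(d: Inputs) -> Inputs:
    return {**d, "x": d["x"] + 1}


def double(d: Inputs) -> Inputs:
    return {**d, "x": d["x"] * 2}


print(ComposeSketch([add_one, double])({"x": 1}))  # (1 + 1) * 2
```

Order matters: swapping the two transforms gives 1 * 2 + 1 instead.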

class ptlflow.data.flow_transforms.GaussianNoise(stdev: float = 0.0, use_keys: KeysView | Sequence[str] | None = ('images',), ignore_keys: KeysView | Sequence[str] | None = None)

Adds random Gaussian noise to the images.

__init__(stdev: float = 0.0, use_keys: KeysView | Sequence[str] | None = ('images',), ignore_keys: KeysView | Sequence[str] | None = None) → None

Initialize GaussianNoise.

Parameters:

stdev (float, default 0.0): The maximum standard deviation of the Gaussian noise.
use_keys (Optional[Union[KeysView, Sequence[str]]], default [‘images’]): If it is not None, then only elements with these keys are transformed. Otherwise, all elements are transformed, except the keys listed in ignore_keys.
ignore_keys (Optional[Union[KeysView, Sequence[str]]], optional): If use_keys is None, then these keys are NOT transformed by this operation.
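Because stdev is described as a maximum, one plausible reading is that the actual noise level is itself random. The sketch below assumes the standard deviation is drawn uniformly from [0, stdev] on each call; that sampling rule and the helper name add_gaussian_noise are assumptions, not taken from ptlflow.

```python
import random
from typing import List


def add_gaussian_noise(pixels: List[float], stdev: float, rng: random.Random) -> List[float]:
    # Assumption: the noise level for this call is drawn uniformly from [0, stdev],
    # since the docs describe stdev as the maximum standard deviation.
    sigma = rng.uniform(0.0, stdev)
    # Zero-mean Gaussian noise is added independently to every pixel value.
    return [p + rng.gauss(0.0, sigma) for p in pixels]


rng = random.Random(0)
print(add_gaussian_noise([0.1, 0.5, 0.9], 0.05, rng))
```

With the default stdev = 0.0 the sketch returns the pixels unchanged, matching the idea that the transform is disabled by default.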

class ptlflow.data.flow_transforms.RandomFlip(hflip_prob: float = 0.0, vflip_prob: float = 0.0, asymmetric_prob: float = 0.0, use_keys: KeysView | Sequence[str] | None = None, ignore_keys: KeysView | Sequence[str] | None = None, image_keys: KeysView | Sequence[str] = ('images',), flow_keys: KeysView | Sequence[str] = ('flows', 'flows_b'))

Randomly flips the inputs horizontally and/or vertically.

If asymmetric_prob > 0, then each input of the sequence may be flipped differently.

__init__(hflip_prob: float = 0.0, vflip_prob: float = 0.0, asymmetric_prob: float = 0.0, use_keys: KeysView | Sequence[str] | None = None, ignore_keys: KeysView | Sequence[str] | None = None, image_keys: KeysView | Sequence[str] = ('images',), flow_keys: KeysView | Sequence[str] = ('flows', 'flows_b')) → None

Initialize RandomFlip.

Parameters:

hflip_prob (float, default 0.0): Probability of applying a horizontal flip.
vflip_prob (float, default 0.0): Probability of applying a vertical flip.
asymmetric_prob (float, default 0.0): Chance to apply an asymmetric transform, in which the parameters for transforming each image are sampled independently.
use_keys (Optional[Union[KeysView, Sequence[str]]], optional): If it is not None, then only elements with these keys are transformed. Otherwise, all elements are transformed, except the keys listed in ignore_keys.
ignore_keys (Optional[Union[KeysView, Sequence[str]]], optional): If use_keys is None, then these keys are NOT transformed by this operation.
image_keys (Union[KeysView, Sequence[str]], default [‘images’]): Indicates which of the input keys correspond to image tensors.
flow_keys (Union[KeysView, Sequence[str]], default [‘flows’, ‘flows_b’]): Indicates which of the input keys correspond to optical flow tensors.
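The reason flow_keys must be declared separately from image_keys is that flipping a flow field is not only a pixel rearrangement: the flow vectors change sign along the flipped axis. A minimal sketch for the symmetric case (every frame flipped the same way), using nested lists of (u, v) tuples with u as the horizontal component; the helper names are hypothetical.

```python
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # rows of (u, v) vectors; u is horizontal


def hflip_flow(flow: Flow) -> Flow:
    # Mirror the columns, then negate u: motion to the right becomes motion to the left.
    return [[(-u, v) for (u, v) in reversed(row)] for row in flow]


def vflip_flow(flow: Flow) -> Flow:
    # Mirror the rows, then negate v: downward motion becomes upward motion.
    return [[(u, -v) for (u, v) in row] for row in reversed(flow)]


flow = [[(1.0, 2.0), (3.0, 4.0)]]
print(hflip_flow(flow))
```

Image tensors only need the mirroring step. The asymmetric case, where frames are flipped differently, changes the geometry between frames and is not covered by this sketch.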

class ptlflow.data.flow_transforms.RandomPatchEraser(erase_prob: float = 0.0, num_patches: int | Tuple[int, int] = 1, patch_size: Tuple[int, int] | Tuple[int, int, int, int] = (0, 0), noise_type: str = 'mean', use_keys: KeysView | Sequence[str] | None = ('images',), ignore_keys: KeysView | Sequence[str] | None = None)

Randomly covers rectangular patches of the second image with noise to simulate pseudo-occlusions.

The patches are filled according to noise_type, either with the image mean or with random noise. This transform erases patches ONLY FROM THE SECOND IMAGE.

__init__(erase_prob: float = 0.0, num_patches: int | Tuple[int, int] = 1, patch_size: Tuple[int, int] | Tuple[int, int, int, int] = (0, 0), noise_type: str = 'mean', use_keys: KeysView | Sequence[str] | None = ('images',), ignore_keys: KeysView | Sequence[str] | None = None) → None

Initialize RandomPatchEraser.

Parameters:

erase_prob (float, default 0.0): Probability of applying the transformation.
num_patches (Union[int, Tuple[int, int]], default 1): Number of occlusion patches to generate. If it is a tuple, the number is uniformly sampled from the interval.
patch_size (Union[Tuple[int, int], Tuple[int, int, int, int]], default (0, 0)): Range of the size of the occlusion patches. If it is a tuple with 2 elements, then both sides are sampled from the same interval. Otherwise, different intervals can be specified for each side as (hmin, hmax, wmin, wmax).
noise_type (str, default ‘mean’): How to fill the occlusion patch. It can be either the image ‘mean’ or random ‘noise’.
use_keys (Optional[Union[KeysView, Sequence[str]]], default [‘images’]): If it is not None, then only elements with these keys are transformed. Otherwise, all elements are transformed, except the keys listed in ignore_keys.
ignore_keys (Optional[Union[KeysView, Sequence[str]]], optional): If use_keys is None, then these keys are NOT transformed by this operation.
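A single-patch, noise_type='mean' version of this behavior can be sketched on single-channel nested-list images. The helpers are hypothetical; in particular, using the global mean of the second image as the fill value and sampling the patch position uniformly inside the image are assumptions about details the docs leave open.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]  # single-channel H x W image, kept simple on purpose


def sample_patch(img_h: int, img_w: int, patch_size: Sequence[int], rng: random.Random) -> Tuple[int, int, int, int]:
    # A 2-element patch_size shares one (min, max) range for both sides;
    # a 4-element one is read as (hmin, hmax, wmin, wmax).
    if len(patch_size) == 2:
        hmin, hmax = patch_size
        wmin, wmax = patch_size
    else:
        hmin, hmax, wmin, wmax = patch_size
    ph, pw = rng.randint(hmin, hmax), rng.randint(wmin, wmax)
    # Assumption: the patch is placed uniformly at random, fully inside the image.
    top = rng.randint(0, max(0, img_h - ph))
    left = rng.randint(0, max(0, img_w - pw))
    return top, left, ph, pw


def erase_with_mean(images: List[Image], top: int, left: int, ph: int, pw: int) -> List[Image]:
    # Only the second image is modified; the others are passed through unchanged.
    second = [row[:] for row in images[1]]
    mean = sum(map(sum, second)) / (len(second) * len(second[0]))
    for y in range(top, min(top + ph, len(second))):
        for x in range(left, min(left + pw, len(second[0]))):
            second[y][x] = mean
    return [images[0], second] + images[2:]
```

Leaving the first image intact is what makes the erased region look like an occlusion: the content is visible in frame 1 but has no match in frame 2.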

class ptlflow.data.flow_transforms.RandomRotate(angle: float = 0.0, diff_angle: float = 0.0, flow_keys: KeysView | Sequence[str] = ('flows', 'flows_b'), occlusion_keys: KeysView | Sequence[str] = ('occs', 'occs_b'), valid_keys: KeysView | Sequence[str] = ('valids', 'valids_b'), binary_keys: KeysView | Sequence[str] = ('mbs', 'occs', 'valids', 'mbs_b', 'occs_b', 'valids_b'), sparse: bool = False)

Applies random rotation to the inputs.

The inputs are rotated around the center of the image. First, all inputs are rotated by the same random major angle. Then, another random angle a is sampled according to diff_angle. The first image is additionally rotated by a, the second image by the reversed angle -a, the third by a again, and so on.

__init__(angle: float = 0.0, diff_angle: float = 0.0, flow_keys: KeysView | Sequence[str] = ('flows', 'flows_b'), occlusion_keys: KeysView | Sequence[str] = ('occs', 'occs_b'), valid_keys: KeysView | Sequence[str] = ('valids', 'valids_b'), binary_keys: KeysView | Sequence[str] = ('mbs', 'occs', 'valids', 'mbs_b', 'occs_b', 'valids_b'), sparse: bool = False) → None

Initialize RandomRotate.

Parameters:

angle (float, default 0.0): The maximum absolute value from which the major angle is sampled.
diff_angle (float, default 0.0): The maximum absolute value from which the angle difference between consecutive images is sampled.
flow_keys (Union[KeysView, Sequence[str]], default [‘flows’, ‘flows_b’]): Indicates which of the input keys correspond to optical flow tensors.
occlusion_keys (Union[KeysView, Sequence[str]], default [‘occs’, ‘occs_b’]): Indicates which of the input keys correspond to occlusion mask tensors.
valid_keys (Union[KeysView, Sequence[str]], default [‘valids’, ‘valids_b’]): Indicates which of the input keys correspond to valid mask tensors.
binary_keys (Union[KeysView, Sequence[str]], default [‘mbs’, ‘occs’, ‘valids’, ‘mbs_b’, ‘occs_b’, ‘valids_b’]): Indicates which of the input keys correspond to binary tensors.
sparse (bool, default False): If True, all binary inputs and flows are rotated using nearest-neighbor grid_sample instead of bilinear interpolation.
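The per-image angle schedule described above can be sketched as follows. The helper is hypothetical and covers only the angle sampling, not the rotation of the tensors or flow vectors. It assumes both angles are drawn uniformly from symmetric intervals [-angle, angle] and [-diff_angle, diff_angle].

```python
import random
from typing import List


def sample_rotation_angles(num_images: int, angle: float, diff_angle: float, rng: random.Random) -> List[float]:
    # Assumption: uniform sampling from symmetric intervals.
    major = rng.uniform(-angle, angle)  # shared by every input
    a = rng.uniform(-diff_angle, diff_angle)  # alternating per-image offset
    # Even-indexed images get major + a, odd-indexed images get major - a.
    return [major + (a if i % 2 == 0 else -a) for i in range(num_images)]


print(sample_rotation_angles(4, 10.0, 2.0, random.Random(0)))
```

Consecutive images therefore differ by 2a, while their average rotation stays at the major angle.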

class ptlflow.data.flow_transforms.RandomScaleAndCrop(crop_size: Tuple[int, int] | None = None, major_scale: Tuple[float, float] = (0.0, 0.0), space_scale: Tuple[float, float] | Tuple[float, float, float, float] = (0.0, 0.0), time_scale: Tuple[float, float] | Tuple[float, float, float, float] = (0.0, 0.0), binary_keys: KeysView | Sequence[str] = ('mbs', 'occs', 'valids', 'mbs_b', 'occs_b', 'valids_b'), flow_keys: KeysView | Sequence[str] = ('flows', 'flows_b'), occlusion_keys: KeysView | Sequence[str] = ('occs', 'occs_b'), sparse: bool = False, valid_key: str = 'valids')

Applies a random scale followed by a random crop to the inputs.

The scale is adjusted so that the scaled inputs are not smaller than the crop size.

The scale calculation is composed of 2 main stages:

1. A random major scale is sampled. The major scale defines the global scale applied to all images and dimensions. It is calculated as:

   ms = 2 ** random.uniform(major_scale[0], major_scale[1])

2. A random minor space scale is sampled. The space scale dictates the variation in scale applied to the height and width of each image. It is calculated as:

   ssh = 2 ** random.uniform(space_scale[0], space_scale[1])
   ssw = 2 ** random.uniform(space_scale[2], space_scale[3])

   If len(space_scale) == 2, then ssw also uses space_scale[0] and space_scale[1].

The final scale applied to all inputs is:

scale_height = ms * ssh
scale_width = ms * ssw

If time_scale is provided, then a third scale is sampled before computing the final scale. The time scale is sampled independently for each element of a sequence. This allows, for example, the first image to have a different scale than the second one. The time scales tsh and tsw are calculated in the same way as the space scales ssh and ssw. With time scales, the final scales at time t are:

scale_height_time_t = ms * ssh * tsh_t
scale_width_time_t = ms * ssw * tsw_t
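The two sampling stages can be sketched directly from the formulas above, leaving out the time scale, which is marked as not implemented. The helper is hypothetical, and two details are assumptions rather than documented behavior. First, with a 2-element space_scale, ssw is drawn independently from the same range. Second, the "not smaller than the crop size" rule is modeled by raising each scale to at least crop / size.

```python
import random
from typing import Optional, Sequence, Tuple


def sample_scales(
    major_scale: Sequence[float],
    space_scale: Sequence[float],
    rng: random.Random,
    image_size: Optional[Tuple[int, int]] = None,
    crop_size: Optional[Tuple[int, int]] = None,
) -> Tuple[float, float]:
    # Stage 1: global major scale shared by both dimensions.
    ms = 2 ** rng.uniform(major_scale[0], major_scale[1])
    # Stage 2: minor space scales for height and width.
    ssh = 2 ** rng.uniform(space_scale[0], space_scale[1])
    if len(space_scale) == 2:
        # Assumption: width reuses the same range but is sampled independently.
        ssw = 2 ** rng.uniform(space_scale[0], space_scale[1])
    else:
        ssw = 2 ** rng.uniform(space_scale[2], space_scale[3])
    scale_height, scale_width = ms * ssh, ms * ssw
    if image_size is not None and crop_size is not None:
        # Assumption: keep the scaled input at least as large as the crop
        # by raising each scale to crop / size where needed.
        scale_height = max(scale_height, crop_size[0] / image_size[0])
        scale_width = max(scale_width, crop_size[1] / image_size[1])
    return scale_height, scale_width


print(sample_scales((-0.2, 0.6), (-0.2, 0.2), random.Random(0)))
```

Sampling the exponent rather than the scale itself makes shrinking by half and enlarging by two equally likely under a symmetric range.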

__init__(crop_size: Tuple[int, int] | None = None, major_scale: Tuple[float, float] = (0.0, 0.0), space_scale: Tuple[float, float] | Tuple[float, float, float, float] = (0.0, 0.0), time_scale: Tuple[float, float] | Tuple[float, float, float, float] = (0.0, 0.0), binary_keys: KeysView | Sequence[str] = ('mbs', 'occs', 'valids', 'mbs_b', 'occs_b', 'valids_b'), flow_keys: KeysView | Sequence[str] = ('flows', 'flows_b'), occlusion_keys: KeysView | Sequence[str] = ('occs', 'occs_b'), sparse: bool = False, valid_key: str = 'valids') → None

Initialize RandomScaleAndCrop.

Parameters:

crop_size (Optional[Tuple[int, int]], optional): If provided, crop the inputs to this size (h, w).
major_scale (Tuple[float, float], default (0.0, 0.0)): The range of the major scale. See the class description for more details.
space_scale (Union[Tuple[float, float], Tuple[float, float, float, float]], default (0.0, 0.0)): The range of the minor space scale. See the class description for more details.
time_scale (Union[Tuple[float, float], Tuple[float, float, float, float]], default (0.0, 0.0)): NOTE: Currently not implemented. The range of the time scale. See the class description for more details.
binary_keys (Union[KeysView, Sequence[str]], default [‘mbs’, ‘occs’, ‘valids’, ‘mbs_b’, ‘occs_b’, ‘valids_b’]): Indicates which of the input keys correspond to binary tensors.
flow_keys (Union[KeysView, Sequence[str]], default [‘flows’, ‘flows_b’]): Indicates which of the input keys correspond to optical flow tensors.
occlusion_keys (Union[KeysView, Sequence[str]], default [‘occs’, ‘occs_b’]): Indicates which of the input keys correspond to occlusion mask tensors.
sparse (bool, default False): If True, only values at valid positions (indicated by the mask in inputs[valid_key]) are kept when resizing binary and flow inputs. Requires valid_key to exist as a key in inputs.
valid_key (str, default ‘valids’): The name of the key in inputs that contains the binary mask indicating which pixels are valid. Only used when sparse=True.
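Sparse resizing is needed because interpolating a sparse ground-truth flow map would blend valid and invalid pixels. The sketch below follows the general idea of the sparse flow resizing in RAFT's augmentor, which this module cites as a source. It is not ptlflow's implementation: valid pixels are moved to their rounded scaled coordinates, and their flow values are scaled by the same factors. The helper name and the dict-based representation are hypothetical.

```python
from typing import Dict, Tuple

# Maps (y, x) of a valid pixel to its (u, v) flow; invalid pixels are simply absent.
SparseFlow = Dict[Tuple[int, int], Tuple[float, float]]


def resize_sparse_flow(flow: SparseFlow, new_h: int, new_w: int, scale_y: float, scale_x: float) -> SparseFlow:
    out: SparseFlow = {}
    for (y, x), (u, v) in flow.items():
        ny, nx = int(round(y * scale_y)), int(round(x * scale_x))
        if 0 <= ny < new_h and 0 <= nx < new_w:
            # Displacements are measured in pixels, so they scale with the image.
            # If two points land on the same pixel, the later one overwrites the earlier.
            out[(ny, nx)] = (u * scale_x, v * scale_y)
    return out


print(resize_sparse_flow({(0, 0): (1.0, 2.0), (1, 3): (4.0, -2.0)}, 4, 8, 2.0, 2.0))
```

Pixels that are not valid in the input never become valid in the output. Pixels that fall outside the new bounds are dropped.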

class ptlflow.data.flow_transforms.RandomTranslate(translation: int | Tuple[int, int] = 0, flow_keys: KeysView | Sequence[str] = ('flows', 'flows_b'), occlusion_keys: KeysView | Sequence[str] = ('occs', 'occs_b'))

Creates a translation between images by applying a random alternating crop to the sequence of inputs.

A translation value t is sampled first. Then, the first image is cropped by a box translated by t, the second image by a box translated by the reversed translation -t, the third by t again, and so on.

__init__(translation: int | Tuple[int, int] = 0, flow_keys: KeysView | Sequence[str] = ('flows', 'flows_b'), occlusion_keys: KeysView | Sequence[str] = ('occs', 'occs_b')) → None

Initialize RandomTranslate.

Parameters:

translation (Union[int, Tuple[int, int]], default 0): Maximum translation (in pixels) applied to the inputs. If it is a tuple, it gives the maximum along the (y, x) axes.
flow_keys (Union[KeysView, Sequence[str]], default [‘flows’, ‘flows_b’]): Indicates which of the input keys correspond to optical flow tensors.
occlusion_keys (Union[KeysView, Sequence[str]], default [‘occs’, ‘occs_b’]): Indicates which of the input keys correspond to occlusion mask tensors.
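The alternating crop can be sketched as box arithmetic. This is one possible scheme, not necessarily the one ptlflow uses. The crop is shrunk by 2|t| so that both the +t and -t boxes fit inside the image. Under that assumption, the crop origins of an even frame and the next odd frame differ by 2t, so the (u, v) flow between them gains (2 * tx, 2 * ty). The helper name is hypothetical.

```python
import random
from typing import List, Tuple, Union

Box = Tuple[int, int, int, int]  # (top, left, height, width)


def translation_crop_boxes(
    height: int,
    width: int,
    num_images: int,
    translation: Union[int, Tuple[int, int]],
    rng: random.Random,
) -> Tuple[List[Box], Tuple[int, int]]:
    max_ty, max_tx = (translation, translation) if isinstance(translation, int) else translation
    ty, tx = rng.randint(-max_ty, max_ty), rng.randint(-max_tx, max_tx)
    # Assumption: shrink the crop so that both the +t and -t boxes stay inside the image.
    crop_h, crop_w = height - 2 * abs(ty), width - 2 * abs(tx)
    boxes = []
    for i in range(num_images):
        sign = 1 if i % 2 == 0 else -1  # even frames use +t, odd frames use -t
        boxes.append((abs(ty) + sign * ty, abs(tx) + sign * tx, crop_h, crop_w))
    # Origins differ by 2t between an even frame and the next odd frame,
    # so the (u, v) flow between them is offset by (2 * tx, 2 * ty).
    return boxes, (2 * tx, 2 * ty)


print(translation_crop_boxes(10, 12, 3, (2, 3), random.Random(3)))
```

For the odd-to-even direction the offset has the opposite sign, which is why flow_keys must be known to the transform.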

class ptlflow.data.flow_transforms.Resize(size: Tuple[int, int] = (0, 0), scale: float = 1.0, binary_keys: KeysView | Sequence[str] = ('mbs', 'occs', 'valids', 'mbs_b', 'occs_b', 'valids_b'), flow_keys: KeysView | Sequence[str] = ('flows', 'flows_b'), sparse: bool = False, valid_key: str = 'valids')

Resizes the inputs to a given size or scale.

size is checked first; if any of its values is zero, scale is used instead.

__init__(size: Tuple[int, int] = (0, 0), scale: float = 1.0, binary_keys: KeysView | Sequence[str] = ('mbs', 'occs', 'valids', 'mbs_b', 'occs_b', 'valids_b'), flow_keys: KeysView | Sequence[str] = ('flows', 'flows_b'), sparse: bool = False, valid_key: str = 'valids') → None

Initialize Resize.

Parameters:

size (Tuple[int, int], default (0, 0)): The target size to which the inputs are resized. If it contains zeros, then scale is used instead.
scale (float, default 1.0): The scale factor used to resize the inputs. Only used if size contains zeros.
binary_keys (Union[KeysView, Sequence[str]], default [‘mbs’, ‘occs’, ‘valids’, ‘mbs_b’, ‘occs_b’, ‘valids_b’]): Indicates which of the input keys correspond to binary tensors.
flow_keys (Union[KeysView, Sequence[str]], default [‘flows’, ‘flows_b’]): Indicates which of the input keys correspond to optical flow tensors.
sparse (bool, default False): If True, only values at valid positions (indicated by the mask in inputs[valid_key]) are kept when resizing binary and flow inputs. Requires valid_key to exist as a key in inputs.
valid_key (str, default ‘valids’): The name of the key in inputs that contains the binary mask indicating which pixels are valid. Only used when sparse=True.
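The size-versus-scale rule, together with the reason flow_keys are listed separately, can be sketched as follows. The helper is hypothetical. Rounding the scaled size with round() is an assumption. Because flow vectors are measured in pixels, u must be multiplied by the width ratio and v by the height ratio after resizing.

```python
from typing import Tuple


def resize_target(
    in_h: int, in_w: int, size: Tuple[int, int] = (0, 0), scale: float = 1.0
) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    if size[0] != 0 and size[1] != 0:
        # size is checked first and used only when none of its values is zero.
        out_h, out_w = size
    else:
        # Assumption: the scaled size is rounded to the nearest integer.
        out_h, out_w = int(round(in_h * scale)), int(round(in_w * scale))
    # Factors for the (u, v) flow components: u follows width, v follows height.
    return (out_h, out_w), (out_w / in_w, out_h / in_h)


print(resize_target(100, 200, size=(50, 50)))
print(resize_target(100, 200, scale=0.5))
```

Note that an explicit size can change the aspect ratio, in which case u and v are rescaled by different factors.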

class ptlflow.data.flow_transforms.ToTensor

Converts a 4D numpy.ndarray or a list of 3D numpy.ndarrays into a 4D torch.Tensor.

If an input is of type uint8, then it is converted to float and its values are divided by 255.

Initialize ToTensor.

Parameters:

fp16 (bool, default False): If True, the tensors use half-precision floating point.
device (Union[str, torch.device], default ‘cpu’): Name of the torch device on which the tensors will be placed.
use_keys (Optional[Union[KeysView, Sequence[str]]], optional): If it is not None, then only elements with these keys are transformed. Otherwise, all elements are transformed, except the keys listed in ignore_keys.
ignore_keys (Optional[Union[KeysView, Sequence[str]]], optional): If use_keys is None, then these keys are NOT transformed by this operation.
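The uint8 rule can be illustrated without numpy or torch by using Python's array module, where typecode 'B' is an unsigned 8-bit integer. This sketch only mirrors the "uint8 is converted to float and divided by 255" behavior and the equal-shape requirement for stacking. The helper frames_to_float is hypothetical and omits fp16 and device handling.

```python
from array import array
from typing import List, Sequence


def frames_to_float(frames: Sequence[array]) -> List[List[float]]:
    if len({len(f) for f in frames}) > 1:
        # Stacking into one batch requires every frame to have the same shape.
        raise ValueError("all frames must have the same number of elements")
    out = []
    for frame in frames:
        if frame.typecode == "B":  # 'B' is an unsigned 8-bit integer, i.e. uint8
            out.append([v / 255.0 for v in frame])
        else:
            # Non-uint8 data is converted to float without rescaling.
            out.append([float(v) for v in frame])
    return out


print(frames_to_float([array("B", [0, 255]), array("B", [51, 102])]))
```

Inputs that are already floating point are not divided by 255, so float images are expected to be in [0, 1] already.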