datasets.py

Handle common datasets used in optical flow estimation.

class ptlflow.data.datasets.AutoFlowDataset(root_dir: str, split: str = 'train', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_meta: bool = True)

Handle the AutoFlow dataset.

__init__(root_dir: str, split: str = 'train', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_meta: bool = True) -> None

Initialize AutoFlowDataset.

Parameters:
root_dir : str

Path to the root directory of the AutoFlow dataset.

split : str, default ‘train’

Which split of the dataset should be loaded. It can be one of {‘train’, ‘val’, ‘trainval’}.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 10000.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_meta : bool, default True

Whether to get metadata.
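The max_flow rule described above (clip large values, and mark them as zero in the valid mask) can be illustrated with a small plain-Python sketch. This is not ptlflow's implementation: the helper name clip_flow_and_mask is hypothetical, flow vectors are plain (u, v) tuples rather than tensors, and the sketch assumes the limit applies to the absolute value of each component.

```python
from typing import List, Tuple

Flow = Tuple[float, float]


def clip_flow_and_mask(
    flow: List[Flow], max_flow: float = 10000.0
) -> Tuple[List[Flow], List[int]]:
    """Clip each flow component to [-max_flow, max_flow] and build a valid mask.

    Hypothetical helper: a pixel is marked 0 (invalid) when either component
    exceeded the limit before clipping, and 1 otherwise.
    """
    clipped, valid = [], []
    for u, v in flow:
        over_limit = abs(u) > max_flow or abs(v) > max_flow
        clipped.append(
            (max(-max_flow, min(max_flow, u)), max(-max_flow, min(max_flow, v)))
        )
        valid.append(0 if over_limit else 1)
    return clipped, valid


# Three pixels; the second and third exceed the limit in one component.
flow = [(3.0, -4.0), (600.0, 2.0), (-1.0, -700.0)]
clipped, valid = clip_flow_and_mask(flow, max_flow=512.0)
```

With max_flow=512.0, only the first pixel stays valid, and the out-of-range components are clamped to ±512.0.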

class ptlflow.data.datasets.BaseFlowDataset(dataset_name: str, split_name: str = '', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_occlusion_mask: bool = True, get_motion_boundary_mask: bool = True, get_backward: bool = True, get_meta: bool = True)

Manage optical flow dataset loading.

This class can be used as the parent for any concrete dataset. It is structured to be able to read most types of inputs used in optical flow estimation.

Classes inheriting from this one should implement the __init__() method and properly load the input paths from the chosen dataset. This should be done by populating the lists defined in the attributes below.

Attributes:
img_paths : list[list[str]]

Paths of the images. Each element of the main list is a list of paths. Typically, the inner list will have two elements, corresponding to the paths of two consecutive images, which will be used to estimate the optical flow. More than two paths can also be added in case the model is able to use more images for estimating the flow.

flow_paths : list[list[str]]

Similar structure to img_paths. However, each inner list must have exactly one element fewer than the corresponding entry of img_paths. For example, if an entry of img_paths is composed of two paths, then the matching entry of flow_paths should be a list with a single path, corresponding to the optical flow from the first image to the second.

occ_paths : list[list[str]]

Paths to the occlusion masks, following the same structure as flow_paths. It can be left empty if not available.

mb_paths : list[list[str]]

Paths to the motion boundary masks, following the same structure as flow_paths. It can be left empty if not available.

flow_b_paths : list[list[str]]

The same as flow_paths, but for the backward flow. This list must be in the same order as flow_paths. For example, flow_b_paths[i] must be the backward flow corresponding to flow_paths[i]. It can be left empty if backward flows are not available.

occ_b_paths : list[list[str]]

Backward occlusion mask paths; see occ_paths and flow_b_paths above.

mb_b_paths : list[list[str]]

Backward motion boundary mask paths; see mb_paths and flow_b_paths above.

metadata : list[Any]

Some metadata for each input. It can include anything; a dict per input is a reasonable choice.
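As an illustration of the list structure described above, the following sketch builds img_paths and flow_paths from an ordered sequence of frames. It is a stand-alone example under stated assumptions, not code from ptlflow: the helper build_pair_lists and the file names are hypothetical, and flows[i] is assumed to hold the flow from frames[i] to frames[i + 1]. A concrete subclass would do something similar inside its __init__().

```python
from typing import List, Tuple


def build_pair_lists(
    frames: List[str], flows: List[str]
) -> Tuple[List[List[str]], List[List[str]]]:
    """Group consecutive frames into pairs, each with its forward flow path.

    Assumes flows[i] is the flow from frames[i] to frames[i + 1].
    """
    if len(flows) != len(frames) - 1:
        raise ValueError("expected exactly one flow file per consecutive frame pair")
    img_paths = [[frames[i], frames[i + 1]] for i in range(len(frames) - 1)]
    flow_paths = [[flows[i]] for i in range(len(flows))]
    return img_paths, flow_paths


# Hypothetical file names, used only to show the resulting structure.
frames = ["seq/frame_0001.png", "seq/frame_0002.png", "seq/frame_0003.png"]
flows = ["seq/flow_0001.flo", "seq/flow_0002.flo"]
img_paths, flow_paths = build_pair_lists(frames, flows)

# Each flow entry has exactly one element fewer than its image entry.
structure_ok = all(len(f) == len(i) - 1 for i, f in zip(img_paths, flow_paths))
```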

__init__(dataset_name: str, split_name: str = '', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_occlusion_mask: bool = True, get_motion_boundary_mask: bool = True, get_backward: bool = True, get_meta: bool = True) -> None

Initialize BaseFlowDataset.

Parameters:
dataset_name : str

A string representing the dataset name. It is only stored as metadata, so it can have any value.

split_name : str, optional

A string representing the split of the data. It is only stored as metadata, so it can have any value.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 10000.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_occlusion_mask : bool, default True

Whether to get occlusion masks.

get_motion_boundary_mask : bool, default True

Whether to get motion boundary masks.

get_backward : bool, default True

Whether to get the backward version of the inputs.

get_meta : bool, default True

Whether to get metadata.

class ptlflow.data.datasets.FlyingChairs2Dataset(root_dir: str, split: str = 'train', add_reverse: bool = True, transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 1000.0, get_valid_mask: bool = True, get_occlusion_mask: bool = True, get_motion_boundary_mask: bool = True, get_backward: bool = True, get_meta: bool = True)

Handle the FlyingChairs 2 dataset.

__init__(root_dir: str, split: str = 'train', add_reverse: bool = True, transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 1000.0, get_valid_mask: bool = True, get_occlusion_mask: bool = True, get_motion_boundary_mask: bool = True, get_backward: bool = True, get_meta: bool = True) -> None

Initialize FlyingChairs2Dataset.

Parameters:
root_dir : str

Path to the root directory of the FlyingChairs2 dataset.

split : str, default ‘train’

Which split of the dataset should be loaded. It can be one of {‘train’, ‘val’, ‘trainval’}.

add_reverse : bool, default True

If True, double the number of samples by appending the backward samples as additional samples.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 1000.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_occlusion_mask : bool, default True

Whether to get occlusion masks.

get_motion_boundary_mask : bool, default True

Whether to get motion boundary masks.

get_backward : bool, default True

Whether to get the backward version of the inputs.

get_meta : bool, default True

Whether to get metadata.
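The effect of add_reverse described above can be pictured with the sketch below. It is only an illustration under assumptions: the helper with_reverse, the Sample alias, and the file names are hypothetical, and a "backward sample" is assumed to be the same image pair in swapped order with the backward flow as ground truth.

```python
from typing import List, Tuple

Sample = Tuple[List[str], str]  # (image paths, flow path)


def with_reverse(forward: List[Sample], backward_flows: List[str]) -> List[Sample]:
    """Return the forward samples followed by their reversed counterparts."""
    if len(forward) != len(backward_flows):
        raise ValueError("need one backward flow per forward sample")
    reversed_samples = [
        (list(reversed(imgs)), flow_b)
        for (imgs, _), flow_b in zip(forward, backward_flows)
    ]
    return forward + reversed_samples


# Hypothetical file names for two forward samples and their backward flows.
forward = [
    (["a_0.png", "a_1.png"], "a_fw.flo"),
    (["b_0.png", "b_1.png"], "b_fw.flo"),
]
samples = with_reverse(forward, ["a_bw.flo", "b_bw.flo"])
```

The resulting list has twice as many entries as the forward list, with the reversed samples appended after the forward ones.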

class ptlflow.data.datasets.FlyingChairsDataset(root_dir: str, split: str = 'train', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_meta: bool = True)

Handle the FlyingChairs dataset.

__init__(root_dir: str, split: str = 'train', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_meta: bool = True) -> None

Initialize FlyingChairsDataset.

Parameters:
root_dir : str

Path to the root directory of the FlyingChairs dataset.

split : str, default ‘train’

Which split of the dataset should be loaded. It can be one of {‘train’, ‘val’, ‘trainval’}.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 10000.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_meta : bool, default True

Whether to get metadata.

class ptlflow.data.datasets.FlyingThings3DDataset(root_dir: str, split: str = 'train', pass_names: str | List[str] = 'clean', side_names: str | List[str] = 'left', add_reverse: bool = True, transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 1000.0, get_valid_mask: bool = True, get_occlusion_mask: bool = True, get_motion_boundary_mask: bool = True, get_backward: bool = True, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first')

Handle the FlyingThings3D dataset.

Note that this only works for the complete FlyingThings3D dataset. For the subset version, use FlyingThings3DSubsetDataset.

__init__(root_dir: str, split: str = 'train', pass_names: str | List[str] = 'clean', side_names: str | List[str] = 'left', add_reverse: bool = True, transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 1000.0, get_valid_mask: bool = True, get_occlusion_mask: bool = True, get_motion_boundary_mask: bool = True, get_backward: bool = True, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first') -> None

Initialize FlyingThings3DDataset.

Parameters:
root_dir : str

Path to the root directory of the FlyingThings3D dataset.

split : str, default ‘train’

Which split of the dataset should be loaded. It can be one of {‘train’, ‘val’, ‘trainval’}.

pass_names : Union[str, List[str]], default ‘clean’

Which passes should be loaded. It can be one of {‘clean’, ‘final’, [‘clean’, ‘final’]}.

side_names : Union[str, List[str]], default ‘left’

Samples from which side view should be loaded. It can be one of {‘left’, ‘right’, [‘left’, ‘right’]}.

add_reverse : bool, default True

If True, double the number of samples by appending the backward samples as additional samples.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 1000.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_occlusion_mask : bool, default True

Whether to get occlusion masks.

get_motion_boundary_mask : bool, default True

Whether to get motion boundary masks.

get_backward : bool, default True

Whether to get the backward version of the inputs.

get_meta : bool, default True

Whether to get metadata.

sequence_length : int, default 2

How many consecutive images are loaded per sample. More than two images can be used for models that exploit more temporal information.

sequence_position : str, default “first”

Only used when sequence_length > 2. Determines the position of the main image frame in the sequence. It can be one of three values:
- “first”: the main frame will be the first one of the sequence,
- “middle”: the main frame will be in the middle of the sequence (at position sequence_length // 2),
- “last”: the main frame will be the penultimate in the sequence.
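The relationship between sequence_length and sequence_position can be made concrete with a short sketch. It is a hedged illustration rather than ptlflow's code: the function sequence_indices is hypothetical, frames are identified by integer indices, and no handling of sequence boundaries is attempted.

```python
from typing import List


def sequence_indices(
    main_index: int, sequence_length: int = 2, sequence_position: str = "first"
) -> List[int]:
    """Return the frame indices of a window that contains the main frame.

    Hypothetical helper: the offset of the main frame inside the window
    follows the three options described above.
    """
    offsets = {
        "first": 0,
        "middle": sequence_length // 2,
        "last": sequence_length - 2,  # penultimate position
    }
    if sequence_position not in offsets:
        raise ValueError(f"unknown sequence_position: {sequence_position!r}")
    # The position only matters for sequences longer than two frames.
    offset = offsets[sequence_position] if sequence_length > 2 else 0
    start = main_index - offset
    return list(range(start, start + sequence_length))


first = sequence_indices(10, sequence_length=5, sequence_position="first")
middle = sequence_indices(10, sequence_length=5, sequence_position="middle")
last = sequence_indices(10, sequence_length=5, sequence_position="last")
```

For a main frame at index 10 and sequence_length=5, the three options place frame 10 at window positions 0, 2, and 3 respectively.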

class ptlflow.data.datasets.FlyingThings3DSubsetDataset(root_dir: str, split: str = 'train', pass_names: str | List[str] = 'clean', side_names: str | List[str] = 'left', add_reverse: bool = True, transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 1000.0, get_valid_mask: bool = True, get_occlusion_mask: bool = True, get_motion_boundary_mask: bool = True, get_backward: bool = True, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first')

Handle the FlyingThings3D subset dataset.

Note that this only works for the FlyingThings3D subset dataset. For the complete version, use FlyingThings3DDataset.

__init__(root_dir: str, split: str = 'train', pass_names: str | List[str] = 'clean', side_names: str | List[str] = 'left', add_reverse: bool = True, transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 1000.0, get_valid_mask: bool = True, get_occlusion_mask: bool = True, get_motion_boundary_mask: bool = True, get_backward: bool = True, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first') -> None

Initialize FlyingThings3DSubsetDataset.

Parameters:
root_dir : str

Path to the root directory of the FlyingThings3D subset dataset.

split : str, default ‘train’

Which split of the dataset should be loaded. It can be one of {‘train’, ‘val’, ‘trainval’}.

pass_names : Union[str, List[str]], default ‘clean’

Which passes should be loaded. It can be one of {‘clean’, ‘final’, [‘clean’, ‘final’]}.

side_names : Union[str, List[str]], default ‘left’

Samples from which side view should be loaded. It can be one of {‘left’, ‘right’, [‘left’, ‘right’]}.

add_reverse : bool, default True

If True, double the number of samples by appending the backward samples as additional samples.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 1000.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_occlusion_mask : bool, default True

Whether to get occlusion masks.

get_motion_boundary_mask : bool, default True

Whether to get motion boundary masks.

get_backward : bool, default True

Whether to get the backward version of the inputs.

get_meta : bool, default True

Whether to get metadata.

sequence_length : int, default 2

How many consecutive images are loaded per sample. More than two images can be used for models that exploit more temporal information.

sequence_position : str, default “first”

Only used when sequence_length > 2. Determines the position of the main image frame in the sequence. It can be one of three values:
- “first”: the main frame will be the first one of the sequence,
- “middle”: the main frame will be in the middle of the sequence (at position sequence_length // 2),
- “last”: the main frame will be the penultimate in the sequence.
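Both pass_names and side_names accept either a single string or a list of strings. One plausible way to handle such arguments is to normalize them to lists and enumerate every (pass, side) combination, as sketched below. The helpers as_list and pass_side_combinations are hypothetical and are not part of ptlflow; the allowed values are taken from the parameter descriptions above.

```python
from itertools import product
from typing import List, Tuple, Union


def as_list(value: Union[str, List[str]]) -> List[str]:
    """Wrap a single string in a list; copy a list unchanged."""
    return [value] if isinstance(value, str) else list(value)


def pass_side_combinations(
    pass_names: Union[str, List[str]] = "clean",
    side_names: Union[str, List[str]] = "left",
) -> List[Tuple[str, str]]:
    """Enumerate every (pass, side) pair selected by the two arguments."""
    passes, sides = as_list(pass_names), as_list(side_names)
    for name in passes:
        if name not in ("clean", "final"):
            raise ValueError(f"unknown pass name: {name!r}")
    for name in sides:
        if name not in ("left", "right"):
            raise ValueError(f"unknown side name: {name!r}")
    return list(product(passes, sides))


default_combo = pass_side_combinations()
all_combos = pass_side_combinations(["clean", "final"], ["left", "right"])
```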

class ptlflow.data.datasets.Hd1kDataset(root_dir: str, split: str = 'train', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 512.0, get_valid_mask: bool = True, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first')

Handle the HD1K dataset.

__init__(root_dir: str, split: str = 'train', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 512.0, get_valid_mask: bool = True, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first') -> None

Initialize Hd1kDataset.

Parameters:
root_dir : str

Path to the root directory of the HD1K dataset.

split : str, default ‘train’

Which split of the dataset should be loaded. It can be one of {‘train’, ‘val’, ‘trainval’, ‘test’}.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 512.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_meta : bool, default True

Whether to get metadata.

sequence_length : int, default 2

How many consecutive images are loaded per sample. More than two images can be used for models that exploit more temporal information.

sequence_position : str, default “first”

Only used when sequence_length > 2. Determines the position of the main image frame in the sequence. It can be one of three values:
- “first”: the main frame will be the first one of the sequence,
- “middle”: the main frame will be in the middle of the sequence (at position sequence_length // 2),
- “last”: the main frame will be the penultimate in the sequence.

class ptlflow.data.datasets.KittiDataset(root_dir_2012: str | None = None, root_dir_2015: str | None = None, split: str = 'train', versions: str | List[str] = '2015', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 512.0, get_valid_mask: bool = True, get_occlusion_mask: bool = False, get_meta: bool = True)

Handle the KITTI dataset.

__init__(root_dir_2012: str | None = None, root_dir_2015: str | None = None, split: str = 'train', versions: str | List[str] = '2015', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 512.0, get_valid_mask: bool = True, get_occlusion_mask: bool = False, get_meta: bool = True) -> None

Initialize KittiDataset.

Parameters:
root_dir_2012 : str, optional

Path to the root directory of the KITTI 2012 dataset, if available.

root_dir_2015 : str, optional

Path to the root directory of the KITTI 2015 dataset, if available.

split : str, default ‘train’

Which split of the dataset should be loaded. It can be one of {‘train’, ‘val’, ‘trainval’, ‘test’}.

versions : Union[str, List[str]], default ‘2015’

Which version should be loaded. It can be one of {‘2012’, ‘2015’, [‘2012’, ‘2015’]}.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 512.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_occlusion_mask : bool, default False

Whether to get occlusion masks.

get_meta : bool, default True

Whether to get metadata.
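Because each KITTI version has its own optional root directory, a caller needs to supply a root for every version requested through versions. The sketch below shows one way such a check could look. It is an assumption about sensible behavior, not a description of what KittiDataset actually does: the helper resolve_kitti_roots and the example paths are hypothetical.

```python
from typing import Dict, List, Optional, Union


def resolve_kitti_roots(
    versions: Union[str, List[str]] = "2015",
    root_dir_2012: Optional[str] = None,
    root_dir_2015: Optional[str] = None,
) -> Dict[str, str]:
    """Map each requested version to its root directory, failing early if missing."""
    requested = [versions] if isinstance(versions, str) else list(versions)
    available = {"2012": root_dir_2012, "2015": root_dir_2015}
    roots: Dict[str, str] = {}
    for version in requested:
        if version not in available:
            raise ValueError(f"unknown KITTI version: {version!r}")
        root = available[version]
        if root is None:
            raise ValueError(f"root_dir_{version} is required for version {version}")
        roots[version] = root
    return roots


# Hypothetical path, for illustration only.
roots = resolve_kitti_roots("2015", root_dir_2015="/data/kitti2015")
```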

class ptlflow.data.datasets.MiddleburyDataset(root_dir: str, split: str = 'train', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_meta: bool = True)

Handle the Middlebury dataset.

__init__(root_dir: str, split: str = 'train', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_meta: bool = True) -> None

Initialize MiddleburyDataset.

Parameters:
root_dir : str

Path to the root directory of the Middlebury dataset.

split : str, default ‘train’

Which split of the dataset should be loaded. It can be one of {‘train’, ‘val’, ‘trainval’, ‘test’}.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 10000.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_meta : bool, default True

Whether to get metadata.

class ptlflow.data.datasets.MonkaaDataset(root_dir: str, pass_names: str | List[str] = 'clean', side_names: str | List[str] = 'left', add_reverse: bool = True, transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 1000.0, get_valid_mask: bool = True, get_backward: bool = True, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first')

Handle the Monkaa dataset.

__init__(root_dir: str, pass_names: str | List[str] = 'clean', side_names: str | List[str] = 'left', add_reverse: bool = True, transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 1000.0, get_valid_mask: bool = True, get_backward: bool = True, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first') -> None

Initialize MonkaaDataset.

Parameters:
root_dir : str

Path to the root directory of the Monkaa dataset.

pass_names : Union[str, List[str]], default ‘clean’

Which passes should be loaded. It can be one of {‘clean’, ‘final’, [‘clean’, ‘final’]}.

side_names : Union[str, List[str]], default ‘left’

Samples from which side view should be loaded. It can be one of {‘left’, ‘right’, [‘left’, ‘right’]}.

add_reverse : bool, default True

If True, double the number of samples by appending the backward samples as additional samples.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 1000.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_backward : bool, default True

Whether to get the backward version of the inputs.

get_meta : bool, default True

Whether to get metadata.

sequence_length : int, default 2

How many consecutive images are loaded per sample. More than two images can be used for models that exploit more temporal information.

sequence_position : str, default “first”

Only used when sequence_length > 2. Determines the position of the main image frame in the sequence. It can be one of three values:
- “first”: the main frame will be the first one of the sequence,
- “middle”: the main frame will be in the middle of the sequence (at position sequence_length // 2),
- “last”: the main frame will be the penultimate in the sequence.

class ptlflow.data.datasets.SintelDataset(root_dir: str, split: str = 'train', pass_names: str | List[str] = 'clean', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_occlusion_mask: bool = True, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first')

Handle the MPI Sintel dataset.

__init__(root_dir: str, split: str = 'train', pass_names: str | List[str] = 'clean', transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_occlusion_mask: bool = True, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first') -> None

Initialize SintelDataset.

Parameters:
root_dir : str

Path to the root directory of the MPI Sintel dataset.

split : str, default ‘train’

Which split of the dataset should be loaded. It can be one of {‘train’, ‘val’, ‘trainval’, ‘test’}.

pass_names : Union[str, List[str]], default ‘clean’

Which passes should be loaded. It can be one of {‘clean’, ‘final’, [‘clean’, ‘final’]}.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 10000.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_occlusion_mask : bool, default True

Whether to get occlusion masks.

get_meta : bool, default True

Whether to get metadata.

sequence_length : int, default 2

How many consecutive images are loaded per sample. More than two images can be used for models that exploit more temporal information.

sequence_position : str, default “first”

Only used when sequence_length > 2. Determines the position of the main image frame in the sequence. It can be one of three values:
- “first”: the main frame will be the first one of the sequence,
- “middle”: the main frame will be in the middle of the sequence (at position sequence_length // 2),
- “last”: the main frame will be the penultimate in the sequence.

class ptlflow.data.datasets.SpringDataset(root_dir: str, split: str = 'train', side_names: str | List[str] = 'left', add_reverse: bool = True, transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_backward: bool = False, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first', reverse_only: bool = False)

Handle the Spring dataset.

__init__(root_dir: str, split: str = 'train', side_names: str | List[str] = 'left', add_reverse: bool = True, transform: Callable[[Dict[str, Tensor]], Dict[str, Tensor]] | None = None, max_flow: float = 10000.0, get_valid_mask: bool = True, get_backward: bool = False, get_meta: bool = True, sequence_length: int = 2, sequence_position: str = 'first', reverse_only: bool = False) -> None

Initialize SpringDataset.

Parameters:
root_dir : str

Path to the root directory of the Spring dataset.

split : str, default ‘train’

Which split of the dataset should be loaded. It can be one of {‘train’, ‘val’, ‘trainval’, ‘test’}.

side_names : Union[str, List[str]], default ‘left’

Samples from which side view should be loaded. It can be one of {‘left’, ‘right’, [‘left’, ‘right’]}.

add_reverse : bool, default True

If True, double the number of samples by appending the backward samples as additional samples.

transform : Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]], optional

Transform to be applied to the inputs.

max_flow : float, default 10000.0

Maximum optical flow absolute value. Flow absolute values that go over this limit are clipped, and also marked as zero in the valid mask.

get_valid_mask : bool, default True

Whether to get or generate valid masks.

get_backward : bool, default False

Whether to get the backward version of the inputs.

get_meta : bool, default True

Whether to get metadata.

sequence_length : int, default 2

How many consecutive images are loaded per sample. More than two images can be used for models that exploit more temporal information.

sequence_position : str, default “first”

Only used when sequence_length > 2. Determines the position of the main image frame in the sequence. It can be one of three values:
- “first”: the main frame will be the first one of the sequence,
- “middle”: the main frame will be in the middle of the sequence (at position sequence_length // 2),
- “last”: the main frame will be the penultimate in the sequence.

reverse_only : bool, default False

If True, only uses the backward samples, discarding the forward ones.