Model Benchmarking
You can use the script model_benchmark.py to collect some metrics about the models. Read Getting initial scripts to learn how to download it.
Once you have the script, you can run a benchmark as follows:
python model_benchmark.py raft_small
This command will collect some metrics from the raft_small model.
The results are printed in the terminal and also saved to a CSV file in the folder specified by the --output_path argument.
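Since the results are saved as CSV, they can be post-processed with standard tooling. A minimal sketch, assuming hypothetical column names and values (the real header depends on the metrics the script writes to --output_path):

```python
import csv
import io

# Hypothetical CSV content; the real columns and values come from the
# file that model_benchmark.py saves to the --output_path folder.
csv_text = """model,params,time(ms)
raft_small,990162,35.2
pwcnet,9374274,18.7
"""

# Sort the benchmarked models by running time, fastest first.
rows = sorted(csv.DictReader(io.StringIO(csv_text)),
              key=lambda r: float(r["time(ms)"]))
fastest = rows[0]["model"]
print(fastest)  # prints "pwcnet"
```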
When benchmarking a single model (as in the example above), it is possible to include model-specific arguments as well. For example:
python model_benchmark.py raft_small --iters 12
--iters is an argument available inside the raft_small model.
Benchmarking multiple models
You can also run the benchmark on several models at the same time, by providing select as the first argument and a list of model names for the --selection argument.
For example, the command:
python model_benchmark.py select --selection raft_small pwcnet
would collect the benchmark results for the raft_small and pwcnet models.
You can also benchmark all available models with:
python model_benchmark.py all
IMPORTANT: when benchmarking multiple models with select or all, it is not possible to provide model-specific arguments directly from the command line!
Reported metrics
This script reports the following metrics:
Number of model parameters
FLOPs
Running time
FLOPs and running time depend on the input size and the chosen datatypes.
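To see why FLOPs scale with the input size, consider a single convolution layer. This is an illustrative calculation, not code from the script, and the layer shapes are made up:

```python
def conv_flops(h, w, c_in, c_out, k=3):
    """FLOPs of one k x k convolution: two operations (multiply + add)
    per kernel weight, per input/output channel pair, per output pixel."""
    return 2 * k * k * c_in * c_out * h * w

# Doubling both sides of the input quadruples the FLOPs, which is why
# reported FLOPs are only meaningful together with --input_size.
small = conv_flops(192, 640, 64, 64)
large = conv_flops(384, 1280, 64, 64)
print(large // small)  # prints 4
```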
Useful arguments
You can find all the arguments accepted by this script by running:
python model_benchmark.py -h
Below we explain some of the most useful arguments you can control:
--num_trials, --num_samples, --sleep_interval: use these to control the number of tests run to average the metrics. Each trial runs the model --num_samples times, and --sleep_interval sets a delay between trials.
--input_size: the height and width, respectively, of the input to be used for benchmarking.
--final_speed_mode, --final_memory_mode: how to aggregate the speed and memory metrics.
--datatypes: a list of datatypes (fp16 and/or fp32) to be tested.
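The interaction between --num_trials, --num_samples, --sleep_interval, and --final_speed_mode can be sketched with a toy timing loop. This is a simplified illustration, not the script's actual implementation; run_model stands in for a real forward pass:

```python
import statistics
import time

def run_model(_inputs):
    # Stand-in for a model forward pass; the real script runs the network.
    time.sleep(0.001)

def benchmark(num_trials=2, num_samples=5, sleep_interval=0.1):
    trial_times = []
    for _ in range(num_trials):
        start = time.perf_counter()
        for _ in range(num_samples):  # each trial runs the model num_samples times
            run_model(None)
        trial_times.append((time.perf_counter() - start) / num_samples)
        time.sleep(sleep_interval)  # delay between trials
    # A "median" final speed mode would aggregate the trials like this:
    return statistics.median(trial_times)

print(benchmark() > 0)  # prints True
```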
The command below shows an example with all the above arguments:
python model_benchmark.py raft_small --num_trials 2 --num_samples 5 --sleep_interval 1.0 --input_size 384 1280 --final_speed_mode median --final_memory_mode first --datatypes fp16 fp32
Plotting results
You can create 2D scatter plots by choosing two of the available metrics.
You can check the names of valid metrics in the accepted values of --plot_axes after running:
python model_benchmark.py -h
For example, the command below creates a scatter plot showing time and flops of three models:
python model_benchmark.py select --selection raft_small pwcnet flownets --plot_axes time flops
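Internally, such a plot just pairs two metric columns per model. A sketch with invented numbers (the script renders the plot itself; this only shows how --plot_axes selects the axes):

```python
# Hypothetical results for three models; real values come from the benchmark.
results = {
    "raft_small": {"time": 35.2, "flops": 7.9e10},
    "pwcnet":     {"time": 18.7, "flops": 4.1e10},
    "flownets":   {"time": 9.3,  "flops": 2.2e10},
}

# "--plot_axes time flops" picks which two metrics become the x and y axes.
x_metric, y_metric = "time", "flops"
points = [(name, m[x_metric], m[y_metric]) for name, m in results.items()]
print(len(points))  # prints 3
```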
Known issues
Different GPU IDs
In machines with more than one GPU, sometimes the GPU ID from nvidia-smi is different from the ID in PyTorch, causing wrong GPU memory usage reports.
If that happens, you will have to manually change the device_id variable in model_benchmark.py to synchronize the two IDs.
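An alternative workaround (a standard CUDA environment setting, not something this script provides) is to force PyTorch to enumerate GPUs in PCI bus order, which is the order nvidia-smi uses. It must be set before CUDA is initialized:

```python
import os

# nvidia-smi lists GPUs in PCI bus order, while CUDA defaults to
# "fastest first", so the two ID schemes can disagree. Setting this
# variable before importing torch makes the orders match.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Optionally restrict the run to a single GPU; "0" here is hypothetical.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
print(os.environ["CUDA_DEVICE_ORDER"])  # prints "PCI_BUS_ID"
```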
Variable running times
Calculating the running times of multiple models (using the select or all arguments) may cause later models to appear slower.
If you want to get the lowest running times of each model, it is best to benchmark only one model at a time.
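To benchmark models one at a time without typing each command by hand, the script can be wrapped in a small loop. A sketch; the actual subprocess call is commented out because it requires the script and its dependencies to be installed:

```python
import subprocess  # used when the run line below is uncommented
import sys

# One cold-start invocation per model, instead of select/all in one process.
models = ["raft_small", "pwcnet", "flownets"]
commands = [[sys.executable, "model_benchmark.py", m] for m in models]
for cmd in commands:
    print("running:", " ".join(cmd[1:]))
    # subprocess.run(cmd, check=True)
```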