
Metrics in MOT

The definitions are taken from:

https://arxiv.org/abs/1907.12740

@article{ciaparrone2020deep,
title={Deep learning in video multi-object tracking: A survey},
author={Ciaparrone, Gioele and S{\'a}nchez, Francisco Luque and Tabik, Siham and Troiano, Luigi and Tagliaferri, Roberto and Herrera, Francisco},
journal={Neurocomputing},
volume={381},
pages={61--88},
year={2020},
publisher={Elsevier}
}

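Trajectory vs. Tracklet

Trajectory: the sequence of positions of a single target over a period of time; one trajectory corresponds to one target.
Tracklet: a trajectory fragment produced while a trajectory is being formed. A complete trajectory is made up of the tracklets that belong to the same object.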

Metrics

Classical metrics

Mostly Tracked (MT) trajectories : number of ground-truth trajectories that are correctly tracked in at least 80% of the frames.

Note that it is irrelevant for this measure whether the ID remains the same throughout the track: a frame counts as correctly tracked whenever the ground-truth object is matched to some hypothesis, so a trajectory whose predicted ID changes along the way is still MT as long as at least 80% of its frames are matched.

Mostly Lost (ML) trajectories : number of ground-truth trajectories that are correctly tracked in less than 20% of the frames.

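Partially Tracked (PT) trajectories : number of ground-truth trajectories that are neither MT nor ML, i.e. correctly tracked in at least 20% but less than 80% of the frames.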
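The MT / PT / ML split can be sketched in a few lines. This is a minimal illustration, assuming the number of correctly tracked frames per ground-truth trajectory is already known; the function name and the sample data are hypothetical.

```python
def classify_trajectory(tracked_frames: int, total_frames: int) -> str:
    """Label one ground-truth trajectory as 'MT', 'PT', or 'ML'.

    tracked_frames: frames in which the object was matched to any hypothesis
                    (ID changes are ignored, as in the definition above).
    total_frames:   length of the ground-truth trajectory in frames.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    # Integer comparisons avoid floating-point issues at the 80% / 20% borders.
    if 5 * tracked_frames >= 4 * total_frames:   # ratio >= 0.8
        return "MT"
    if 5 * tracked_frames < total_frames:        # ratio < 0.2
        return "ML"
    return "PT"


# Hypothetical data: (tracked frames, trajectory length) per trajectory.
coverage = [(90, 100), (50, 100), (10, 100), (80, 100), (20, 100)]
labels = [classify_trajectory(t, n) for t, n in coverage]
```

A trajectory tracked in exactly 80% of its frames is MT, and one tracked in exactly 20% is PT rather than ML.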

False trajectories : number of predicted trajectories that do not correspond to a real object (i.e. to a ground truth trajectory).

ID switches : number of times an object is correctly tracked but the ID associated with it is mistakenly changed.
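As a rough sketch, ID switches for one ground-truth object can be counted from the sequence of hypothesis IDs it was matched to. The representation (one entry per frame, `None` for untracked frames) is an assumption made for illustration, as is the choice to compare against the last known ID across gaps.

```python
from typing import Optional, Sequence


def count_id_switches(assigned_ids: Sequence[Optional[int]]) -> int:
    """Count ID switches for a single ground-truth object.

    assigned_ids[t] is the hypothesis ID matched in frame t, or None if the
    object was not tracked in that frame (assumed representation).
    """
    switches = 0
    last_id = None
    for current in assigned_ids:
        if current is None:
            continue  # untracked frame: no ID to compare
        if last_id is not None and current != last_id:
            switches += 1  # tracked, but under a different ID than before
        last_id = current
    return switches


# Hypothetical sequence: ID 1, a gap, ID 1 again, then 2, a gap, then 3.
example_ids = [1, 1, None, 1, 2, 2, None, 3]
n_switches = count_id_switches(example_ids)
```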

Fragmentation (FM) : number of times a ground truth trajectory is interrupted (untracked). A fragmentation is counted each time a trajectory changes its status from tracked to untracked and tracking of that same trajectory is resumed at a later point. FM only counts interruptions of tracking and is independent of ID changes.
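The fragmentation rule (tracked, then untracked, then tracked again) can be illustrated with a per-frame boolean flag for one ground-truth trajectory. The input format and function name are assumptions for this sketch.

```python
from typing import Sequence


def count_fragmentations(tracked: Sequence[bool]) -> int:
    """Count interruptions that are later resumed for one trajectory.

    tracked[t] is True if the object is matched to any hypothesis in frame t.
    """
    fragments = 0
    seen_tracked = False  # the object has been tracked at least once so far
    in_gap = False        # currently untracked after having been tracked
    for is_tracked in tracked:
        if is_tracked:
            if in_gap:
                fragments += 1  # tracking resumed after an interruption
            in_gap = False
            seen_tracked = True
        elif seen_tracked:
            in_gap = True
    return fragments


# Two resumed gaps; the trailing gap is never resumed, so it is not counted.
flags = [True, True, False, True, False, False, True, False]
n_fragments = count_fragmentations(flags)
```

ID changes do not enter the count at all, which matches the remark that FM is unrelated to ID switches.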

CLEAR MOT metrics

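  • FP : the number of false positives in the whole video, i.e. hypotheses that cannot be associated with any ground truth bounding box (false alarms).

A ground truth box and a hypothesis are considered associated when their intersection over union satisfies $IoU \geqslant \alpha$, with $\alpha = 0.5$. Existing matches take priority: if ground truth $o_i$ and hypothesis $h_j$ are matched in frame $t-1$, i.e. $IoU(o_i,h_j)\geqslant 0.5$, and $IoU(o_i,h_j)\geqslant 0.5$ still holds in frame $t$, then $o_i$ stays matched to $h_j$ even if another hypothesis $h_k$ has $IoU(o_i,h_k)>IoU(o_i,h_j)$.

  • FN : the number of false negatives in the whole video, i.e. ground truth bounding boxes that cannot be associated with any hypothesis (misses).

  • Fragm : the total number of fragmentations; a fragmentation is counted each time tracking of a ground truth object is interrupted and later resumed (the same quantity as FM above).

  • IDSW : the total number of ID switches; an ID switch is counted each time the ID of a tracked ground truth object is incorrectly changed during the tracking duration (the same quantity as ID switches above).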
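The association rule used by these counts (an IoU threshold of 0.5, with the previous match kept while it remains valid) can be sketched as follows. This is a simplified per-object illustration, assuming boxes are given as `(x1, y1, x2, y2)`; a full evaluation solves the assignment jointly over all objects in a frame, and the helper names here are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def choose_hypothesis(gt_box, hyp_boxes, previous_id=None, threshold=0.5):
    """Pick the hypothesis ID for one ground-truth box, or None.

    hyp_boxes maps hypothesis ID -> box. The previous match is kept whenever
    it still reaches the threshold, even if another hypothesis overlaps more.
    """
    if previous_id in hyp_boxes and iou(gt_box, hyp_boxes[previous_id]) >= threshold:
        return previous_id
    best_id, best_iou = None, 0.0
    for hyp_id, box in hyp_boxes.items():
        overlap = iou(gt_box, box)
        if overlap >= threshold and overlap > best_iou:
            best_id, best_iou = hyp_id, overlap
    return best_id


gt = (0, 0, 10, 10)
hyps = {1: (0, 0, 10, 6), 2: (0, 0, 10, 9)}          # IoU 0.6 and 0.9
kept = choose_hypothesis(gt, hyps, previous_id=1)     # previous match survives
fresh = choose_hypothesis(gt, hyps)                   # no history: best overlap
```

A ground-truth box for which `choose_hypothesis` returns `None` would contribute to FN, and a hypothesis left without any ground-truth box would contribute to FP.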

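These counts are combined into MOTA (Multiple Object Tracking Accuracy):

$$MOTA = 1 - \frac{FN + FP + IDSW}{GT}$$

where $GT$ is the number of ground truth boxes.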

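The three error ratios are the miss rate $\frac{FN}{GT}$, the false positive rate $\frac{FP}{GT}$, and the mismatch rate $\frac{IDSW}{GT}$.

The closer MOTA is to 1, the better; it becomes negative when the number of errors exceeds the number of ground truth boxes. MOTA mainly accounts for all object matching errors made during tracking, namely FP, FN, and ID switches. It gives a very intuitive measure of a tracker's performance at detecting objects and maintaining trajectories, independent of how precisely the objects are localized.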
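A small worked example may help with the arithmetic. The sketch below assumes the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, and uses made-up counts.

```python
def mota(fn: int, fp: int, idsw: int, gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with GT the number of ground-truth boxes."""
    if gt <= 0:
        raise ValueError("gt must be positive")
    return 1.0 - (fn + fp + idsw) / gt


# Hypothetical counts accumulated over a whole video.
score = mota(fn=20, fp=10, idsw=5, gt=100)      # 35 errors on 100 boxes
bad_score = mota(fn=80, fp=60, idsw=0, gt=100)  # more errors than boxes
```

The second call illustrates that the score is not bounded below by zero: 140 errors against 100 ground-truth boxes give a negative value.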

MOTP (Multiple Object Tracking Precision) is defined as:

$$MOTP = \frac{\sum_{t,i} d_{t,i}}{\sum_{t} c_t}$$

where $c_t$ denotes the number of matches in frame $t$ and $d_{t,i}$ is the bounding box overlap between hypothesis $i$ and its assigned ground truth object. Note that this metric takes very little tracking information into account and mainly reflects the quality of the detections.
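The averaging behind MOTP can be sketched as below, assuming it is the sum of the overlaps $d_{t,i}$ of all matched pairs divided by the total number of matches $\sum_t c_t$. The nested-list input format is an assumption for illustration.

```python
from typing import Sequence


def motp(overlaps_per_frame: Sequence[Sequence[float]]) -> float:
    """Average overlap over all matched pairs in all frames.

    overlaps_per_frame[t] lists d_{t,i} for every match in frame t, so its
    length is c_t.
    """
    total_overlap = sum(sum(frame) for frame in overlaps_per_frame)
    total_matches = sum(len(frame) for frame in overlaps_per_frame)
    if total_matches == 0:
        raise ValueError("no matches: MOTP is undefined")
    return total_overlap / total_matches


# Hypothetical overlaps: two matches in frame 0, one in frame 1, none in frame 2.
value = motp([[0.8, 0.6], [0.7], []])
```

Frames without matches add nothing to either sum, so missed objects and ID switches leave the value unchanged.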

ID scores

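These metrics focus on identity. In some settings (e.g. aerial scenes), long-term tracking errors matter most, and identity preservation becomes the main concern.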

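The computation builds a bipartite graph. One vertex set, $V_{T}$, contains a node for every ground truth trajectory plus one false positive node for every computed trajectory; the other, $V_{C}$, contains a node for every computed trajectory plus one false negative node for every ground truth trajectory. A minimum-cost matching is then computed on this graph; the edge costs are explained in more detail in \cite{ristani2016performance}. Detections on which a ground truth trajectory and its matched computed trajectory agree count towards $IDTP$. Ground truth detections left unexplained, including all detections of a trajectory matched to its false negative node, count towards $IDFN$, and computed detections left unexplained, including all detections of a trajectory matched to its false positive node, count towards $IDFP$.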
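The matching step can be illustrated on a toy problem. The sketch below assumes the per-pair agreement counts (frames in which a ground-truth trajectory and a computed trajectory overlap sufficiently) are already available, and it replaces the minimum-cost matching with a brute-force search that maximizes total agreement. The two are equivalent if a matched pair costs its disagreeing frames and an unmatched trajectory costs its full length. Practical implementations use the Hungarian algorithm; all names here are hypothetical.

```python
from itertools import permutations


def id_counts(agree, gt_lengths, pred_lengths):
    """Return (IDTP, IDFN, IDFP) for a small problem by exhaustive search.

    agree[i][j]:   frames where ground-truth trajectory i and computed
                   trajectory j agree.
    gt_lengths:    number of detections in each ground-truth trajectory.
    pred_lengths:  number of detections in each computed trajectory.
    """
    n_gt, n_pred = len(gt_lengths), len(pred_lengths)
    size = max(n_gt, n_pred)  # columns >= n_pred mean "left unmatched"
    best_idtp = 0
    for assignment in permutations(range(size), n_gt):
        idtp = sum(agree[i][j] for i, j in enumerate(assignment) if j < n_pred)
        best_idtp = max(best_idtp, idtp)
    idfn = sum(gt_lengths) - best_idtp    # ground-truth detections not explained
    idfp = sum(pred_lengths) - best_idtp  # computed detections not explained
    return best_idtp, idfn, idfp


# Hypothetical toy data: 2 ground-truth and 2 computed trajectories.
agree = [[8, 1],
         [0, 5]]
counts = id_counts(agree, gt_lengths=[10, 6], pred_lengths=[9, 8])
```

Here the best one-to-one assignment pairs trajectory 0 with 0 and 1 with 1, for 13 agreeing frames in total. The exhaustive search grows factorially with the number of trajectories and is only meant to make the definition concrete.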

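Identification precision (IDP)

$$IDP = \frac{IDTP}{IDTP + IDFP}$$

Identification recall (IDR)

$$IDR = \frac{IDTP}{IDTP + IDFN}$$

Identification F1 (IDF1)

$$IDF1 = \frac{2\,IDTP}{2\,IDTP + IDFP + IDFN}$$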
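For a quick numerical check, the three ID scores can be computed from the identity counts as in this minimal sketch. The definitions are stated in the docstring, the counts are made up, and returning 0.0 for an empty denominator is a convention chosen here, not something the source specifies.

```python
def id_scores(idtp: int, idfn: int, idfp: int):
    """Return (IDP, IDR, IDF1).

    IDP  = IDTP / (IDTP + IDFP)
    IDR  = IDTP / (IDTP + IDFN)
    IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)
    """
    def ratio(num, den):
        return num / den if den else 0.0  # assumed convention for empty input

    idp = ratio(idtp, idtp + idfp)
    idr = ratio(idtp, idtp + idfn)
    idf1 = ratio(2 * idtp, 2 * idtp + idfp + idfn)
    return idp, idr, idf1


# Hypothetical counts: 13 identity true positives, 3 misses, 4 false positives.
idp, idr, idf1 = id_scores(idtp=13, idfn=3, idfp=4)
```

IDF1 is the harmonic mean of IDP and IDR, so it always lies between the two.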

Reference

https://www.yuque.com/aicv/lab/tc9yqd

Code for computing these metrics:

https://github.com/cheind/py-motmetrics

@inproceedings{ristani2016performance,
title={Performance measures and a data set for multi-target, multi-camera tracking},
author={Ristani, Ergys and Solera, Francesco and Zou, Roger and Cucchiara, Rita and Tomasi, Carlo},
booktitle={European conference on computer vision},
pages={17--35},
year={2016},
organization={Springer}
}
Have fun.