Dataset | Best Model | Paper | Code | Compare |
---|---|---|---|---|
TV show Retrieval | DL-DKD | TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval | | |
ActivityNet Captions | DL-DKD | Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection | | |
Charades-STA | MS-SL | TALL: Temporal Activity Localization via Language Query | | |

Results on TV show Retrieval:

Model | R@1 | R@5 | R@10 | R@100 | SumR |
---|---|---|---|---|---|
T2VR models: | | | | | |
W2VV, TMM18 [1] | 2.6 | 5.6 | 7.5 | 20.6 | 36.3 |
HGR, CVPR20 [2] | 1.7 | 4.9 | 8.3 | 35.2 | 50.1 |
HTM, ICCV19 [3] | 3.8 | 12.0 | 19.1 | 63.2 | 98.2 |
CE, BMVC19 [4] | 3.7 | 12.8 | 20.1 | 64.5 | 101.1 |
W2VV++, MM19 [5] | 5.0 | 14.7 | 21.7 | 61.8 | 103.2 |
VSE++, BMVC19 [6] | 7.5 | 19.9 | 27.7 | 66.0 | 121.1 |
DE, CVPR19 [7] | 7.6 | 20.1 | 28.1 | 67.6 | 123.4 |
DE++, TPAMI21 [8] | 8.8 | 21.9 | 30.2 | 67.4 | 128.3 |
RIVRL, TCSVT22 [9] | 9.4 | 23.4 | 32.2 | 70.6 | 135.6 |
VCMR models w/o moment localization: | | | | | |
XML, ECCV20 [10] | 10.0 | 26.5 | 37.3 | 81.3 | 155.1 |
ReLoCLNet, SIGIR21 [11] | 10.7 | 28.1 | 38.1 | 80.3 | 157.1 |
MS-SL | 13.5 | 32.1 | 43.4 | 83.4 | 172.3 |
DL-DKD | 14.4 | 34.9 | 45.8 | 84.9 | 179.9 |

Results on ActivityNet Captions:

Model | R@1 | R@5 | R@10 | R@100 | SumR |
---|---|---|---|---|---|
T2VR models: | | | | | |
W2VV [1] | 2.2 | 9.5 | 16.6 | 45.5 | 73.8 |
HTM [3] | 3.7 | 13.7 | 22.3 | 66.2 | 105.9 |
HGR [2] | 4.0 | 15.0 | 24.8 | 63.2 | 107.0 |
RIVRL [9] | 5.2 | 18.0 | 28.2 | 66.4 | 117.8 |
VSE++ [6] | 4.9 | 17.7 | 28.2 | 67.1 | 117.9 |
DE++ [8] | 5.3 | 18.4 | 29.2 | 68.0 | 121.0 |
DE [7] | 5.6 | 18.8 | 29.4 | 67.8 | 121.7 |
W2VV++ [5] | 5.4 | 18.7 | 29.7 | 68.8 | 122.6 |
CE [4] | 5.5 | 19.1 | 29.9 | 71.1 | 125.6 |
VCMR models w/o moment localization: | | | | | |
XML [10] | 5.7 | 18.9 | 30.0 | 72.0 | 126.6 |
ReLoCLNet [11] | 5.3 | 19.4 | 30.6 | 73.1 | 128.4 |
MS-SL | 7.1 | 22.5 | 34.7 | 75.8 | 140.1 |
DL-DKD | 8.0 | 25.0 | 37.5 | 77.1 | 147.6 |

Results on Charades-STA:

Model | R@1 | R@5 | R@10 | R@100 | SumR |
---|---|---|---|---|---|
T2VR models: | | | | | |
W2VV [1] | 0.5 | 2.9 | 4.7 | 24.5 | 32.6 |
VSE++ [6] | 0.8 | 3.9 | 7.2 | 31.7 | 43.6 |
W2VV++ [5] | 0.9 | 3.5 | 6.6 | 34.3 | 45.3 |
HGR [2] | 1.2 | 3.8 | 7.3 | 33.4 | 45.7 |
CE [4] | 1.3 | 4.5 | 7.3 | 36.0 | 49.1 |
DE [7] | 1.5 | 5.7 | 9.5 | 36.9 | 53.7 |
DE++ [8] | 1.7 | 5.6 | 9.6 | 37.1 | 54.1 |
RIVRL [9] | 1.6 | 5.6 | 9.4 | 37.7 | 54.3 |
HTM [3] | 1.2 | 5.4 | 9.2 | 44.2 | 60.0 |
VCMR models w/o moment localization: | | | | | |
XML [10] | 1.2 | 5.4 | 10.0 | 45.6 | 62.3 |
ReLoCLNet [11] | 1.6 | 6.0 | 10.1 | 46.9 | 64.6 |
MS-SL | 1.8 | 7.1 | 11.8 | 47.7 | 68.4 |
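
The R@K and SumR columns in the results tables can be reproduced from per-query ranks. The sketch below assumes the usual reading of these metrics: R@K is the percentage of text queries whose ground-truth video is ranked in the top K, and SumR is the sum of R@1, R@5, R@10 and R@100. The function names and the example ranks are hypothetical and are not taken from any of the listed papers or their code.

```python
from typing import Sequence


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Percentage of queries whose ground-truth video is ranked within the top k.

    `ranks` holds the 1-based rank of the correct video for each query
    (an assumed input format, chosen for illustration).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


def sum_r(ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10, 100)) -> float:
    """Sum of R@K over the cut-offs used in the tables above."""
    return sum(recall_at_k(ranks, k) for k in ks)


# Hypothetical ranks for five queries (illustrative only, not real results).
example_ranks = [1, 3, 7, 42, 250]
print(recall_at_k(example_ranks, 5))  # 2 of 5 queries are in the top 5
print(sum_r(example_ranks))           # R@1 + R@5 + R@10 + R@100
```

Summing the rounded R@K values of a table row can differ from the listed SumR by 0.1 (for example, the HTM row on TV show Retrieval adds up to 98.1 against a listed 98.2), which is consistent with SumR having been computed before rounding.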

Model | Paper | Code | Year |
---|---|---|---|
DL-DKD | Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval | | 2023 |
MS-SL | Partially Relevant Video Retrieval | | 2022 |