|
Sr. no | Reference | Approach | Contribution | Dataset | Model accuracy | Limitation |
|
1 | Wang et al. [18] | Convolutional neural network | Spatiotemporal model (3D ConvNet). Novel training strategy (AltFreezing) | FaceForensics++ [19], Celeb-DF (V2) [20] | 99% | Data augmentations/Model evaluation |
2 | Liu et al. [21] | Neural network | Generalized residual federated learning method | FaceForensics++ [19], some YouTube data | 99.7% | Privacy protection/Data privacy |
3 | Tyagi and Yadav [1] | Survey | Deep learning, visual imagery forgery detection | Self-collected data | NA | Generalized methods |
4 | Ganguly et al. [2] | Deep learning model | Soft attention mechanism, visual attention | FaceForensics++ [19], Celeb-DF (V2) [20] | 70.1% | Low accuracy |
5 | Kumar et al. [3] | Convolutional neural network | Extract deep features, distance of correlation coefficient | VIFFD [22], Surrey University Library for Forensic Analysis (SULFA) [23] | 86.5% and 92% at video level; 99.9% at frame level | Limited mention of false positives/information about methodology |
6 | Tan et al. [4] | 2D-convolutional neural network | Bidirectional long short-term memory | SYSU-OBJFORG [24] | 99% | No comparison with emerging deep learning architectures |
7 | Zhou et al. [5] | Watermarking network | Robust watermarking network for video forgery detection (RWVFD), tampering localization, 3D-UNet-based watermark embedding network | DAVIS [25], YouTube-VOS | NA | Limited scope of video types |
8 | Kim et al. [6] | Convolutional neural network | Symmetrically overlapped motion residual | SULFA [23], REWIND [31], SYSU-OBJFORG [24] | 98% | Diverse tampering should be considered |
9 | Wang et al. [7] | Discrete cosine transform-based forgery clue augmentation network (FCAN-DCT) | Compact feature extraction (CFE), frequency temporal attention (FTA) | Wild-Deepfake [26], Celeb-DF [20] | 86%, 99% | Lack of current real-world exploration |
10 | Munawar and Noreen [8] | Siamese-based RNN, I3D (inflated 3D ConvNet) | Siamese-based RNN integrated with I3D to find the duplicate frame rate | Media Forensics Challenge (MFC) [27], Video and Image Retrieval and Analysis Tool (VIRAT) [28] | 93.3%, 86.6% | Transfer learning not explored |
11 | Alsakar et al. [9] | SVD (singular value decomposition), inter-frame forgery | First phase is 3D-tensor decomposition, second phase is forgery detection, third phase is forgery locating | Randomly selected eight videos | 99% | Enhance detection and location for variety |
12 | Jin et al. [10] | ResNet50 model, LSTM-EnDec, DMAC, noiseprint | Object-based video forgery detection, multi-feature fusion, dual stream | GRIP [29], VTD (video tampering dataset) [30], SULFA [23], REWIND [31] | NA | Limited evaluation of real-world scenarios |
13 | Fadl et al. [32] | 2D-CNN, SSIM, Gaussian RBF multiclass support vector machine (RBF-MSVM) | Passive forensics, CNN (convolutional neural network), SSIM, spatiotemporal features, inter-frame forgeries | SULFA [23] | 99.9% | Detecting multiple forgeries in videos |
14 | Zheng et al. [33] | Spatiotemporal convolutional network | 2D R50 network structure, 3D R50 network structure, 3D R50-FTCN (fully temporal convolutional network) | Deepfake [34], FaceSwap, Face2Face | 99% | Limited real-world application evaluation |
15 | Huang et al. [35] | Cross-model authentication | Localization on live surveillance videos | Run time evaluation | 95.1% | Hardware and environment scalability |
16 | Verde et al. [36] | Convolutional neural network (CNN) | Focal: Forgery localization framework based on video coding self-consistency | 60 encoded videos | 88.9% | Assess scalability, improve model fusion |
17 | Kaur and Jindal [37] | Deep convolutional neural network (DCNN) | ANN (artificial neural network), convolutional layer, ReLU activation layer, max pool layer, correlation classification | REWIND [31], GRIP [29] | 98% | Consideration of hardware constraints |
18 | Zhong et al. [38] | Interframe best match algorithm | A unified moment framework, 9-digit dense, moment feature index, best match algorithm | REWIND [31], SULFA [23] | 75% | Real-world scenario evaluation |
19 | Sasikumar et al. [39] | SIFT, MSCL, clustering | SIFT (scale-invariant feature transform), MSCL (mean shift clustering algorithm), camera motion, feature extraction, classification, segmentation, in-painting | Randomly collected data | NA | Enhance video duplicate detection security |
20 | Aloraini et al. [40] | Sequential and patch analysis | Patch analysis, sequential analysis, object removal video forgery, spatiotemporal analysis | SULFA [23], SYSU-OBJFORG [24] | 72% | Nonadditive models are not explored |
21 | Hau Nguyen et al. [41] | Convolutional neural network (CNN) | Video interframe forgery detection, video authenticity, passive forensic | VFDD [42] | 99% | CNN needs to be simplified for diverse forgery |
22 | Parveen et al. [43] | Clustering algorithm | K-means clustering, radix sort | Randomly collected data | NA | Limited focus on clustering algorithms |
23 | Hosler et al. [44] | Convolutional neural network (CNN) | Benchmark testing, video signal processing | ACID [45] | 95% | Algorithm benchmark evaluations required |
24 | Fayyaz et al. [46] | Sensor pattern noise | Video forensics, digital forgery, sensor pattern noise (SPN), photo response non-uniformity (PRNU) noise | Dresden [47] | Not mentioned | Vulnerability to induced SPN attacks |
25 | Joshi and Jain [48] | Video tampering detection | Temporal fingerprints, optical flow | 200 video clips | 87.5% | Implement machine learning for classification |
26 | Chen et al. [49] | Scale-invariant feature transform | Invariant moment, region growing | Copy-move forgery detection (CoMoFoD) [50] | 84.6% | Reduce keypoints, optimize region growing |
27 | Pavlović et al. [51] | Multifractal spectrum and statistical parameters | New metaheuristic and supervised learning method | CoMoFoD [50] | 96% | Explore metaheuristics and multifractals further |
28 | Liu et al. [52] | Scale-invariant feature transform | K-means clustering | Randomly collected data | 89% | Optimize parameters and explore new technologies |
29 | Yadav and Salmani [17] | Survey | Machine learning, deep learning, generative adversarial network, neural network | Self-collected data | NA | Limited theoretical explanation |
30 | Jia et al. [53] | Optical flow consistency | Coarse-to-fine detection, video passive forensic | Randomly collected data | Not mentioned | Enhance handling of static scenes |
31 | Singh and Singh [54] | Discrete cosine transform (DCT) matrix | Region duplication, correlation coefficient, and coefficient of variation | Randomly collected data | 96.6% | Struggles with subtle intensity changes |
32 | Afchar et al. [55] | Deep learning approach | DeepFake, Face2Face | DeepFake [34] | 98% (DeepFake), 95% (Face2Face) | Limited theoretical explanation of results |
33 | Chen et al. [56] | Region based convolutional neural network | Region proposal network in faster R-CNN network | Cityscapes [57], KITTI [58], SIM10K | NA | Dependence on adversarial training techniques |
34 | Aneja et al. [59] | Convolutional neural network (CNN) | Recurrent neural network (RNN) powered by long short-term memory (LSTM) | MS COCO [60] | NA | Sequential limitations in LSTM models |
35 | Shou et al. [61] | Online detection of action start (ODAS) | Generative adversarial network, evaluation protocol | THUMOS’14 [62], ActivityNet | NA | Limited practical application and evaluation |
36 | Nguyen et al. [63] | Convolutional neural network | Capsule network, face swap detection, facial reenactment detection | REPLAY-ATTACK [64], FaceForensics [19] | 99% | Enhance resistance to adversarial attacks |
37 | Ulutas et al. [65] | Bag-of-words (BoW) | Scale-invariant feature transform (SIFT) | Surrey University Library for Forensic Analysis (SULFA) [23] | 97.5% | Limited focus on real-world scenarios |
38 | Zhao et al. [66] | Passive blind scheme | Hue-saturation-value (HSV), speeded up robust features (SURF), fast library for approximate nearest neighbors (FLANN) | 10 test shots | 99.01% | Limited to interframe forgeries |
39 | Voronin et al. [67] | Convolutional neural network (CNN) | Spatial-temporal procedure based on statistical analysis and CNN | 3000 videos | 96% | Future real-time application and comparisons |
40 | Carreira and Zisserman [68] | Inflated 3D ConvNet (I3D) | Two-stream inflated 3D ConvNet (I3D) based on 2D ConvNet | HMDB-51, UCF-101 | 80.2% (HMDB-51), 97.9% (UCF-101) | Use Kinetics for comprehensive experiments |
41 | D’Amiano et al. [69] | Dense field algorithm | 3D PatchMatch based dense field algorithm | REWIND [31] | NA | Enhance video analysis |
42 | D’Avino et al. [70] | Recurrent neural network | Recursive network, long short-term memory | Randomly collected data | NA | Limited theoretical explanation |
43 | Cozzolino et al. [71] | Convolutional neural network (CNN) | Local descriptors, bag-of-words | Synthetic [72] | 94% | Explore architectural improvements for deep learning |
44 | Bozkurt et al. [73] | Discrete cosine transform (DCT) | Correlation image generation, coarser forgery line detection, finer forgery line localization | Randomly collected data | 98% | Not mentioned |
45 | Do et al. [74] | Deep convolutional neural network (DCNN) | Generative adversarial network (GAN) | Celeb-DF [20] | 80% | Limited discussion of real-world scenarios |
46 | Long et al. [75] | Convolutional neural network | Convolutional 3D neural network (C3D), long short-term memory (LSTM) | 2394 videos, YFCC100M [76] | 98% | Improve frame dropping and LSTM |
47 | Su et al. [77] | Region duplication | Adaptive parameter-based fast compression tracking (AFCT) | Randomly collected data | 93.1% | Detect diverse video forgery types |
48 | Mizher et al. [78] | Spatiotemporal attacks | Falsifying techniques, fingerprint framework, secure system | Self-collected data | Not mentioned | Neglects complex video inpainting methods |
49 | Zhu et al. [79] | Spatiotemporal features | Scale-invariant feature transform (SIFT) | TRECVID [80], CC_WEB_VIDEO [81] | 99% | Limited evaluation of real-world scenarios |
50 | Barhoom et al. [82] | Physical random objects | Digital tampering, digital forensics | Randomly selected data | NA | Limited theoretical explanation |
51 | Abbasi Aghamaleki and Behrad [83] | Passive forensics | Extract appropriate quantization error rich | MPEGx codec [84] | 92.73% | Limited theoretical explanation |
52 | Mathai et al. [85] | Statistical moment features | Normalized cross-correlation | SULFA [23] | 88% | Limited accuracy in forgery detection |
53 | Rao and Ni [86] | Convolutional neural network | Spatial rich model, support vector classification | CASIA v1.0 [87], CASIA v2.0, DVMM [88] | 98%, 97.8%, 96% | Limited theoretical explanation |
54 | Rigoni et al. [89] | Video tampering detection | Quantization index modulation, watermarking | Randomly collected data | 96.5% | Limited theoretical explanation |
|