Applied Computational Intelligence and Soft Computing

Review Article

Forged Video Detection Using Deep Learning: A SLR

Table 6

Comparison of state-of-the-art approaches.


Sr. no	Reference	Approach	Contribution	Dataset	Model accuracy	Limitation

1	Wang et al. [18]	Convolutional neural network	Spatiotemporal model (3D ConvNet). Novel training strategy (AltFreezing)	FaceForensics++ [19], Celeb-DF (V2) [20]	99%	Data augmentations/Model evaluation
2	Liu et al. [21]	Neural network	Generalized residual federated learning method	FaceForensics++ [19], some YouTube data	99.7%	Privacy protection/Data privacy
3	Tyagi and Yadav [1]	Survey	Deep learning, visual imagery forgery detection	Self-collected data	NA	Generalized methods
4	Ganguly et al. [2]	Deep learning model	Soft attention mechanism, visual attention	FaceForensics++ [19], Celeb-DF (V2) [20]	70.1%	Low accuracy
5	Kumar et al. [3]	Convolutional neural network	Extract deep features, distance of correlation coefficient	VIFFD [22], Surrey University Library for Forensic Analysis (SULFA) [23]	86.5% and 92% at video level; 99.9% at frame level	Limited mention of false positives/information about methodology
6	Tan et al. [4]	2D-convolutional neural network	Bidirectional long short-term memory	SYSU-OBJFORG [24]	99%	No comparison with emerging deep learning architectures
7	Zhou et al. [5]	Watermarking network	Robust watermarking network for video forgery detection (RWVFD), tampering localization, 3D-UNet-based watermark embedding network	DAVIS [25], YouTube-VOS	NA	Limited scope of video types
8	Kim et al. [6]	Convolutional neural network	Symmetrically overlapped motion residual	SULFA [23], REWIND [31], SYSU-OBJFORG [24]	98%	Diverse tampering should be considered
9	Wang et al. [7]	Discrete cosine transform-based forgery clue augmentation network (FCAN-DCT)	Compact features extraction (CFE), frequency temporal attention (FTA)	Wild-Deepfake [26], Celeb-DF [20]	86%, 99%	Lack of current real-world exploration
10	Munawar and Noreen [8]	Siamese-based RNN, I3D (inflated 3D)	Siamese-based RNN integrated with I3D to find the duplicate frame rate	Media Forensics Challenge (MFC) [27], Video and Image Retrieval and Analysis Tool (VIRAT) [28]	93.3%, 86.6%	Transfer learning not explored
11	Alsakar et al. [9]	SVD (singular value decomposition), inter-frame forgery	First phase is 3D-tensor decomposition, second phase is forgery detection, third phase is forgery locating	Randomly selected eight videos	99%	Enhance detection and location for variety
12	Jin et al. [10]	ResNet50 model, LSTM-EnDec, DMAC, Noiseprint	Object-based video forgery detection, multi-feature fusion, dual stream	GRIP [29], VTD (video tampering dataset) [30], SULFA [23], REWIND [31]	NA	Limited evaluation of real-world scenarios
13	Fadl et al. [32]	2D-CNN, SSIM, Gaussian RBF multiclass support vector machine (RBF-MSVM)	Passive forensics, CNN (convolutional neural network), SSIM, spatiotemporal features, inter-frame forgeries	SULFA [23]	99.9%	Detecting multiple forgeries in videos
14	Zheng et al. [33]	Spatiotemporal convolutional network	2D R50 network structure, 3D R50 network structure, 3D R50-FTCN (fully temporal convolutional network)	Deepfake [34], FaceSwap, Face2Face	99%	Limited real-world application evaluation
15	Huang et al. [35]	Cross-model authentication	Localization on live surveillance videos	Run time evaluation	95.1%	Hardware and environment scalability
16	Verde et al. [36]	Convolutional neural network (CNN)	Focal: Forgery localization framework based on video coding self-consistency	60 encoded videos	88.9%	Assess scalability, improve model fusion
17	Kaur and Jindal [37]	Deep convolutional neural network (DCNN)	ANN (artificial neural network), convolutional layer, ReLU activation layer, max pool layer, correlation classification	REWIND [31], GRIP [29]	98%	Consideration of hardware constraints
18	Zhong et al. [38]	Interframe best match algorithm	A unified moment framework, 9-digit dense, moment feature index, best match algorithm	REWIND [31], SULFA [23]	75%	Real-world scenario evaluation
19	Sasikumar et al. [39]	SIFT, MSCL, clustering	SIFT (scale-invariant feature transform), MSCL (mean shift clustering algorithm), camera motion, feature extraction, classification, segmentation, in-painting	Randomly collected data	NA	Enhance video duplicate detection security
20	Aloraini et al. [40]	Sequential and patch analysis	Patch analysis, sequential analysis, object removal video forgery, spatiotemporal analysis	SULFA [23], SYSU-OBJFORG [24]	72%	Nonadditive models are not explored
21	Hau Nguyen et al. [41]	Convolutional neural network (CNN)	Video interframe forgery detection, video authenticity, passive forensic	VFDD [42]	99%	CNN needs to be simplified for diverse forgery
22	Parveen et al. [43]	Clustering algorithm	K-means clustering, radix sort	Randomly collected data	NA	Limited focus on clustering algorithms
23	Hosler et al. [44]	Convolutional neural network (CNN)	Benchmark testing, video signal processing	ACID [45]	95%	Algorithm benchmark evaluations required
24	Fayyaz et al. [46]	Sensor pattern noise	Video forensics, digital forgery, sensor pattern noise, photo response nonuniformity noise (PRNU)	Dresden [47]	Not mentioned	Vulnerability to induced SPN attacks
25	Joshi and Jain [48]	Video tampering detection	Temporal fingerprints, optical flow	200 video clips	87.5%	Implement machine learning for classification
26	Chen et al. [49]	Scale-invariant feature transform	Invariant moment, region growing	Copy-move forgery detection (CoMoFoD) [50]	84.6%	Reduce keypoints, optimize region growing
27	Pavlović et al. [51]	Multifractal spectrum and statistic parameters	New metaheuristic and supervised learning method	CoMoFoD [50]	96%	Explore metaheuristics and multifractals further
28	Liu et al. [52]	Scale-invariant feature transform	K-means clustering	Randomly collected data	89%	Optimize parameters and explore new technologies
29	Yadav and Salmani [17]	Survey	Machine learning, deep learning, generative adversarial network, neural network	Self-collected data	NA	Limited theoretical explanation
30	Jia et al. [53]	Optical flow consistency	Coarse-to-fine detection, video passive forensic	Randomly collected data	Not mentioned	Enhance handling of static scenes
31	Singh and Singh [54]	Discrete cosine transform (DCT) matrix	Region duplication, correlation coefficient, and coefficient of variation	Randomly collected data	96.6%	Struggles with subtle intensity changes
32	Afchar et al. [55]	Deep learning approach	DeepFake, Face2Face	DeepFake [34]	98% (DeepFake), 95% (Face2Face)	Limited theoretical explanation of results
33	Chen et al. [56]	Region-based convolutional neural network	Region proposal network in Faster R-CNN	Cityscapes [57], KITTI [58], SIM10K	NA	Dependence on adversarial training techniques
34	Aneja et al. [59]	Convolutional neural network (CNN)	Recurrent neural network (RNN) powered by long short-term memory (LSTM)	MS COCO [60]	NA	Sequential limitations in LSTM models
35	Shou et al. [61]	Online detection of action start (ODAS)	Generative adversarial network, evaluation protocol	THUMOS’14 [62], ActivityNet	NA	Limited practical application and evaluation
36	Nguyen et al. [63]	Convolutional neural network	Capsule network, face swap detection, facial reenactment detection	REPLAY-ATTACK [64], FaceForensics [19]	99%	Enhance resistance to adversarial attacks
37	Ulutas et al. [65]	Bag-of-words (BoW)	Scale-invariant feature transform (SIFT)	Surrey University Library for Forensic Analysis (SULFA) [23]	97.5%	Limited focus on real-world scenarios
38	Zhao et al. [66]	Passive blind scheme	Hue-saturation-value (HSV), speeded up robust features (SURF), fast library for approximate nearest neighbors (FLANN)	10 test shots	99.01%	Limited to interframe forgeries
39	Voronin et al. [67]	Convolutional neural network (CNN)	Spatial-temporal procedure based on statistical analysis and CNN	3000 videos	96%	Future real-time application and comparisons
40	Carreira and Zisserman [68]	Inflated 3D ConvNet (I3D)	Two-stream inflated 3D ConvNet (I3D) based on 2D ConvNet	HMDB-51, UCF-101	80.2% (HMDB-51), 97.9% (UCF-101)	Use Kinetics for comprehensive experiments
41	D’Amiano et al. [69]	Dense field algorithm	3D PatchMatch based dense field algorithm	REWIND [31]	NA	Enhance video analysis
42	D’Avino et al. [70]	Recurrent neural network	Recursive network, long short-term memory	Randomly collected data	NA	Limited theoretical explanation
43	Cozzolino et al. [71]	Convolutional neural network (CNN)	Local descriptors, bag-of-words	Synthetic [72]	94%	Explore architectural improvements for deep learning
44	Bozkurt et al. [73]	Discrete cosine transform (DCT)	Correlation image generation, coarser forgery line detection, finer forgery line localization	Randomly collected data	98%	Not mentioned
45	Do et al. [74]	Deep convolutional neural network (DCNN)	Generative adversarial network (GAN)	Celeb-DF [20]	80%	Limited discussion of real-world scenarios
46	Long et al. [75]	Convolutional neural network	Convolutional 3D neural network (C3D), long short-term memory (LSTM)	2394 videos, YFCC100M [76]	98%	Improve frame dropping and LSTM
47	Su et al. [77]	Region duplication	Adaptive parameter-based fast compression tracking (AFCT)	Randomly collected data	93.1%	Detect diverse video forgery types
48	Mizher et al. [78]	Spatiotemporal attacks	Falsifying techniques, fingerprint framework, secure system	Self-collected data	Not mentioned	Neglects complex video inpainting methods
49	Zhu et al. [79]	Spatiotemporal features	Scale-invariant feature transform (SIFT)	TRECVID [80], CC_WEB_VIDEO [81]	99%	Limited evaluation of real-world scenarios
50	Barhoom et al. [82]	Physical random objects	Digital tampering, digital forensics	Randomly selected data	NA	Limited theoretical explanation
51	Abbasi Aghamaleki and Behrad [83]	Passive forensics	Extract appropriate quantization error rich	MPEGx codec [84]	92.73%	Limited theoretical explanation
52	Mathai et al. [85]	Statistical moment features	Normalized cross-correlation	SULFA [23]	88%	Limited accuracy in forgery detection
53	Rao and Ni [86]	Convolutional neural network	Spatial rich model, support vector classification	CASIA v1.0 [87], CASIA v2.0, DVMM [88]	98%, 97.8%, 96%	Limited theoretical explanation
54	Rigoni et al. [89]	Video tampering detection	Quantization index modulation, watermarking	Randomly collected data	96.5%	Limited theoretical explanation