Abstract

An intrusion detection system (IDS) is extremely important for preventing network attacks, violations of network policy, and unauthorized access to a network. The effectiveness of an IDS depends heavily on the data preprocessing techniques and classification models used to improve accuracy and reduce model training and testing time. Researchers have developed several machine learning and deep learning-based algorithms for anomaly identification; nonetheless, accurate anomaly detection with low training and testing times remains a challenge. In this research, the authors propose an enhanced IDS that uses a hybrid feature selection approach and a deep neural network- (DNN-) based classifier. To construct a reduced, optimal subset of features for classification, a hybrid feature selection model consisting of three methods, namely, chi square, ANOVA, and principal component analysis (PCA), is applied. These methods are referred to as “the big three.” The proposed model is trained and evaluated on the NSL-KDD dataset. It achieved a 40% reduction of the input data, an average accuracy of 99.73%, a precision of 99.75%, an F1 score of 99.72%, and average training and testing times of 138 and 2.7 seconds, respectively. The experimental findings demonstrate that the proposed model outperforms the other compared approaches.

1. Introduction

The volume of network traffic has increased noticeably. At the same time, the number of potential intrusion threats has grown, and so has their sophistication. Network-reliant communication is now susceptible to attacks from both outside and inside the network. The large volume of traffic and the high number of attacks make incoming traffic difficult to inspect and increase the time and money spent on computation. This motivates researchers to design intelligent detection systems that require less computational time than traditional methods while still achieving high accuracy.

IDSs are widely used to classify network traffic and identify anomalies inside a network. An IDS is software that analyses real-time network traffic and reports any abnormal activity on the network. IDSs can be divided into two types: signature-based and anomaly-based detection systems. A signature-based IDS matches incoming traffic against predefined patterns (signatures) of known attacks. If the traffic matches a known attack signature, it is flagged as an attack; otherwise, it is treated as normal. The signature-based method cannot detect new and unknown attacks, whereas the anomaly-based technique can identify previously unseen attacks on a network. For this reason, researchers prefer anomaly-based intrusion detection systems to handle unknown attacks and unauthorized access. However, anomaly-based systems suffer from low accuracy and a high false-alarm rate when dealing with high-dimensional data [1]. Researchers have also proposed hybrid approaches that combine signature- and anomaly-based detection to handle both seen and unseen data. However, hybrid approaches have a very high computational cost and perform poorly in terms of accuracy [2–4].

Multiple machine learning and deep learning methods [5–15] for detecting network intrusions have been proposed in recent years; however, data dimensionality remains one of the biggest problems for intrusion detection systems. High-dimensional data degrades the performance and accuracy of an IDS. One solution is to reduce the number of input features and use only those features that are reliable and strongly related to the output class. The purpose of feature selection is to select an optimal feature subset that is smaller than the original feature set and yields an efficient system with better accuracy. In network classification, data can contain irrelevant features that increase computational time and reduce accuracy. Feature selection techniques help remove such irrelevant data. Feature selection is considered a vital preprocessing step, because system performance suffers if irrelevant features are not removed from the original dataset [16, 17].

Feature selection algorithms are categorized into filter-based and wrapper-based techniques. Wrapper methods provide the most relevant feature subset, but they require more computation time, which degrades system performance. In contrast, filter methods are computationally efficient, process data quickly, and are less prone to overfitting [18]. With the rapid increase in network traffic, intrusion detection systems face data dimensionality and system complexity issues, and feature selection is becoming an important preprocessing phase for network classification problems. Feature selection reduces the dataset by removing irrelevant and redundant features that have no impact on classification results. It selects a subset of the original features, according to some criterion, that preserves the properties of the original dataset. According to Kantardzic [19], reducing the features of large datasets using basic techniques improves classification. The resulting benefits are discussed as follows:
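To make the cost difference concrete, the following is a minimal, illustrative sketch (not the method used in this paper). It contrasts a filter method, which scores each feature once with a classifier-independent function, with a wrapper method, here sequential forward selection, which evaluates a model on every candidate subset. The `score` and `evaluate` callbacks are hypothetical placeholders: `score` stands for a statistical test, and `evaluate` stands for training and validating a classifier.

```python
from typing import Callable, List, Sequence


def filter_select(n_features: int, score: Callable[[int], float], k: int) -> List[int]:
    """Filter method: score each feature once, independently of any classifier,
    and keep the k highest-scoring feature indices."""
    ranked = sorted(range(n_features), key=score, reverse=True)
    return sorted(ranked[:k])


def wrapper_forward_select(
    n_features: int, evaluate: Callable[[Sequence[int]], float], k: int
) -> List[int]:
    """Wrapper method (sequential forward selection): greedily add the feature
    whose inclusion gives the best evaluate() result. Each evaluate() call stands
    for training and validating a model on that candidate subset."""
    selected: List[int] = []
    while len(selected) < k:
        remaining = [f for f in range(n_features) if f not in selected]
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
    return sorted(selected)


if __name__ == "__main__":
    weights = [0.1, 0.9, 0.4, 0.7]  # toy per-feature relevance values
    calls = []

    def evaluate(subset):
        calls.append(tuple(subset))  # record every "model" that would be trained
        return sum(weights[f] for f in subset)

    print(filter_select(4, lambda f: weights[f], 2))
    print(wrapper_forward_select(4, evaluate, 2), len(calls))
```

Both calls pick features 1 and 3 in this toy case. The filter needs only 4 score computations, while the wrapper trains 4 + 3 = 7 candidate models. With d features and k to keep, the wrapper's cost grows roughly with d·k model fits.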

1.1. Less Computational Power

When a large dataset is reduced using feature selection techniques, the computational power required also decreases, because less time is needed to train and test the model on the reduced dataset.

1.2. Improved Detection Accuracy

During feature selection, only features with very low or no impact on classification are removed, so removing these noisy features helps improve model accuracy. The two main feature selection techniques are the wrapper method and the filter method. Each has different advantages and disadvantages, as described in Table 1.

In this research paper, we propose a two-stage hybrid model. In the first stage, filter-based feature selection techniques are applied to reduce the dimensionality of the input data. In the second stage, the optimal feature subset is passed to a deep learning model (DNN) [20–22] for classification, which achieves higher accuracy with less processing time.

In this section, we review some of the most recent work in the field of anomaly and intrusion detection. An IDS is an essential component of a secure network because it monitors the traffic between all of the devices connected to the network. The academic literature has studied the identification of anomalous behavior patterns extensively, employing numerous machine learning, deep learning, and hybrid approaches [23]. An IDS is a type of security management system that monitors the traffic coming into and going out of a computer system to identify any harmful behavior taking place over a network. These systems examine the information coming from all sources before passing it on to the network for further processing, and they use multiple features to detect intrusions. Intrusion detection and prevention systems are divided into four major categories: network based, wireless based, network behavior analysis based, and host based [24].

Deep belief networks (DBNs) are commonly used in IDSs. A DBN can learn high-dimensional representations of data and perform classification effectively and precisely. Only a small amount of labelled data is required to fine-tune a DBN model for improved classification [25]. Evaluated on the KDD 99 dataset, the DBN outperformed both SVM and ANN classification models. Potluri and Diedrich proposed a DNN-based intrusion detection system that classifies attacks. According to their findings, the model is more successful at identifying the DoS and probe classes and less successful at identifying the R2L and U2R classes. Because little training data was available for R2L and U2R, detection accuracy was inconsistent for those classes but reliable for DoS and probe [26, 27]. Kim et al. [27] proposed an intrusion detection system based on deep neural networks (DNN) that uses the ReLU activation function in its hidden layers.

Zeng et al. [28] proposed a lightweight deep learning model for encrypted traffic classification and intrusion detection. Deep learning enabled the model to handle unseen traffic, and the results show that it is reliable and accurate while using minimal resources. Similarly, [29] proposes an ML- and DL-based technique. As technology evolves, the threats facing networks keep changing, so not every public dataset contains data on all types of threats and attacks. Because of the dynamic nature of attacks, models underperform on unseen and unpredicted data. To address this, [29] proposes an approach that classifies unseen and unpredictable attacks. The model is trained on recent datasets containing almost all types of cyberattacks, and the resulting hybrid DNN framework is highly scalable.

Intrusion detection systems need high accuracy and short detection times to counter modern cyberattacks. Vinayakumar et al. [30] presented a scale-hybrid intrusion detection and alert system. The framework enables real-time monitoring of network traffic and notifies system administrators of potentially harmful activity on the network. The authors state that the system provides an effective, heterogeneous DNN architecture that can manage and analyse huge volumes of data in real time. Several datasets, including NSL-KDD and KDD’99, were used to evaluate the architecture. On NSL-KDD, the best F-measure was 80.7% for binary classification and 76.5% for multiclass classification [30, 31].

Tang et al. [32] proposed a DNN-based model for anomaly detection in software-defined networks (SDN). The model has one input layer, three hidden layers, and one output layer. Experiments were conducted on the NSL-KDD dataset. Only six of the 41 features are used, and this six-feature subset comes from the SDN environment. On a binary classification task, the model achieved an accuracy of 75.75%. Su et al. [33] proposed the BAT model for intrusion detection, which combines two components: a bidirectional long short-term memory (BLSTM) network and an attention mechanism. The model shows better accuracy on the NSL-KDD test dataset and requires 100 epochs of training. On the multiclass problem, the BAT model achieves 3% and 4% higher accuracy than CNN and RNN, respectively. Yu and Bian [34] presented a few-shot learning-based method to strengthen network security and enable efficient intrusion detection. Evaluated on the NSL-KDD and UNSW-NB15 datasets, the model achieves a highest accuracy of 92.34% in detecting abnormal network behaviors, and it achieves leading performance even when trained on only 2% of the data.

Ahmadi et al. [35] proposed a hybrid approach to improve the efficiency of IDS. The feature subset produced by this approach is passed to a decision tree classifier. The proposed feature selection model returns 20 useful features out of the 41 features of the NSL-KDD dataset, and the highest accuracy achieved by the classifier is 80.6%. Ahmadi et al. [36] also proposed a novel feature selection and classification model. Its feature selection model combines chi square, information gain, and correlation-based techniques through majority voting. The optimal features returned by majority voting are passed to a decision tree for classification. The model achieved around 80% accuracy using a total of 20 features from the NSL-KDD dataset. Similarly, a GAN-based feature selection and oversampling scheme is proposed in [37]. High dimensionality and class imbalance, addressed by dimensionality reduction and oversampling, are core issues in classification problems, especially in intrusion detection systems. The results show that this scheme returns better features and enhances the classification model’s performance. Feature selection is considered one of the main parts of an IDS: because these systems must deal with large amounts of data, a strong feature reduction technique is always encouraged for network classification problems. Researchers [34, 36–52] have used different feature selection techniques, of which gain ratio, Pearson correlation, and ANOVA are a few of the most widely used. Feature selection reduces the input data size by removing redundant and irrelevant features and features with no impact on classification [53–62].

3. Methodology

This section presents the overall methodology of the article, which is divided into two phases. The first phase, data preprocessing, includes data normalization, data encoding, and feature selection. The second phase, the deep neural network model, receives the preprocessed data and classifies the traffic as either normal or abnormal. The block diagram of the proposed model is shown in Figure 1.

3.1. Data Preprocessing

Data preprocessing adjusts the values in a dataset so that information can be collected and processed more effectively. Because there is usually a large difference between a dataset’s maximum and minimum values, normalizing the data reduces the algorithm’s complexity. According to Chiba et al., proper data preprocessing can improve classification results, especially in deep learning [63].

3.1.1. Data Normalization

The NSL-KDD dataset, like KDD99 [64], contains both discrete and continuous features. Their values span very different ranges, which makes the features hard to compare directly. A preprocessing phase is therefore required to normalize the data and scale all feature values into the same range. Features are normalized using the mean and standard deviation. Equation (1) gives the mean used for feature scaling:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (1)$$

Here, $\mu$ is the arithmetic mean, $N$ is the total number of rows in the column being averaged, and $x_i$ is the $i$th value being averaged. The standard deviation is used to handle data dispersion, because the dataset contains multiple features with widely spread values. The standard deviation used in this paper is given in equation (2):

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2} \qquad (2)$$

where $x_i$ is the $i$th point in the dataset, $\mu$ is the mean value of the dataset, and $N$ is the total number of data points in the dataset.
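The following is a minimal sketch of the mean and standard deviation described in equations (1) and (2). It assumes the population form of the standard deviation and applies the usual z-score, (x − mean) / standard deviation, to one numeric column. Algorithm 1 also lists min-max normalization, so a min-max helper is included for comparison. The function names are illustrative. In practice, the statistics would be computed on the training split only and then reused to scale the test split.

```python
import math
from typing import List, Sequence


def mean(column: Sequence[float]) -> float:
    """Arithmetic mean of a column (equation (1))."""
    return sum(column) / len(column)


def std(column: Sequence[float]) -> float:
    """Population standard deviation of a column (equation (2))."""
    mu = mean(column)
    return math.sqrt(sum((x - mu) ** 2 for x in column) / len(column))


def z_score(column: Sequence[float]) -> List[float]:
    """Standardize a column to zero mean and unit standard deviation."""
    mu, sigma = mean(column), std(column)
    if sigma == 0:  # constant column: nothing to scale
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def min_max(column: Sequence[float]) -> List[float]:
    """Rescale a column linearly into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


if __name__ == "__main__":
    col = [2, 4, 4, 4, 5, 5, 7, 9]
    print(mean(col), std(col))  # 5.0 2.0
    print(z_score(col))         # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
    print(min_max(col))
```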

3.2. Feature Selection Techniques

The feature selection model combines three filter-based feature selection techniques: chi square, ANOVA, and principal component analysis (PCA). Each technique ranks the features. The most important features, those with a strong influence on the output class, are prioritized and used to classify the network traffic as normal or anomalous. The results of the three techniques are combined into a single subset using a threshold greater than one: a feature that appears in at least two of the three subsets is included in the final subset. Combining the results of multiple feature selection techniques helps find the most relevant and strongest features and improves classification accuracy [38]. Figure 2 shows the complete feature selection model proposed and used in this paper.
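The vote-based combination described above can be sketched as follows. This is an illustrative sketch, not the authors' code. Each selector's output is treated as a set of feature names, and a feature is kept when at least `min_votes` of the sets contain it (here, 2 of 3). The feature names in the example are NSL-KDD column names chosen only for illustration; they are not the subsets reported in Table 2.

```python
from collections import Counter
from typing import Iterable, List


def combine_by_vote(subsets: Iterable[Iterable[str]], min_votes: int = 2) -> List[str]:
    """Keep every feature selected by at least `min_votes` of the subsets."""
    # set(subset) ensures each selector casts at most one vote per feature
    votes = Counter(f for subset in subsets for f in set(subset))
    return sorted(f for f, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    f_c = {"src_bytes", "dst_bytes", "flag", "service"}    # hypothetical chi-square output
    f_a = {"src_bytes", "flag", "count", "same_srv_rate"}  # hypothetical ANOVA output
    f_p = {"dst_bytes", "count", "logged_in"}              # hypothetical PCA output
    print(combine_by_vote([f_c, f_a, f_p]))
    # ['count', 'dst_bytes', 'flag', 'src_bytes']
```

Raising `min_votes` to 3 would keep only features chosen by all three selectors, which yields a smaller but stricter subset.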

3.2.1. Chi Square

Chi square is a statistical approach widely used for feature selection. It measures the importance of each feature with respect to the outcome class by testing how dependent the feature is on that class. The higher a feature’s chi square value, the more dependent it is on the outcome class and the more suitable it is for classification. The chi square statistic is given in equation (3):

$$\chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{k} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}} \qquad (3)$$

where $m$ is the number of distinct values of the attribute, $k$ is the total number of classes, and $O_{ij}$ and $E_{ij}$ are the observed (actual) and expected frequencies, respectively. The higher the value of $\chi^2$, the more important the feature is for the prediction model.
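The following is a minimal sketch of the chi square statistic for a single categorical feature. It builds a contingency table of feature values against class labels and computes each cell's expected frequency as (row total × column total) / number of samples, which is the frequency expected if the feature and the class were independent. The function names and toy data are illustrative. Continuous NSL-KDD features would first need to be discretized for this form. scikit-learn's `sklearn.feature_selection.chi2` uses a different formulation and requires non-negative feature values.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def chi_square_score(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Chi square statistic of one categorical feature against the class label."""
    n = len(labels)
    observed = Counter(zip(values, labels))  # O_ij: count of (value i, class j)
    value_totals = Counter(values)           # row totals
    class_totals = Counter(labels)           # column totals
    score = 0.0
    for v, v_total in value_totals.items():
        for c, c_total in class_totals.items():
            expected = v_total * c_total / n  # E_ij under independence
            score += (observed[(v, c)] - expected) ** 2 / expected
    return score


def rank_by_chi_square(columns: Dict[str, Sequence[Hashable]],
                       labels: Sequence[Hashable]) -> List[str]:
    """Feature names sorted from most to least class-dependent."""
    return sorted(columns, key=lambda name: chi_square_score(columns[name], labels),
                  reverse=True)


if __name__ == "__main__":
    labels = ["normal", "normal", "anomaly", "anomaly"]
    columns = {
        "protocol_type": ["tcp", "tcp", "udp", "udp"],  # perfectly tracks the label
        "flag": ["SF", "S0", "SF", "S0"],                # independent of the label
    }
    for name, col in columns.items():
        print(name, chi_square_score(col, labels))       # 4.0 and 0.0
    print(rank_by_chi_square(columns, labels))           # ['protocol_type', 'flag']
```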

3.2.2. ANOVA (Analysis of Variance)

ANOVA is a univariate feature selection technique that ranks features by their F-score, the ratio of the variance between class means to the variance within classes. A feature’s score reflects its impact on the response class. High between-class variance relative to within-class variance indicates that the feature separates the classes well, whereas a low score indicates poor separation.
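The one-way ANOVA F-score of a single numeric feature can be sketched as below. It is the mean square between classes divided by the mean square within classes. This is an illustrative implementation with hypothetical toy values. scikit-learn's `sklearn.feature_selection.f_classif` computes the same one-way ANOVA F statistic for every column at once.

```python
from typing import Dict, Hashable, List, Sequence


def anova_f_score(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """One-way ANOVA F statistic of one numeric feature across the classes."""
    groups: Dict[Hashable, List[float]] = {}
    for x, c in zip(values, labels):
        groups.setdefault(c, []).append(x)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    group_means = {c: sum(g) / len(g) for c, g in groups.items()}
    # Variation of the class means around the grand mean (between classes)
    ss_between = sum(len(g) * (group_means[c] - grand_mean) ** 2 for c, g in groups.items())
    # Variation of samples around their own class mean (within classes)
    ss_within = sum((x - group_means[c]) ** 2 for c, g in groups.items() for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        return float("inf")  # classes perfectly separated with no spread
    return ms_between / ms_within


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    print(anova_f_score([1, 2, 3, 7, 8, 9], labels))  # well separated: 54.0
    print(anova_f_score([1, 8, 3, 7, 2, 9], labels))  # overlapping classes: low F
```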

3.2.3. Principal Component Analysis (PCA)

PCA is a well-known method for data reduction. It uses linear algebra to reduce the dimensionality of the data while preserving its essential structure and useful characteristics. Little information is lost when PCA is applied for feature reduction, and it is also less sensitive to noisy data.
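PCA produces linear combinations of features rather than individual features, and the paper does not state how its output was turned into a subset of the original features. The sketch below is therefore an assumption. It finds the first principal component of the covariance matrix by power iteration and ranks the original features by the absolute value of their loading on that component. It uses only the standard library for clarity. A practical implementation would use a linear algebra library and more than one component.

```python
import math
from typing import List, Sequence, Tuple


def covariance_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix between the columns (features) of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def first_principal_component(rows: Sequence[Sequence[float]],
                              iters: int = 200) -> Tuple[List[float], float]:
    """Leading unit eigenvector and eigenvalue of the covariance matrix,
    found by power iteration."""
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] * d  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-constant data: no variance to explain
            return [0.0] * d, 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue (variance along v)
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, eigenvalue


def rank_by_loading(rows: Sequence[Sequence[float]], names: Sequence[str]) -> List[str]:
    """Hypothetical ranking rule: order features by |loading| on the first component."""
    v, _ = first_principal_component(rows)
    return [name for _, name in sorted(zip((abs(x) for x in v), names), reverse=True)]


if __name__ == "__main__":
    rows = [[1.0, 2.0, 0.5], [2.0, 4.0, 0.4], [3.0, 6.0, 0.6], [4.0, 8.0, 0.5]]
    v, lam = first_principal_component(rows)
    print([round(x, 3) for x in v], round(lam, 3))
    print(rank_by_loading(rows, ["src_bytes", "dst_bytes", "noise"]))
```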

3.3. Deep Neural Network Model for Classification

After data cleaning and reduction, a deep neural network is used for classification. DNNs are widely used in intrusion and anomaly detection applications. A DNN consists of input, hidden, and output layers, and during training it optimizes its parameters to minimize classification errors. Complex hidden layer structures make DNN models more accurate and flexible in handling large datasets, with each layer learning features at a distinct level of complexity. The proposed DNN model contains three hidden layers that use the rectified linear unit (ReLU) activation function, a sigmoid activation function in the output layer, and Adam as the optimizer. The general DNN model is shown in Figure 3.
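To illustrate the layer structure described above, the sketch below implements a forward pass with three ReLU hidden layers and a sigmoid output, plus a single Adam update step for one scalar parameter. It is written in plain Python for transparency and is not the authors' implementation. The following are assumptions:

- the layer sizes: 27 inputs (matching the 27 selected features), then 64, 32, and 16 hidden units;
- the random initialization;
- the 0.5 decision threshold.

The Adam hyperparameters are the commonly used defaults. A real model would be built and trained with a deep learning framework.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    """Fully connected layer: W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dnn_forward(x: Sequence[float], hidden: Sequence[Layer], output: Layer) -> float:
    """ReLU hidden layers followed by one sigmoid unit: estimated P(anomaly | x)."""
    h = list(x)
    for layer in hidden:
        h = relu(dense(h, layer))
    return sigmoid(dense(h, output)[0])


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    scale = math.sqrt(2.0 / n_in)  # He-style scaling, a common choice for ReLU
    return ([[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)


def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for scalar parameter p with gradient g at step t >= 1."""
    m = b1 * m + (1 - b1) * g      # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g  # second-moment estimate
    m_hat = m / (1 - b1 ** t)      # bias corrections
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [27, 64, 32, 16]  # hypothetical: 27 selected features, 3 hidden layers
    hidden = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    output = random_layer(sizes[-1], 1, rng)
    p = dnn_forward([rng.random() for _ in range(27)], hidden, output)
    print("anomaly" if p >= 0.5 else "normal", round(p, 4))
```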

3.4. Algorithm

The proposed algorithm is shown in Algorithm 1. It starts with step 1, in which data normalization and feature encoding are performed. Feature values initially lie in different ranges, so all of them must be normalized to the same scale. Because the dataset contains both numeric and textual features, they must also be converted into the same format before being passed to the classification model. Step 2 is feature reduction. The preprocessed features are passed to three different feature selection models (chi square, ANOVA, and PCA), which return three different feature sets. Only the features selected by at least two of the three models are shortlisted for the classification model. At the end of step 2, we obtain a single feature subset reduced from the original dataset. Step 3 is classification: the final subset of preprocessed, selected features is passed to the DNN model for classification.
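The dummy encoding of textual features mentioned in step 1 can be sketched as one-hot encoding, in which each distinct category becomes its own 0/1 column. NSL-KDD's textual features are `protocol_type`, `service`, and `flag`. The helper below is an illustrative sketch, not the authors' code. It learns the category list from the training data, so a category never seen in training is encoded as all zeros. pandas offers the same transformation through `pandas.get_dummies`.

```python
from typing import List, Sequence


def fit_categories(column: Sequence[str]) -> List[str]:
    """Distinct categories seen in the training column, in a fixed (sorted) order."""
    return sorted(set(column))


def one_hot(column: Sequence[str], categories: Sequence[str]) -> List[List[int]]:
    """Encode each value as a 0/1 vector with one position per known category."""
    return [[1 if value == c else 0 for c in categories] for value in column]


if __name__ == "__main__":
    train = ["tcp", "udp", "icmp", "tcp"]   # toy protocol_type values
    cats = fit_categories(train)            # ['icmp', 'tcp', 'udp']
    print(one_hot(train, cats))             # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
    print(one_hot(["sctp"], cats))          # hypothetical unseen category: [[0, 0, 0]]
```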

Algorithm 1
Input: NSL-KDD KDDTrain+
Output: Accuracy, Precision, Recall, F1-score
Initialization:
    f = features; nfeatures = numeric features; tfeatures = textual features;
    f_c = features from chi square; f_a = features from ANOVA;
    f_p = features from PCA; f_n = final feature subset;
    x = number of times a feature appears across the three subsets (f_c, f_a, f_p)
Step 1: Data Preprocessing
    f' = MinMaxNormalization(f)
    nfeatures' = EncodeNumericZScore(nfeatures)
    tfeatures' = EncodeTextDummy(tfeatures)
EndStep
Step 2: Feature Selection
    f_c = ChiSquare(f)
    f_a = ANOVA(f)
    f_p = PCA(f)
    f_n = features with x >= 2
EndStep
Step 3: Classification
    The DNN is trained and tested on f_n using the NSL-KDD binary classification dataset
    ReLU is used in the input and hidden layers, and sigmoid in the output layer
EndStep
Return the classification result.

3.5. Dataset Description

The NSL-KDD dataset is used to develop and test the proposed model. NSL-KDD is a well-known benchmark dataset for anomaly detection and an updated version of the KDD99 dataset. KDD99 contains more than 50% duplicate entries, which caused models trained on it to overfit most of the time. In NSL-KDD, duplicate entries were removed and class imbalance was also reduced. This work uses NSL-KDD with 41 features and 2 labels (binary classification). The KDDTrain+ binary classification dataset from NSL-KDD is used for training and testing. The dataset contains 125974 unique rows. As shown in Figure 4, the two classes in the dataset are balanced.

4. Experimental Results and Discussion

This section presents the results of the proposed model. The methodology consists of three steps: preprocessing, feature selection, and classification. Multiple experimental settings were therefore used, and results are presented for different feature selection and classification methods. Feature selection reduced the input data to 40%, which helps improve model performance because the removed features are considered noisy and irrelevant. The classification results also show better performance than existing detection models. The feature selection and classification results are discussed as follows.

4.1. Feature Selection Results

After data normalization, the next step is to reduce data dimensionality. Three well-known feature selection techniques are used for feature reduction, each returning a subset of features from the main dataset. A feature appearing in at least two of the three subsets was selected as input for the classification model. Table 2 shows the features selected by each of the three techniques.

As shown in Table 3, the proposed feature selection technique outperforms the other techniques in terms of accuracy. The feature set obtained by the proposed method gives high classification accuracy, precision, and recall with low computational complexity. The model achieved the highest accuracy, 99.73%, using 27 of the 41 features.

4.2. Classification Results

The proposed DNN model achieved higher accuracy, precision, recall, and F1 score than the previous works it was compared with, while also requiring less computational time. As shown in Table 4, our model achieved an accuracy of 99.73%, the highest among the compared studies. Table 5 gives the confusion matrix produced by the proposed model.
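The reported metrics follow from the four cells of a binary confusion matrix such as the one in Table 5. The sketch below shows the standard formulas and treats "anomaly" as the positive class, which is an assumption. The counts in the example are hypothetical and are not the values from Table 5.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of alarms that are real
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of attacks that are caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts: 8 anomalies caught, 2 false alarms, 4 missed, 6 normal correct
    print(binary_metrics(tp=8, fp=2, fn=4, tn=6))
```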

To examine the effectiveness of the proposed model, its results are compared with those of existing deep learning and machine learning approaches. Table 6 shows that the proposed model outperforms the other benchmark algorithms.

Figures 5 and 6 compare the evaluation results of the proposed model with those of existing deep learning and machine learning models. The proposed model achieves higher accuracy, precision, recall, and F1 score than all compared techniques, and it also requires less training and testing time.

Figure 7 compares the proposed model with machine learning models. The proposed model outperforms all of the compared machine learning techniques, which were found to struggle with network data.

4.2.1. Computational Time

In addition to the other performance metrics, computational time is an important measure of a system’s efficiency in real network intrusion detection scenarios. Table 7 compares the training and testing time of our model with those of other methods. Because the proposed method reduces the dimensionality of the data, less computational power is required to train the model. The significance of our model lies in two things: it mitigates overfitting by removing redundant features through feature selection, and it reduces time and computational complexity. Existing methods, by contrast, have focused only on accuracy.
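Training and testing times like those in Table 7 are typically measured as wall-clock time around the fit and predict calls. A minimal sketch using Python's monotonic `time.perf_counter` clock follows. `timed` is a hypothetical helper. The commented `model.fit` / `model.predict` usage assumes a scikit-learn-style estimator that is not part of this paper.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical usage with a scikit-learn-style model:
    #   _, train_seconds = timed(model.fit, X_train, y_train)
    #   y_pred, test_seconds = timed(model.predict, X_test)
    total, seconds = timed(sum, range(1_000_000))
    print(total, f"{seconds:.4f} s")
```

Repeating each measurement several times and reporting the average, as the abstract's "average training and testing time" implies, reduces noise from other activity on the machine.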

5. Conclusion and Future Work

This article presents a deep learning-based intrusion detection model. In the proposed scheme, the detection model can be used to secure network data against cyberattacks. The proposed model combines feature selection and classification techniques and achieves higher accuracy, precision, recall, and F1 score. The significance of our model lies in two things: it mitigates overfitting by removing redundant features through feature selection, and it reduces time and computational complexity. Existing methods, by contrast, have focused only on accuracy. The proposed model also requires less training and testing time than the other compared techniques. In future work, the model can be applied to other datasets to check its performance, and a multiclass dataset can be used to further validate it. Other feature selection techniques could also be explored to improve the current model’s performance.

Data Availability

The data used in this research can be obtained from the corresponding authors upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R193), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.