Abstract

Detecting Alzheimer’s disease (AD) early allows patients to take preventative measures before the onset of irreversible brain damage, which is a critical factor in the treatment of Alzheimer’s patients. Although computers have been utilized in several recent studies to diagnose AD, most machine detection methods are constrained by predefined, handcrafted observations. In AD, the hippocampus is usually the first part of the brain to be affected. Structural magnetic resonance imaging (sMRI) can be used to assist in diagnosing AD by measuring the hippocampus’s shape and volume. However, the information encoded by these attributes is restricted and may be affected by segmentation errors. These traits are also extracted independently of the classification, which can result in lower-than-desired classification accuracy. In this study, structural MRI data are used to develop a deep learning framework for combined automatic hippocampus segmentation and AD categorization. Multi-task deep learning (MTDL) is used to learn hippocampus segmentation and classification simultaneously, and the hyperparameter optimization of the CNN model (capsule network) for illness classification is carried out using the deer hunting optimization (DHO) technique. The suggested method has been tested on ADNI-standardized MRI datasets and is accurate: the proposed MTDL segmentation achieved 97.1% accuracy and a 93.5% Dice coefficient, while the DHO-optimized capsule network achieved an accuracy of 96% for binary classification and 93% for multi-class classification.

1. Introduction

Alzheimer’s disease is a brain ailment that gradually impairs thinking and memory abilities, as well as the capacity to perform even the most basic tasks. An intracellular protein called cAMP-response element binding protein (CREB) controls the expression of key genes in dopaminergic neurons [1]. AD, the most common form of dementia, poses a significant test to healthcare providers in the twenty-first century. In the United States, 5.5 million people aged 65 years and older have AD, making it the sixth leading cause of death [2]. In 2018, the total cost of managing AD in the United States was $277 billion, with a significant impact on the broader economy and a strain on the country’s healthcare system [3]. In the absence of a treatment proven to alter the course of the disease, a considerable amount of work has been put into developing procedures for early identification, particularly in presymptomatic phases [3]. Advances in neuroimaging techniques, including MRI and PET, have been utilized to discover AD-related structural and molecular biomarkers [4]. Brain imaging technology has progressed at an incredible rate, making it difficult to integrate the resulting enormous amounts of high-dimensional multi-modal data. Computer-aided machine learning methodologies for integrative analysis have therefore become increasingly popular. AD progression can be predicted using well-known pattern analysis approaches such as LPBM, logistic regression, and support vector machines (SVMs) [5].

Preprocessing or architectural design is required to use these machine learning techniques [6]. Dimensionality reduction is a common step in machine learning classification studies, as are feature extraction and selection and the choice of classification methods based on those features. These techniques demand a high level of specialized knowledge and may take a long time to optimize across numerous phases. The reproducibility of these methods has also become a concern [7, 8]. In the feature selection process, neuroimaging modalities can be used to pick AD-related features, such as mean subcortical volumes, grey matter densities, cortical thickness, brain glucose metabolism, and amyloid buildup in regions of interest (ROIs) such as the hippocampus [9, 10].

It is becoming increasingly common for large-scale medical imaging analyses to use deep learning to generate features from raw neuroimaging data “on the fly” [11]. However, deep learning techniques for AD diagnosis are often based on small MRI datasets, which makes it difficult for researchers to build deep CNN models with the large number of parameters that must be learned [12, 13].

1.1. Problem Statement

Hippocampal analysis methods now in use have several flaws. First, precise segmentation of the hippocampus is required for both hippocampal volumetric and shape analyses, yet the hippocampus is difficult to segment correctly because of its irregular shape and unclear boundary in MRI. Second, handcrafted shape features may not be suitable for subsequent examination, affecting categorization performance in the diagnosis of illnesses. Third, the hippocampus alone may not be sufficient to distinguish mild cognitive impairment (MCI) patients from healthy controls; in the early stages of AD, both the amygdala and the para-hippocampus are also affected by the condition. Finally, MRI images taken from the broader hippocampal region can therefore be very helpful in the diagnosis of AD.

1.2. Contribution

Machine learning and deep learning algorithms have been used in recent years to detect biomarkers and interpret illness aetiology. AD can be detected in a variety of ways, including analyzing MRI images for specific regions of interest (ROIs). The hippocampus is an essential anatomical region in the pathogenesis of AD, since it is one of the first brain ROIs to be impacted. To address the issues listed in the problem statement, a new deep learning framework is suggested that combines an MTDL model for simultaneous hippocampus segmentation and a DHO-optimized capsule network for disease classification using MRI data.

2. Literature Survey

Faisal and Kwon [14] aimed to design a deep learning system that could extract useful AD biomarkers from structural MRI and classify brain images into AD, MCI, and CN groups. The researchers used publicly available ADNI datasets to train CNNs on MRI brain images. The proposed process merges features from multiple layers into compact high-level features, and it lowers computation time because there are fewer variables to deal with. The suggested convolution operation is evaluated against the most extensively used AD classification approaches using metrics such as accuracy and area under the ROC curve (AUC).

Shanmugam et al. [15] focused on early detection of various phases of cognitive impairment and AD using neuroimaging and transfer learning (TL). Images from ADNI’s database labeled as CN, early mild cognitive impairment (EMCI), MCI, and late MCI (LMCI) are classified using transfer learning. Three pretrained networks were trained and evaluated on 6000 images from the ADNI collection. Confusion matrices and their derived properties are used to evaluate the classification performance of the three networks. GoogLeNet, AlexNet, and ResNet-18 achieve overall accuracies of 96.39%, 94.08%, and 97.51%, respectively, in detecting Alzheimer’s disease. Confusion matrix parameters were also used to examine the pretrained networks’ per-class performance.

According to Samhan et al. [16], there are numerous ways to utilize deep learning to categorize Alzheimer’s disease, and in large trials, adopting this method will result in better patient care and lower costs. The system, developed in Python, is particularly useful for doctors in the classification of AD. 70% of the images were used to train the model, and 30% were used to verify it. On a series of held-out tests, the trained model was 100% accurate.

Tian et al. [17] investigated the retina, specifically the retinal vascular system, as a potential tool for identifying people with AD-related dementia. Adding a saliency analysis on top of the high classification accuracy helps make this pipeline easier to interpret. The saliency study shows that retinal images with small vessels provide more information for Alzheimer’s disease diagnosis than images with large vessels.

To classify this chronic condition as AD, Divya and ShanthaSelvaKumari [18] employed several feature selection strategies and distinct classifiers. When the number of records with large dimensions is small, classification becomes much easier. After several attempts to pick the best features, they obtained accuracy rates of 98.22%, 89.59%, and 90.40%, with an SVM using a radial basis function kernel yielding the highest rates. Including the MMSE score yielded a 2.7% improvement in the MCI/AD classification, but it had no impact on the NC/AD and NC/MCI classifications.

“The wisdom of experts” can be harnessed by using An et al.’s deep ensemble learning framework [19] to integrate multi-source data. At the voting layer, two sparse autoencoders are trained for feature learning, which helps to minimize the correlation between features and diversify the base classifiers. At the stacking layer, classifiers are ranked using a deep belief network with a nonlinear feature-weighted algorithm, which may violate conditional independence; a neural network is employed as the meta-classifier. To deal with cost sensitivity, oversampling and threshold shifting are employed at the optimization layer. An ensemble of probabilistic predictions is combined with a similarity computation to produce optimized forecasts. The new deep ensemble learning framework is applied to Alzheimer’s disease classification and outperforms six well-known ensemble techniques, including the classic stacking algorithm, in classification accuracy tests on clinical data.

Zhang et al. [20] proposed densely connected convolutional neural networks with a connection-wise attention mechanism to learn the properties of brain MR images for AD classification. The preprocessed images are passed through a dense CNN that extracts multi-scale features, and the connection-wise attention mechanism integrates connections among features from diverse layers to turn the MR images into more compact high-level features. The convolution operation is extended to 3D to capture MRI’s spatial information. Features from all previous layers were combined with those from the 3D convolution layer in various ways before being used to classify the data. Based on baseline MRI scans of 968 ADNI database participants, the authors tested the technique’s ability to distinguish AD patients from healthy subjects, MCI converters from healthy subjects, and MCI converters from non-converters.

2.1. Challenges in Brain MRI Segmentation
(i) Brain structures differ greatly among individuals due to genetics, age, gender, and illness, so using a single segmentation algorithm across all phenotypic subgroups is problematic.
(ii) Cytoarchitectural variations, such as the thickness of tissue, the depth of the sulci, and smooth boundaries between tissue types, are difficult to handle and can lead to a muddled categorization of the various tissue types; even human professionals have difficulty with this.
(iii) These modalities have low contrast between anatomical structures, which leads to poor segmentation performance.
(iv) Manual segmentation is tedious and subjective and requires a deep understanding of brain anatomy to perform, so it is challenging to acquire sufficient data for creating a segmentation model.
(v) In an ordinary image, the noisy background makes it difficult to assign an appropriate label to each pixel/voxel from learned characteristics.
(vi) Although the hippocampus is one of the most important biomarkers for AD, its small size and volume, structural heterogeneity, partial volume effects, low contrast, and low signal-to-noise ratio make it hard to segment.

3. Proposed Model

One approach to diagnosing AD is represented in Figure 1. The MRI slices must be obtained first. Preprocessing removes irrelevant information from the data and reorients the images so that they can be interpreted more easily. The preprocessed data are segmented using deep learning to retrieve the relevant properties from the brain MRI. A classifier then uses features such as area, center of gravity, intensity, and standard deviation to determine whether the patient is developing AD or not.

3.1. Dataset Description

MCI and early-onset Alzheimer’s disease can be tracked using MRI, PET scans, and other biomarkers as part of the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Written informed consent for the collection of imaging and genetic samples was signed by the subjects at the time of enrolment and approved by the Institutional Review Boards (IRBs) at each participating location.

A total of 449 participants were randomly selected for the study, with Mini-Mental State Examination (MMSE) and Clinical Dementia Rating (CDR) scores recorded. We used 1.5 T MR images obtained with the ADNI acquisition protocol [21]; image acquisition procedures are explained in greater detail on the ADNI website. Images were resampled to a voxel size of 1 × 1 × 1 mm (1 cubic mm). The scans were then skull-stripped and the cerebellum was removed. The FMRIB Software Library (FSL) 5.0 from https://fsl.fmrib.ox.ac.uk/ was utilized in this project to align all MR images to a template image with 12 degrees of freedom and the default parameters.

ADNI participants’ demographic and clinical information is shown in Table 1 (mean ± standard deviation). Alzheimer’s disease, mild cognitive impairment, and normal control are referred to as “AD,” “MCI,” and “NC,” respectively.

3.2. Preprocessing

Gradwarp corrects the image distortion caused by nonlinearity in the scanner’s gradient fields [22]; each gradient model exhibits a different kind of nonlinearity, and correcting the geometrical features of an image improves its information content. B1 nonuniformity correction rectifies image intensity information distorted by nonuniform radio-frequency transmission. N3 bias field correction removes the distortion caused by dielectric effects during acquisition [23]. N3 correction is applied to 1.5 T images to reduce the nonuniform gradient in the image, although these effects are more widespread in 3 T machines. Gradwarp and B1 corrections are applied before the N3 correction.

Image segmentation has been used in the literature to improve classification accuracy [24, 25]. These images were preprocessed with the segmentation module of statistical parametric mapping (SPM), available at https://www.fil.ion.ucl.ac.uk/spm/. SPM maps MRI scans onto tissue probability maps and uses these maps to extract the mapped regions. In this module, the MRI scan is segmented into three tissue classes using bias correction and normalization. Registration links the output of the mapping to the orientation of the image. Brain registration aims to minimize the impact of external elements, such as the scalp, on the segmented images of the cerebral cortex.

3.3. Joint Hippocampus Segmentation Using MTDL

The hippocampus is a small area in the medial temporal lobe of the human brain. It contains a disproportionately small number of voxels compared to the rest of the brain, resulting in a very unbalanced dataset. After preprocessing and registration, the next step is to create 3D image patches with hippocampus-specific bounding cubes; the 3D patches are extracted from the MR images along the axes of the bounding cubes. The size of the bounding cube is important for how the hippocampus is segmented: a large bounding cube aggravates the class imbalance problem and increases the computation time, while a small bounding cube can impede the segmentation of the hippocampus. An empirical study found that a bounding cube of 64 × 48 × 64 voxels was the optimal trade-off. These patches form the basis of our deep learning model for segmenting the hippocampus and classifying the illness.
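As an illustration, the following NumPy sketch crops such a bounding cube from a registered volume; the volume shape and centre coordinates are hypothetical, and only the 64 × 48 × 64 cube size comes from the text.

```python
import numpy as np

def extract_patch(volume, center, size=(64, 48, 64)):
    """Crop a 3D patch of the given size, centred on center = (x, y, z)."""
    slices = tuple(slice(c - s // 2, c + s // 2)
                   for c, s in zip(center, size))
    return volume[slices]

volume = np.zeros((182, 218, 182))             # hypothetical registered MR volume
patch = extract_patch(volume, center=(90, 120, 70))
print(patch.shape)                             # -> (64, 48, 64)
```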

Jointly learning hippocampus segmentation and illness classification is a novel approach that differs from standard methods in which the two tasks are performed separately. CNNs are frequently used to classify images and identify objects. V-Net, a volumetric, fully convolutional network, has been proposed for prostate segmentation in MRI. Inspired by the success of V-Net in prostate segmentation, a multi-task deep CNN model is used here for joint hippocampus segmentation and illness classification.

Residual functions are learned at the convolutional stages of the deep CNN, which aims to achieve fast convergence. “ResNet Block 1” and “ResNet Block 2” are two residual blocks, each consisting of 3D convolution, batch normalization (BN), parametric rectified linear unit (PReLU) activation, and dropout layers, as illustrated in Figure 2. In ResNet Block 1, the input is added to the output of the second convolutional layer to learn a residual function; in ResNet Block 2, the input is added to the outputs of both convolutional layers. The kernels are trained on batches of MRI data. Small kernels make fast inference easier to achieve since there are fewer parameters to train, whereas larger kernels can learn more complicated patterns with greater expressiveness; stacking layers of small kernels produces a similar effect. The kernel size is therefore fixed at 3 × 3 × 3 for all convolutions. A nonlinear PReLU activation is applied to the learned filters, and a feature map is then constructed for each one.
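The following Keras sketch illustrates a residual block of this kind; the filter count and dropout rate are illustrative assumptions, and only the layer types, the 3 × 3 × 3 kernels, and the additive skip connection come from the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def resnet_block(x, filters, dropout_rate=0.2):
    """Two 3x3x3 convolutions (BN + PReLU + dropout); the block input is
    added to the output of the second convolution (ResNet Block 1 style)."""
    shortcut = x
    y = layers.Conv3D(filters, kernel_size=3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.PReLU(shared_axes=[1, 2, 3])(y)
    y = layers.Dropout(dropout_rate)(y)
    y = layers.Conv3D(filters, kernel_size=3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])            # residual connection
    return layers.PReLU(shared_axes=[1, 2, 3])(y)

inputs = tf.keras.Input(shape=(64, 48, 64, 1))  # one bounding-cube patch
x = layers.Conv3D(16, 3, padding="same")(inputs)
x = resnet_block(x, 16)
```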

During the compression stage, downsampling is used to reduce the size of the feature maps and increase the receptive field of features in the following layers; it is implemented using convolution with kernels of size 2 × 2 × 2 and stride 2. During the decompression stage, a volumetric segmentation mask is generated by expanding the spatial support of the lower-resolution feature maps; upsampling is performed via deconvolution with 2 × 2 × 2 kernels and stride 2. For the probabilistic segmentation of the hippocampus regions, the outputs are transformed to voxel-wise softmax by applying a convolutional layer with a 1 × 1 × 1 kernel and stride 1. As the last step, the probability output is converted into a binary mask by thresholding.
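A minimal sketch of these three operations, under the same Keras assumptions as the block above, might look as follows:

```python
from tensorflow.keras import layers

def downsample(x, filters):
    # Compression: 2x2x2 convolution with stride 2 halves each spatial axis.
    return layers.Conv3D(filters, kernel_size=2, strides=2)(x)

def upsample(x, filters):
    # Decompression: 2x2x2 deconvolution with stride 2 doubles each axis.
    return layers.Conv3DTranspose(filters, kernel_size=2, strides=2)(x)

def segmentation_head(x):
    # 1x1x1 convolution with voxel-wise softmax over the two classes
    # (hippocampus vs. background); thresholding yields the binary mask.
    return layers.Conv3D(2, kernel_size=1, strides=1, activation="softmax")(x)
```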

For subject m’s hippocampal segmentation, the goal is to optimize the Dice loss function, which measures how well the model separates hippocampus voxels from the background:
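A standard smoothed form of this loss, consistent with the description that follows, is

\[ \mathcal{L}_{seg}^{(m)} = 1 - \frac{2\sum_{i} p_i\, q_i + \epsilon}{\sum_{i} p_i + \sum_{i} q_i + \epsilon}. \]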

Here, p_i denotes the segmentation prediction and q_i the ground-truth label for voxel i, and a small constant ε keeps the loss defined when the denominator would otherwise be zero. The Dice loss remains usable even when the numbers of foreground and background voxels are heavily unbalanced. For classification, fully connected layers are used alongside the decompression components to increase accuracy, and the categorical cross-entropy loss compares the predicted and actual labels for subject m.

The multi-task loss of the deep CNN model is formed by adding the segmentation loss and the classification loss together:
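A weighted-sum form consistent with the surrounding description (the plain sum over subjects is an assumption) is

\[ \mathcal{L} = \sum_{m=1}^{M} \left[ \alpha\, \mathcal{L}_{seg}^{(m)} + (1 - \alpha)\, \mathcal{L}_{cls}^{(m)} \right]. \]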

The total number of subjects is M; y_m and ŷ_m are the ground-truth label and the predicted label for subject m, respectively. The weighting parameter α ∈ [0, 1] balances the hippocampus segmentation and illness classification losses during training. In the early stages of training the multi-task deep CNN model, segmentation is emphasized over classification: the initial warm-up sets α to 1, after which α is lowered to 0.5 for balanced multi-task training, and finally α is set to 0 so that the classification process takes precedence. The Adam approach is utilized to jointly optimize the multi-task network model, and a backpropagation algorithm is used to calculate the network gradients.
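One way to realize this schedule, sketched here with Keras and with the epoch boundaries as pure assumptions, is a callback that updates a shared weighting variable:

```python
import tensorflow as tf

alpha = tf.Variable(1.0, trainable=False)   # warm-up: segmentation only

def multitask_loss(seg_loss, cls_loss):
    # Weighted sum of the two task losses, as in the equation above.
    return alpha * seg_loss + (1.0 - alpha) * cls_loss

class AlphaSchedule(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        if epoch < 10:        # hypothetical warm-up phase
            alpha.assign(1.0)
        elif epoch < 40:      # hypothetical balanced multi-task phase
            alpha.assign(0.5)
        else:                 # classification takes precedence
            alpha.assign(0.0)
```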

The hippocampal image patches are shown after correcting the hippocampal segmentation findings. The mean, standard deviation, and range of hippocampus volumes before and after manual corrections are shown for several groups of participants. After adjustment, the mean and SD of the hippocampus volume decrease. Figure 3 depicts the scatterplots for AD, MCI, and NC patients [21, 22].

3.4. Classification

Sabour et al. [25] overcame the limitations of CNNs by employing a higher-dimensional vector known as a “capsule” to represent an entity rather than an individual neuron. The neuronal activity of an active capsule reflects the properties of a specific entity portrayed in an image. These features, including the likelihood of existence and a set of instantiation parameters such as albedo, hue, texture, or deformation, are learned by a capsule for each visual item. An entity’s attributes and likelihood of existence are represented in CapsNet’s input and output as vectors with a direction and a norm. The model improves AD forecasts by predicting a high-level capsule’s instantiation parameters through a transformation matrix applied to capsules at the same level.
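For reference, the capsule activation (“squashing”) function of Sabour et al. [25], which keeps a capsule’s output vector v_j in the same direction as its total input s_j while mapping its norm into [0, 1) so it can act as a probability, is

\[ \mathbf{v}_j = \frac{\lVert \mathbf{s}_j \rVert^2}{1 + \lVert \mathbf{s}_j \rVert^2} \, \frac{\mathbf{s}_j}{\lVert \mathbf{s}_j \rVert}. \]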

The best solutions found so far are saved and used to boost the position of each individual search agent. The DHO presented here begins with a random sample of the population, and a search agent may move closer to or farther away from the ideal search agent as it iterates. The natural logarithm base, e, appears as a constant in the spiral-shaped path an agent can follow around the best solution. Because the shift between exploitation and exploration is kept under control and both capacities are strong, the DHO acts as a global optimizer.

3.4.1. DHO-Based Hyperparameter Tuning

A new metaheuristic DHO approach, modeled on how a group of hunters hunts deer, was developed for the tuning of hyperparameters. Hunters employ a variety of strategies to surround the deer and approach it as closely as possible, and the deer’s position and the wind angle have to be taken into account. Another crucial element of successful hunting is a sense of camaraderie among the participants: by following their successor and leader, the final goal is achieved. The objective function of the model is to find the hunter position, that is, the hyperparameter configuration, that maximizes classification performance.

The DHO method models the deer’s unique abilities to elude hunters. The process is initialized with a random population of hunter position vectors, described using the following equation:

\[ X = \{X_1, X_2, \ldots, X_n\}, \quad 1 \le i \le n. \]

Two key positions in the population are tracked when optimizing the hunting strategy: the leader position (X_l) and the successor position (X_s). Important elements such as weight, position, and wind angle are then used in the update. Because the entire search area is considered a circle, the wind angle can be defined over the circle’s circumference:

\[ \theta_j = 2\pi r, \]

where r stands for an arbitrary value in the range [0, 1] and j stands for the current iteration. Location propagation for the optimization is driven by the leader position and the successor position: the successor location determines the placement of the following agents, whereas the leader location determines the primary location of the hunter.

The leader position X_l is used for propagation: once an optimal location has been established, every agent tries to reach it. To begin updating the location, we simulate the surrounding behavior as shown below:

\[ X^{(j+1)} = X^{l} - Z \cdot p \cdot \left| K \cdot X^{l} - X^{(j)} \right|. \]

The current iteration’s location is designated as X^(j), whereas the location for the next iteration is designated as X^(j+1); this process is aided by the Z and K coefficient vectors. If wind speed is taken into account, an arbitrary value p ranging from 0 to 2 can be generated. The Z and K coefficient vectors can be estimated using the expressions below:

\[ Z = \frac{1}{4}\,\log\!\left(j + \frac{1}{j_{\max}}\right) b, \qquad K = 2c, \]

where j_max is the maximum iteration, c is an arbitrary value in the range [0, 1], and the value of the b variable ranges from −1 to 1. (X, Y) is the initial location of the hunter, which gets updated based on the location of the prey; X and Y are recalculated using the Z and K coefficient vectors. When the value of p is less than 1, a position update takes place that allows the hunter to move in any direction without regard to the angle; otherwise, the update follows the angle of inclination. The angle-based location update is expected to expand the search space, and the angle of the hunter’s position is critical to the success of the hunting strategy.

In the angle-based update, the new position X^(j+1) depends on the updated angle B^(j+1) and the arbitrary value p. Because the angle location is opposite to the individual location, the prey has no sense of the hunter’s presence via the successor location. The vector K appears within the encircling behavior during exploration: K values are first taken to be less than 1 to perform an arbitrary search. Finally, the successor location is used instead of the best possible location in the location-updating method, so a global search is conducted as the last step.

Position updates are carried out until an ideal location is found (namely, the termination condition). By optimizing the hyperparameters of the pretrained CNN model, the method effectively identifies whether a patient has AD or is normal, and the multi-class classification of AD is also performed.
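The following Python sketch shows how such a DHO-style loop could drive hyperparameter tuning. It is a simplified leader-following update, not the exact DHO equations; the search ranges, agent counts, and the placeholder fitness function are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
bounds = np.array([[1e-4, 1e-2],    # hypothetical learning-rate range
                   [0.1, 0.9]])     # hypothetical dropout-rate range

def fitness(params):
    # Placeholder objective; in practice this would train the CapsNet with
    # these hyperparameters and return validation accuracy.
    lr, dropout = params
    return -(np.log10(lr) + 3.0) ** 2 - (dropout - 0.5) ** 2

n_agents, n_iter = 10, 20
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_agents, 2))

for j in range(n_iter):
    scores = np.array([fitness(x) for x in X])
    leader = X[np.argmax(scores)]              # best hunter so far
    for i in range(n_agents):
        p = rng.uniform(0, 2)                  # wind-speed factor
        Z = 0.25 * np.log(j + 1 / n_iter) * rng.uniform(-1, 1)
        K = 2 * rng.uniform(0, 1)
        # Leader-following position update, clipped to the search bounds.
        X[i] = np.clip(leader - Z * p * np.abs(K * leader - X[i]),
                       bounds[:, 0], bounds[:, 1])

best = X[np.argmax([fitness(x) for x in X])]
print("best hyperparameters:", best)
```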

4. Results and Discussion

To create the segmentation and classification models, a high-level neural network API (Keras) with TensorFlow as the backend was employed. Keras was used because of its ease of use and ability to run on a GPU.

4.1. Evaluation Metrics

Our method’s segmentation and classification performance is assessed using the challenge evaluation measures: accuracy (AC), Jaccard similarity index (JSI), and Dice coefficient (DSC) for the segmentation analysis, and AC, specificity (SP), and sensitivity (SE) for the classification. The criteria for evaluating performance are laid out as follows:

\[ AC = \frac{TP + TN}{TP + TN + FP + FN}, \quad SE = \frac{TP}{TP + FN}, \quad SP = \frac{TN}{TN + FP}, \]
\[ DSC = \frac{2\,TP}{2\,TP + FP + FN}, \quad JSI = \frac{TP}{TP + FP + FN}, \]

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
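These measures follow directly from the confusion-matrix counts, as in the short sketch below (the example counts are illustrative).

```python
def classification_metrics(tp, tn, fp, fn):
    # Evaluation measures defined above, computed from the four counts.
    ac = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    se = tp / (tp + fn)                    # sensitivity
    sp = tn / (tn + fp)                    # specificity
    dsc = 2 * tp / (2 * tp + fp + fn)      # Dice coefficient
    jsi = tp / (tp + fp + fn)              # Jaccard similarity index
    return ac, se, sp, dsc, jsi

print(classification_metrics(tp=90, tn=85, fp=5, fn=10))
```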

4.2. Comparative Analysis for Proposed Segmentation (MTDL)

In this section, the proposed model is compared with existing techniques such as fuzzy c-means (FCM), adaptively regularized kernel FCM (ARKFCM), and fast and robust FCM (FRFCM). Table 2 and Figure 4 provide the experimental analysis for MTDL with existing models [26].

Table 2 presents the validation results for the different segmentation techniques. FCM achieved an accuracy of 84.8%, and FRFCM achieved an accuracy of 92.6%, better than FCM. ARKFCM reached an accuracy of 96.5%. Finally, the proposed MTDL reached the best accuracy of 97.1%, outperforming the other methods. In the DSC analysis, FCM achieved 82.1%, FRFCM achieved 91%, ARKFCM achieved 92%, and the proposed model achieved 93.5%. Finally, JSI is highest for MTDL (i.e., 87.8%) compared to the existing FCM-based models (69% to 85%).

4.3. Comparative Analysis of Proposed Classification

Two types of analysis are carried out: binary classification (normal or AD) and multi-class classification (AD/MCI/NC). Table 3 and Figure 5 show the experimental analysis of the proposed classifier against existing techniques. For better performance, all techniques are implemented with DHO.

Table 3 presents the comparative analysis of binary classification across different models: a recursive neural network, a recurrent neural network (RNN), and CapsNet. The recursive neural network classifier reached a sensitivity of 93.00% and an accuracy of 90.00%. The recurrent neural network model reaches an accuracy of 91.00%. Finally, the CapsNet model reaches an accuracy of 96.00%. In this comparative analysis, the CapsNet model achieved better accuracy and overall performance than the other two classifier models. Table 4 and Figure 6 present the multi-class classification.

Table 4 presents the comparative analysis of multi-class classification across the same models: a recursive neural network, a recurrent neural network (RNN), and the pretrained CNN model (CapsNet). The recursive neural network classifier reached a sensitivity of 93.00% and an accuracy of 89.00%. The recurrent neural network model reaches an accuracy of 84.00%. Finally, the CapsNet model reaches an accuracy of 93.00%. Again, the CapsNet model achieved better accuracy and overall performance than the other two classifier models.

Table 5 and Figure 7 present the comparative analysis of various pretrained CNN models in terms of accuracy for binary classification; the proposed model shows better accuracy in binary classification than in multi-class classification, which is shown in Figure 8.

Table 5 presents the comparative analysis of accuracy for binary classification using different classifier models: UNet, ResNet, VGG-16, EfficientNet, CapsNet, UNet-DHO, ResNet-DHO, VGG-16-DHO, EfficientNet-DHO, and CapsNet-DHO. Initially, the UNet model reached an accuracy of 89.00%, the ResNet model 87.00%, and the CapsNet model 90.00%. However, when the pretrained models are combined with DHO, accuracy improves: the VGG-16-DHO model reached an accuracy of 95.50%.

Finally, the CapsNet-DHO model reaches an accuracy of 96.00%. By this comparative analysis, the CapsNet-DHO model reached better accuracy than the other classifier models.

5. Conclusion

This research study successfully developed a deep learning framework for combined automatic hippocampus segmentation and AD categorization from MRI data. Multi-task deep learning (MTDL) is used to learn hippocampus segmentation and classification simultaneously, and the hyperparameter optimization of the CNN model (capsule network) for illness classification is carried out using the deer hunting optimization (DHO) technique. The suggested method was tested on ADNI-standardized MRI datasets and is accurate. The proposed MTDL achieved 97.1% segmentation accuracy and a 93.5% Dice coefficient, outperforming the other segmentation methods: in the DSC analysis, FCM achieved 82.1%, FRFCM achieved 91%, ARKFCM achieved 92%, and the proposed model achieved 93.5%, while JSI was highest for MTDL (i.e., 87.8%) compared to the existing FCM-based models (69% to 85%). For classification, the DHO-optimized CapsNet achieved an accuracy of 96% for binary classification and 93% for multi-class classification, and in the binary classification accuracy evaluation, CapsNet-DHO outperformed all other classifier models. The model considered only one dataset for validation; as future work, real-time data will be collected and used for the verification process. In addition, the efficiency of the pretrained CNN model will be validated, and a hybrid DL model will be designed for the identification of real-time collected AD images [27].

Data Availability

The data used to support the findings of the study are included within the article and are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.