Abstract

Adversarial examples can transfer across models, which poses a serious threat to deep learning models. To reveal the shortcomings of existing deep learning models, ensemble methods have been introduced into the generation of transferable adversarial examples. However, most model ensemble attacks directly combine the outputs of different models while ignoring the large differences in their optimization directions, which severely limits transfer attack ability. In this work, we propose a new ensemble attack method called the stochastic average ensemble attack. Unlike the existing approach of averaging the outputs of the individual models as the integrated output, we continuously optimize the ensemble gradient in an internal loop using the models' historical gradients and the average gradient across models. In this way, the adversarial examples can be updated in a more appropriate direction, making the crafted adversarial examples more transferable. Experimental results on ImageNet show that our method generates highly transferable adversarial examples and outperforms existing methods.

1. Introduction

Deep neural networks (DNNs) have made promising breakthroughs in the field of computer vision (CV), such as autonomous driving [1], face recognition [2], image classification [3], and many others [4, 5]. However, DNNs have been proven to be vulnerable to adversarial attacks; well-designed perturbed examples (adversarial examples) can often mislead CV models while remaining imperceptible to humans [6, 7]. This has drawn much attention to research on adversarial attacks, because it can help us identify model flaws [8] and improve model robustness [7]. In practice, it is hard to obtain specific information about the victim's model. Therefore, the more practical black-box attacks have begun to be studied extensively.

In general, there are two paradigms for black-box attacks: query-based and transfer-based. Query-based attacks have poor practical usability because they typically require many queries of the victim model's outputs, which tends to attract the victim's attention. Therefore, we concentrate on transfer-based attacks. The principle behind transfer attacks is that adversarial examples have cross-model transferability [9]: adversarial examples crafted on surrogate models can often mislead other models, even when those models use different architectures [9]. This kind of transferability makes strict black-box attacks feasible. In recent years, the idea of an ensemble has been used to improve the transferability of adversarial examples [10, 11]. A model ensemble uses the outputs of multiple models instead of a single model to reduce the bias of any single model. When combined with adversarial attacks, it can help adversarial examples find an ensemble optimization direction, reducing overfitting to individual models and enhancing their transferability [10].

However, the optimization directions of different models may vary significantly due to their different architectures [12]. Existing approaches ignore this variance: they simply combine the outputs of different models and use the result as the ensemble optimization direction, which leaves a variance between the average ensemble output and the individual model outputs and limits the transferability of the adversarial examples. In this work, we turn to the stochastic average gradient (SAG) [13] algorithm for stochastic gradient descent (SGD), which reduces the stochastic gradient variance caused by randomly selecting samples in SGD. In SGD, the variance between the gradient of a randomly selected sample and the average gradient over all samples is similar to the variance encountered in model ensemble attacks; both can be categorized as variances between a mean value and individual values. Therefore, we address it using the principles of SAG.

Based on the analyses above, we propose a novel method called the stochastic average ensemble attack (SAEA). In our method, we use an internal loop to reduce the variance between the ensemble output and the outputs of the multiple models, and then use this ensemble output in the external loop to generate adversarial examples. Specifically, in the external loop, we compute the outputs of the input image across multiple models and maintain them in memory. In the internal loop, similar to SAG, we randomly select one model and update the corresponding model output using the input image. Then, we fuse this updated output with the maintained outputs to obtain an ensemble output. At the end of each internal loop, we update the internal-loop image using the ensemble output and feed it into the next internal loop. After multiple rounds of internal-loop updates, the difference between the ensemble output and the outputs of the multiple models is reduced. Finally, we use this ensemble output to perform iterative model ensemble attacks. In this way, SAEA can yield a more accurate update direction for the adversarial examples across multiple models and generate adversarial examples with higher transferability.

We have conducted extensive experiments on the ImageNet dataset [14] to evaluate our method, and the results indicate that our SAEA achieves better results in transfer attack scenarios than existing model ensemble attack methods.

The remainder of this paper is structured as follows: in Section 2, we summarize the related work on adversarial attacks and defenses. In Section 3, we first introduce our motivation and then introduce our attack algorithm. In Section 4, we demonstrate the effectiveness of our attack through extensive experiments and highlight its superiority over two other model ensemble attacks. Finally, Section 5 concludes this work.

2. Related Work

Since the concept of adversarial examples was proposed, many attack algorithms have been designed, such as gradient-based attacks, input transformation attacks, and model ensemble attacks.

2.1. Gradient-Based Attacks

The fast gradient sign method (FGSM) [7] is the most representative attack method, which adds a perturbation in the direction of the gradient to a benign sample as follows:
$$x^{adv} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_{x} J(x, y)\right),$$
where $\epsilon$ denotes the magnitude of the adversarial perturbation and $J(\cdot, \cdot)$ denotes the loss function.
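For concreteness, the following is a minimal PyTorch-style sketch of this single-step update; the classifier model, the cross-entropy loss, and the pixel range [0, 1] are illustrative assumptions rather than part of the original formulation.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM: perturb x by eps along the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Move along the gradient sign and keep pixel values in the valid range.
    return torch.clamp(x_adv + eps * grad.sign(), 0.0, 1.0).detach()
```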

The iterative fast gradient sign method (I-FGSM) [15] is an iterative version of FGSM. It increases the transferability of adversarial examples by repeatedly adding small perturbations to the image as follows:
$$x_{t+1}^{adv} = \mathrm{Clip}_{x}^{\epsilon}\left\{x_{t}^{adv} + \alpha \cdot \mathrm{sign}\left(\nabla_{x} J(x_{t}^{adv}, y)\right)\right\},$$
where $\mathrm{Clip}_{x}^{\epsilon}\{\cdot\}$ limits the perturbation within the $\epsilon$-ball of the benign input $x$, $t$ denotes the iteration number, and $\alpha$ denotes the step size.
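A corresponding sketch of the iterative loop, under the same assumptions, with the $\epsilon$-ball projection implemented by element-wise min/max:

```python
import torch
import torch.nn.functional as F

def i_fgsm(model, x, y, eps, alpha, T):
    """Iterative FGSM: T steps of size alpha, projected onto the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(T):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Clip_x^eps: project back into the L_inf ball and the valid pixel range.
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv
```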

The momentum iterative method (MIM) [11] introduces momentum into I-FGSM to make the update direction of the adversarial examples more stable. The Nesterov iterative method [16] accelerates the crafting of adversarial examples by applying Nesterov's accelerated gradient to the attack algorithm. The lookahead iterative method [17] tunes the update direction by recording the gradients of multiple previous steps to escape suboptimal regions during the update of the adversarial examples. These gradient-based attacks make the generated adversarial perturbations more accurate by optimizing the gradient and are highly effective in both white-box and black-box settings.
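As an illustration of the momentum variant, the following sketch (same assumptions as above) accumulates a normalized gradient into a momentum buffer with decay factor mu; the batch-level normalization is a simplified stand-in for the per-sample L1 normalization used in MIM:

```python
import torch
import torch.nn.functional as F

def mim(model, x, y, eps, alpha, T, mu=1.0):
    """MIM sketch: accumulate a normalized gradient into a momentum buffer."""
    x_adv, g = x.clone().detach(), torch.zeros_like(x)
    for _ in range(T):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Momentum accumulation; normalization keeps the step scale stable.
        g = mu * g + grad / grad.abs().mean()
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv
```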

2.2. Input Transformation Attacks

Iterative gradient-based attacks need to update the adversarial examples multiple times on the local surrogate model, which can cause the crafted examples to overfit the surrogate model and hurt their ability to transfer to other models. To address this issue, input transformation attacks use the idea of data augmentation to reduce the risk of overfitting. The diverse input method (DIM) [18] applies random transformations to the benign inputs to reduce the effect of overfitting. The translation invariant method (TIM) [19] uses translation invariance to generate a series of transformed copies to make the perturbations more accurate and employs an optimized implementation to reduce computation. The scale invariant method (SIM) [16] performs a gradient attack by scaling the input image and averaging the gradients computed on the resulting copies. Admix [20] implements data augmentation by mixing in images from other categories with a small ratio. SSA [21] transforms the input samples in the frequency domain for input augmentation to reduce the overfitting of the adversarial examples. PAM [22] introduces a semantic discriminator to prevent the semantics of the augmented samples from deviating too far from the original samples, which would generate inaccurate adversarial perturbations.
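As one concrete example of such an input transformation, a sketch of DIM's random resize-and-pad is given below; the output size and probability defaults are illustrative values, not the settings of [18]:

```python
import random
import torch.nn.functional as F

def diverse_input(x, out_size=330, prob=0.5):
    """DIM-style transform: with probability prob, randomly resize x (NCHW) and zero-pad to out_size."""
    if random.random() >= prob:
        return x
    rnd = random.randint(x.shape[-1], out_size - 1)        # random intermediate size
    x_resized = F.interpolate(x, size=(rnd, rnd), mode="nearest")
    pad = out_size - rnd
    pad_left, pad_top = random.randint(0, pad), random.randint(0, pad)
    # F.pad for NCHW input takes padding as (left, right, top, bottom).
    return F.pad(x_resized, (pad_left, pad - pad_left, pad_top, pad - pad_top))
```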

2.3. Model Ensemble Attacks

To further improve robustness and eliminate the bias of a single model, model ensemble methods have been widely studied and applied in the model training process. Such methods for improving the accuracy of model outputs can also be applied to adversarial attacks. Liu et al. [10] first proposed the model ensemble attack, which increases transferability by combining the predictions of different models. Dong et al. [11] proposed different ensemble schemes that implement ensemble attacks by combining the logits or losses of different models. To further improve the transferability of adversarial examples, Xiong et al. [12] proposed a stochastic variance-reduced method to improve the accuracy of the ensemble output.

2.4. Adversarial Defenses

If it is possible to successfully attack a model equipped with defense mechanisms, this strongly demonstrates the effectiveness of the attack method. In recent years, many methods have been proposed to improve the robustness of models. In general, there are three kinds of defense methods: adversarial training [15, 23–25], adversarial detection [26–28], and input transformation defenses [29–34].

Adversarial training improves robustness by retraining on adversarial examples. Adversarial detection and input transformation defenses detect and clean samples before they are input into the model to reduce the threat of potential adversarial examples. In this paper, we verify our method against several advanced defense models, including reducing the resolution of the adversarial examples via JPEG compression (JPEG) [29], randomly resizing and padding the images (R&P) [30], using a high-level representation-guided denoiser to eliminate the perturbation (HGD) [31], feature distillation (FD) [32], which purifies the perturbation by redesigning the JPEG compression framework, an end-to-end image compression model that defends against adversarial examples (ComDefend) [33], and the randomized smoothing (RS) [34] technique that makes the target model more robust.
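As an illustration of the simplest of these preprocessing defenses, JPEG re-compression can be sketched as follows; the quality factor is an assumed value, not a setting taken from [29]:

```python
import io
import numpy as np
from PIL import Image

def jpeg_defense(image_uint8, quality=75):
    """Re-encode an HxWx3 uint8 image as JPEG to wash out small adversarial perturbations."""
    buf = io.BytesIO()
    Image.fromarray(image_uint8).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf).convert("RGB"))
```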

3. Materials and Methods

3.1. Preliminary

Let $x$ denote the benign input and $y$ denote the corresponding ground-truth label of $x$. Given a classifier $f(x; \theta)$ with parameters $\theta$ that outputs a label as the prediction for the input image, the task of the adversarial attack is to craft an example $x^{adv}$ that is indistinguishable from the benign input $x$ to human eyes but misleads the classifier $f$. Formally, the optimization problem of this task can be defined as follows:
$$\arg\max_{x^{adv}} J(x^{adv}, y; \theta), \quad \text{s.t. } \|x^{adv} - x\|_{p} \le \epsilon,$$
where $\|x^{adv} - x\|_{p}$ represents the discrepancy between the perturbed image and the benign image, and we consider the $L_{\infty}$-norm ($p = \infty$) as this constraint in this paper.

The idea of ensembling has been widely used to improve model robustness [25]. It can also be applied to adversarial attacks: because ensemble methods can diminish the biases of individual models, they can yield an update direction for the adversarial example that is suitable for the majority of models. Existing model ensemble methods can be classified into three categories: (1) ensemble on predictions, (2) ensemble on logits, and (3) ensemble on losses.

Liu et al. [10] proposed a model ensemble attack that combines the predictions of different models. The loss function for $K$ models can be ensembled as follows:
$$J(x, y) = -\mathbf{1}_{y} \cdot \log\left(\sum_{k=1}^{K} w_{k}\, p_{k}(x)\right),$$
where $p_{k}(x)$ denotes the prediction (softmax output) of the $k$th model and $w_{k}$ denotes the ensemble weight with $w_{k} \ge 0$ and $\sum_{k=1}^{K} w_{k} = 1$. Dong et al. [11] proposed using logits and losses to implement model ensemble attacks. The logits of $K$ models can be ensembled as follows:
$$l(x) = \sum_{k=1}^{K} w_{k}\, l_{k}(x),$$
where $l_{k}(x)$ denotes the logit output of the $k$th model. The losses of $K$ models can be ensembled as follows:
$$J(x, y) = \sum_{k=1}^{K} w_{k}\, J_{k}(x, y),$$
where $J_{k}(x, y)$ denotes the loss of the $k$th model.
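A sketch of these three fusion strategies, assuming PyTorch models that return logits and equal weights $w_{k} = 1/K$ (the helper name ensemble_loss is ours):

```python
import torch
import torch.nn.functional as F

def ensemble_loss(models, x, y, mode="logits"):
    """Equal-weight ensemble (w_k = 1/K): fuse predictions, logits, or losses of K models."""
    logits = [m(x) for m in models]
    if mode == "predictions":      # ensemble on softmax predictions, as in Liu et al. [10]
        p = torch.stack([F.softmax(l, dim=1) for l in logits]).mean(0)
        return F.nll_loss(torch.log(p), y)
    if mode == "logits":           # ensemble on logits, as in Dong et al. [11]
        return F.cross_entropy(torch.stack(logits).mean(0), y)
    return torch.stack([F.cross_entropy(l, y) for l in logits]).mean()  # ensemble on losses
```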

In this work, we select I-FGSM to craft adversarial examples, which adds small perturbations over multiple iterations. Adding perturbations multiple times helps prevent adversarial examples from getting stuck in a local optimum, thus increasing transferability. In addition, we use the ensemble gradient to implement the model ensemble attack as follows:
$$x_{t+1}^{adv} = \mathrm{Clip}_{x}^{\epsilon}\left\{x_{t}^{adv} + \alpha \cdot \mathrm{sign}\left(\bar{g}_{t}\right)\right\},$$
where $\bar{g}_{t}$ denotes the ensemble gradient.
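Given some ensemble gradient g_bar (produced by whichever fusion rule is in use), the corresponding update step can be sketched as:

```python
import torch

def ensemble_step(x_adv, x, g_bar, alpha, eps):
    """One I-FGSM step driven by the ensemble gradient g_bar, projected onto the eps-ball around x."""
    x_new = x_adv.detach() + alpha * g_bar.sign()
    return torch.max(torch.min(x_new, x + eps), x - eps).clamp(0.0, 1.0)
```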

3.2. Motivation

Lin et al. [16] link the training of models to the generation of adversarial examples. When generating adversarial examples, the parameters of the model are fixed and the adversarial example is updated by adjusting the added perturbation. This is similar to the training process of a model, which fixes the training samples and seeks appropriate parameters to improve model performance. As a result, many methods for optimizing model training have begun to be used to optimize adversarial examples [16].

To further improve the quality of the adversarial examples, the idea of model ensembling has been widely applied in the generation of adversarial examples [10, 11]. The principle of model ensemble attacks is that if an adversarial example can mislead multiple models simultaneously, it may also be able to mislead more black-box models [11]. Additionally, by combining the outputs of multiple models, it is possible to reduce individual model biases and improve the accuracy of attacks. However, most ensemble attack methods directly add the outputs (predictions, logits, or losses) of different models with equal weights, and no additional processing is applied to the obtained average. Since adversarial examples have different optimization directions on different models, directly averaging the outputs of multiple models may limit the quality of the adversarial examples. This is similar to the issue of the large variance between the stochastic gradient and the ground-truth gradient in SGD, which can lead to getting trapped in local optima [35]. To solve the problem of gradient variance in SGD, algorithms such as SAG [13] and the stochastic variance-reduced gradient [36] have been proposed. These algorithms aim to prevent getting stuck in local optima by reducing the stochastic variance introduced by randomly selecting samples. Based on the analysis above, we propose to combine model ensemble attacks with these SGD optimization algorithms. This combination aims to reduce the stochastic variance in model ensemble attacks and thereby improve the transferability of the crafted adversarial examples; the overall structure of SAEA can be seen in Figure 1.
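For reference, the SAG update that motivates our design can be stated as follows: at step $t$, a random index $j_t$ is drawn, only its stored gradient is refreshed, and the parameters move along the average of all stored gradients,
$$g_{j_t} \leftarrow \nabla f_{j_t}(w_t), \qquad w_{t+1} = w_t - \frac{\eta}{n} \sum_{i=1}^{n} g_i,$$
where $n$ is the number of samples and $\eta$ is the learning rate. This is the standard form of the algorithm [13], restated here for completeness.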

3.3. SAEA Method

In this work, we view the process of the model ensemble attack as a model training process, following Liu et al. [10]. We attempt to optimize the update direction of adversarial examples on the ensembled models by reducing the difference between the ensemble gradient and the individual gradients in the model ensemble attack. This issue is similar to reducing the variance between the randomly selected sample gradient and the true gradient in the SGD algorithm. Therefore, we turn to SAG [13]. SAG keeps the historical gradient of each sample in memory and uses their average as a lower-variance estimate of the full gradient, which reduces the stochastic variance of the SGD algorithm. Additionally, by using this average gradient with lower random variance, SAG converges much faster than traditional SGD. When combined with adversarial attacks, it can likewise enhance the speed and quality of generating adversarial examples.

Algorithm 1: SAEA integrated with I-FGSM.
Input: A clean image $x$ and its ground-truth label $y$; $K$ surrogate models $f_{1}, \dots, f_{K}$ and their gradients $g_{1}, \dots, g_{K}$.
Input: The size of perturbation $\epsilon$, iterations $T$, number of internal loops $M$, external loop step size $\alpha$, internal loop step size $\beta$.
Output: Adversarial example $x^{adv}$.
  1: $\alpha = \epsilon / T$;
  2: Initialize $x_{0}^{adv} = x$;
  3: for $t = 0$ to $T - 1$ do
  4:  Input $x_{t}^{adv}$ into the $K$ models and obtain $g_{k} = \nabla_{x} J_{k}(x_{t}^{adv}, y)$ for $k = 1, \dots, K$;
  5:  Initialize $\tilde{x}_{0} = x_{t}^{adv}$;
  6:  for $m = 0$ to $M - 1$ do:
  7:   Randomly choose a model $f_{k}$ from $f_{1}, \dots, f_{K}$;
  8:   Get the gradient of the chosen model $g_{k} = \nabla_{x} J_{k}(\tilde{x}_{m}, y)$;
  9:   Update the ensemble gradient $\bar{g}$ by Equation (7);
  10:   Update $\tilde{x}_{m+1} = \mathrm{Clip}_{x}^{\epsilon}\left\{\tilde{x}_{m} + \beta \cdot \mathrm{sign}(\bar{g})\right\}$;
  11:  end for
  12:  Update $x_{t+1}^{adv} = \mathrm{Clip}_{x}^{\epsilon}\left\{x_{t}^{adv} + \alpha \cdot \mathrm{sign}(\bar{g})\right\}$;
  13: end for
  14: return $x^{adv} = x_{T}^{adv}$.

Inspired by SAG, we propose SAEA. Specifically, we use two loops, internal and external, to implement the attack algorithm. The internal loop obtains an ensemble gradient with smaller variance, and the external loop combines it with the iterative gradient attack algorithm to craft adversarial examples with higher transferability.

The integration of SAEA with I-FGSM [15] is summarized in Algorithm 1. At the beginning of the algorithm, we treat the clean image $x$ as the initial adversarial example $x_{0}^{adv}$. In each external loop, we input the adversarial example crafted in the previous loop into the $K$ models to obtain their respective gradients and maintain them in memory, denoted as $g_{1}, \dots, g_{K}$. After that, we use the idea of SAG in the internal loop to obtain an ensemble gradient with reduced variance with respect to these gradients. The internal loop that obtains the ensemble gradient is the crucial part of our algorithm. Specifically, at the beginning of each internal loop, we treat the image passed in from the external loop as the initial internal adversarial example $\tilde{x}_{0}$. We randomly select one model from the $K$ models and input the internal adversarial example into this selected model to update its stored gradient. After that, we use this updated gradient and the maintained gradients of the other models to obtain the new ensemble gradient $\bar{g}$ by Equation (7). Then, we use the ensemble gradient from each internal loop to update the internal adversarial example. In order to maintain the semantic information of the internal adversarial examples, we apply a Clip function to them, which also ensures the accuracy of the ensemble gradient obtained in the internal loop. The crafted internal adversarial example is used in the next internal loop to continue updating the stored gradients and $\bar{g}$ by Equation (7). After $M$ rounds of internal-loop updates, $\bar{g}$ has a smaller variance with respect to the ground-truth gradients of each model, and we treat it as the ensemble gradient. This ensemble gradient better represents the update direction of the example on the model set and crafts adversarial examples with higher transferability.
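To make the two-loop structure concrete, the following is a simplified PyTorch-style sketch of how Algorithm 1 can be implemented. The equal-weight averaging of the maintained gradients is shown only as a stand-in for Equation (7), and the helper names and loss choice are illustrative assumptions rather than a reference implementation.

```python
import random
import torch
import torch.nn.functional as F

def saea_attack(models, x, y, eps, alpha, beta, T, M):
    """Sketch of SAEA + I-FGSM: an internal SAG-style loop refines the ensemble gradient,
    and the external loop applies it as an iterative sign-gradient update."""
    def input_grad(model, x_in):
        x_in = x_in.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_in), y)
        return torch.autograd.grad(loss, x_in)[0]

    def project(x_new):
        return torch.max(torch.min(x_new, x + eps), x - eps).clamp(0.0, 1.0)

    x_adv = x.clone().detach()
    for _ in range(T):                                   # external loop
        grads = [input_grad(m, x_adv) for m in models]   # maintain per-model gradients
        g_bar = torch.stack(grads).mean(0)
        x_in = x_adv.clone()
        for _ in range(M):                               # internal loop (SAG-style)
            k = random.randrange(len(models))
            grads[k] = input_grad(models[k], x_in)       # refresh only the chosen model
            g_bar = torch.stack(grads).mean(0)           # stand-in for Equation (7)
            x_in = project(x_in + beta * g_bar.sign())   # update the internal example
        x_adv = project(x_adv + alpha * g_bar.sign())    # external I-FGSM step
    return x_adv
```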

In short, SAEA uses an ensemble gradient with smaller gradient differences among models to craft adversarial examples. This enables adversarial examples to achieve better results on multiple surrogate models, and the capability to transfer to other models is also improved. Additionally, SAEA can combine with other advanced input transformation methods (e.g., SI [16], DI [18], TI [19], and Admix [20]) to show better results.

3.4. Relationships between Different Attacks

SAEA, SVRE, and Ens are all model ensemble attacks and belong to the iterative FGSM family; their relationships are shown in Figure 2, where the parameters are the probability of the random transformation (DIM), the size of the Gaussian kernel (TIM), and the number of scale copies (SIM). The differences and transformations between them can be summarized as follows: (i) if the number of internal iterations is set to 0, SAEA and SVRE degrade to Ens; (ii) SVRE and Ens obtain the ensemble output with different methods from SAEA; (iii) if the internal iterations, the external iterations, and the number of surrogate models are all set to 1, Ens, SVRE, and SAEA degrade to FGSM.

4. Experiments

In this section, we demonstrate our method with extensive experiments. First, we introduce the parameter settings of our experiments. Then, we compare the attack success rate (ASR) of our method with other state-of-the-art (SOTA) ensemble methods on single representative models and ensemble-trained models. In addition, we also validate our method against some advanced defense methods. Finally, we conduct ablation experiments to explain the parameters that affect the experimental results.

4.1. Experiments Setup
4.1.1. Models

We choose four representative models, including Inception-v3 (Inc-v3) [36], Inception-v4 (Inc-v4) [37], Inception-ResNet-v2 (IncRes-v2) [37], and ResNet-v2-101 (Res-101) [38], to craft adversarial examples. To better evaluate the transferability of the crafted adversarial examples, we add three adversarially trained models: Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens [25].

In addition, we also choose seven input transformation defense methods to validate the adversarial examples, which are: JPEG compression [29], R&P [30], NIPS-r3, HGD [31], FD [32], ComDefend [33], and RS [34].

4.1.2. Dataset

The dataset we used is the ImageNet-compatible dataset [14] that is commonly used in adversarial attack algorithms [12, 19], which contains 100 images selected from the ImageNet dataset.

4.1.3. Baseline

In the experiments, we compare our method with the SVRE and Ens algorithms on multiple baseline attack methods, namely I-FGSM [15], MIM [11], SI [16], DI [18], and TI [19]. The attack type is a nontargeted attack. For all ensemble attack methods, we set the weight of each model's output to $1/K$, where $K$ denotes the total number of ensembled models.

For a fair comparison, the hyperparameters in the experiments remain the same as in previous works [11, 12], with a maximum perturbation of $\epsilon = 16$ and pixel values in the range of 0–255. The number of attack iterations $T$ is set to 10, and the decay factor used by all baselines to craft adversarial examples is set to 1. For MIM, the step size is set to 1.6. For TIM, we set the kernel size to 7. For DIM, the probability of random transformation is set to 0.5. For SIM, we set the number of scale copies to 5. The parameters of SAEA are the same as in SVRE, where the internal step size $\beta$ is set to 1.6 and the number of internal iterations $M$ is set to 16.

4.2. Result on Single Representative Model

We first compare SAEA with Ens and SVRE on four representative models, including Inc-v3, Inc-v4, IncRes-v2, and Res-101. We test the effectiveness of the attack with four models ensembled and with one model excluded, respectively.

Table 1 shows the ASR of the three methods on the ensemble of four models and on the corresponding hold-out models (the best results are marked in bold). While our method is slightly inferior to Ens and SVRE in the white-box setting, it outperforms both in the black-box setting, indicating higher transferability. Moreover, transferability is further improved when our method is combined with other methods.

4.3. Result on Ensemble-Trained Models

We then compare the ASR of adversarial examples on three models that have undergone ensemble training, including Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens. The adversarial examples are crafted on the representative models, which are Inc-v3, Inc-v4, Res-101, and IncRes-v2. By combining Ens, SVRE, and SAEA with multiple attack baseline methods, we test the transferability of the crafted adversarial examples on these defense models.

Table 2 shows the ASR of adversarial examples on the ensemble-trained models when the corresponding model is held out, and Table 3 shows the ASR of adversarial examples crafted on the four representative models against the ensemble-trained models (the best results are marked in bold). We can see that our method outperforms Ens and is better than SVRE in most cases across various baselines. The greatest improvement is observed when combined with the TIM baseline, where our method SAEA-TIM achieves a success rate improvement of 17.6% over Ens-TIM and 3.4% over SVRE-TIM, demonstrating stronger transferability.

4.4. Result on Advanced Defense Models

In this section, we select seven advanced input transformation defense methods, including JPEG [29], R&P [30], HGD [31], FD [32], ComDefend [33], RS [34], and NIPS-r3, to evaluate our method. The adversarial examples are crafted on the representative models, including Inc-v3, Inc-v4, Res-101, and IncRes-v2. For R&P, ComDefend, and JPEG, we use Inc-v3ens3 as the verification model. For RS, we set the failure probability to 0.001 and the noise hyperparameter $\sigma$ to 0.5. For the other methods, we use the official models released with their respective papers. Table 4 shows the results (the best results are marked in bold). We can see that, under each combination with a baseline method, our proposed method achieves the best performance on most defense methods, especially when combined with TI-DIM. For the average success rate on these seven defenses, our method is 10.5% higher than Ens and 3.2% higher than SVRE, demonstrating strong transferability and the ability to deal with various defense mechanisms.

4.5. Results on Baseline Method

Finally, we validate our approach with separate baseline methods, including DIM, TIM, SIM, and Admix. We use the four representative models as surrogate models and both the representative models and the ensemble-trained models as target models. The results are shown in Table 5 (the best results are marked in bold). We find that our method has an advantage over a variety of baseline methods, demonstrating its effectiveness.

4.6. Ablation Study
4.6.1. Internal Iteration

The number of internal iterations determines the variance between the ensemble gradient and the individual model gradients. We compare SAEA combined with five baseline methods, with the number of internal iterations set to multiples of 4; when the number of internal iterations is 0, SAEA becomes Ens. The ASR on the ensemble-trained models is shown in Figure 3. As the number of internal iterations increases, the ASR of the five methods gradually increases, and each method achieves its best performance at a different number of internal iterations. We can also see that once the number of internal iterations exceeds a certain value, the adversarial examples begin to overfit the four surrogate models.

4.6.2. Internal Step Size

The internal step size determines the degree of update for each internal image. We integrate SAEA with three different baselines, fixing the external step size at 1.6 and varying the internal step size $\beta$ between 0.1 and 25.6. As shown in Figure 4, the ASR (%) of the various methods reaches its maximum at different step sizes. To make a fair comparison with other methods, we also set $\beta$ to 1.6.

4.7. Computational Cost

As SAEA and SVRE both have an internal loop, they require more computational resources than Ens at the same number of iterations. To better assess the effect of the number of gradient computations on the experimental results, we compare SAEA with SVRE under the same number of computations. For the MI-FGSM baseline with 16 internal loops, SVRE needs to compute the gradient on the ensemble model once in each external loop, which can be viewed as computing the gradient of each single model four times. In addition, 32 gradient computations are required in the 16 internal loops, making a total of 36 computations per external iteration for SVRE. In contrast, SAEA aggregates the maintained gradients directly, without additional gradient computations: it requires only one ensemble gradient computation in the external loop and one gradient computation per internal loop, for a total of 20 computations. Therefore, the computational cost of SVRE is 1.8 times that of SAEA. Figure 5 shows the experimental results of the three methods under different computational budgets. We can see that Ens generates adversarial examples faster but achieves only a limited transfer attack success rate. In addition, SAEA requires fewer computational resources than SVRE, and the adversarial examples it finally generates have the highest transferability among the three methods, which shows the effectiveness of our method.
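Written out, with $K = 4$ surrogate models and $M = 16$ internal loops, the per-external-iteration counts of single-model gradient evaluations are
$$C_{\mathrm{SVRE}} = K + 2M = 4 + 32 = 36, \qquad C_{\mathrm{SAEA}} = K + M = 4 + 16 = 20, \qquad \frac{C_{\mathrm{SVRE}}}{C_{\mathrm{SAEA}}} = 1.8.$$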

5. Conclusion

In this work, we propose a new attack method called SAEA, which aims to craft more transferable adversarial examples. SAEA generates adversarial examples by reducing the difference between the average gradient and the gradients of the individual models. In addition, SAEA can be combined with other methods to further improve attack capability, and extensive experiments show that our method crafts more transferable adversarial examples than existing methods. However, there are still some limitations in our approach; for example, the inner loop increases the computational cost and the time needed to generate adversarial examples, which makes our method perform poorly in attack scenarios with strict time requirements. In future work, we will try to optimize this problem, consider introducing our approach to multimodal attacks, and experiment on more datasets.

Data Availability

The data and code can be found at https://github.com/LeiZhaoYNU/SAEA.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the Youth Project for Basic Research of Yunnan Province Science and Technology Department (no. 202301AU070194), the Fundamental Research Funds for the Central Universities (no. 2042022kf0021), and the Science and Technology Plan in Key Fields of Yunnan Province (no. 202202AD080002).