Abstract

This research establishes a novel hybrid artificial intelligence (AI) approach, named the firefly-tuned least squares support vector regression for time series prediction (FLSVRTSP). The proposed model utilizes the least squares support vector regression (LS-SVR) as a supervised learning technique to generalize the mapping function between the input and output of time series data. To optimize the LS-SVR's tuning parameters, the FLSVRTSP incorporates the firefly algorithm (FA) as its search engine. Consequently, the newly constructed model can learn from historical data and carry out prediction autonomously, without any prior knowledge of parameter settings. Experimental results and comparisons demonstrate that the FLSVRTSP achieves a significant improvement in forecasting accuracy when predicting both artificial and real-world time series data. Hence, the proposed hybrid approach is a promising alternative for assisting decision-makers to better cope with time series prediction.

1. Introduction

Generally, time series forecasting involves predicting future values of a variable by discovering the pattern in the historical data series and extrapolating that pattern into the future. Time series forecasting is a widely discussed issue, and its applications appear in various fields of business and engineering [1]. The reason is that prediction of future events is crucial for many kinds of planning and decision-making processes. Applications regarding time series data can easily be found in the literature, such as wind energy forecasting [2], water resource management [3], traffic accident prediction [4], and cash flow forecasting in construction projects [5]. Hence, it is not surprising that time series analysis and prediction are of growing interest among researchers.

Notably, constructing a predictive model for time series forecasting is a challenging task. This is because real-world time series data are often characterized by nonlinearity, nonstationarity, and irregularity [6]. Random noise and the effects of unidentified factors are the main causes that degrade prediction accuracy. Moreover, in most cases, the underlying model that generates the series is unknown, and the process of discovering such a model is often hindered by the stochastic nature of the time-dependent data [7]. Particularly, for each time series, the determination of a suitable embedding dimension is also a major concern [8, 9]. Therefore, these challenges necessitate the development of advanced approaches.

In recent years, increasing effort has been dedicated to establishing AI-based models for predicting real-world time-dependent data. Various AI approaches, such as the artificial neural network (ANN), the adaptive network-based fuzzy inference system (ANFIS), the support vector machine (SVM), and the least squares support vector machine (LS-SVM), have been applied to cope with time series prediction in various domains [2, 4, 10]. These previous works have illustrated that the application of these techniques, as a solution to the challenges of time series problems, is not only feasible but also very effective.

Among the AI methods, the least squares support vector regression (LS-SVR) is an advanced machine learning technique for regression analysis [11]. This method has been shown to possess many advanced features [12, 13]. In the LS-SVR's training process, a least squares cost function is used to obtain a linear set of equations in the dual space. Consequently, deriving the solution requires solving a set of linear equations instead of the quadratic programming problem of the standard SVM. Moreover, this linear system can be solved efficiently by iterative methods such as the conjugate gradient method.

Studies have demonstrated the excellent generalization, prediction accuracy, and fast computation of the LS-SVR [13–15]. Since time series forecasting can be formulated as a regression problem, the LS-SVR is a promising candidate for tackling the problem at hand. Nevertheless, the implementation of the LS-SVR requires an appropriate setting of its tuning parameters, namely, the regularization parameter and the kernel function parameter. Improper specification of these tuning parameters can significantly degrade the performance of the machine learning technique.

In the field of AI, the task of parameter setting is well known as the model selection process [16]. This problem is critical, and it has increasingly drawn the attention of scholars in a variety of disciplines [2, 14, 17]. In practice, identifying the most suitable set of model parameters often requires either prior knowledge of the problem domain or tedious trial-and-error processes. To overcome this issue, hybridizing machine learning techniques with a swarm-based optimization algorithm is a feasible resolution for the problem at hand [16, 18].

Swarm intelligence is a design framework based on social insect behavior [19]. Social insects such as ants, bees, fireflies, and wasps are unique in the way these simple individuals cooperate to accomplish complex, difficult tasks. This cooperation is distributed among the entire population, without any centralized control. Each individual simply follows a small set of rules influenced by locally available information. This emergent behavior results in great achievements that no single member could accomplish alone [20]. The firefly algorithm (FA) is one of the recent swarm intelligence methods; it is based on the flashing patterns and behavior of tropical fireflies [21]. According to previous works, the algorithm is very efficient and can outperform conventional algorithms in solving many optimization problems [22, 23].

Therefore, the purpose of this study is to fuse the LS-SVR and FA techniques to establish a new hybrid AI model for time series prediction. Our research goal is to build a model that is capable of delivering accurate predictions as well as operating autonomously without human interference. The second section of this paper reviews the methods needed to accomplish the research objective. In the third section, the framework of the proposed FLSVRTSP is described in detail. The fourth section demonstrates the experimental results. Conclusions of our study are presented in the final section.

2. Literature Review

2.1. Time Series Prediction

Time series forecasting is an important subject in which past observations of a variable of interest are recorded and analyzed to establish a prediction model [24]. The developed model is built with the expectation that it can describe the underlying relationship between patterns in the past and the value of the variable in the future (see Figure 1). At the current time $t$, given the recorded observations $x_{t-m+1}, \ldots, x_{t-1}, x_t$ of a time series, the task is to forecast the future value $x_{t+h}$. Herein, $m$ represents the embedding dimension; $h$ denotes the forecasting horizon. If $h$ is one, the problem is known as a single-step-ahead forecast. Meanwhile, a problem involving a greater value of $h$ is often referred to as a multiple-step-ahead forecast [25].

Generally, in time series prediction, the historical time series is transformed into a high dimensional space to facilitate the exploration of the implicit pattern lying in the series. This process of transformation, widely known as state reconstruction [8, 9], is dependent on the embedding dimension $m$. Equation (1) illustrates the state reconstruction process for one-step-ahead forecasting, in which the original time series (with $N$ elements) is transformed into an input matrix $X$ of size $(N-m)$-by-$m$ and an output matrix $Y$ of size $(N-m)$-by-1:

$$X = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \\ x_2 & x_3 & \cdots & x_{m+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N-m} & x_{N-m+1} & \cdots & x_{N-1} \end{bmatrix}, \qquad Y = \begin{bmatrix} x_{m+1} \\ x_{m+2} \\ \vdots \\ x_{N} \end{bmatrix}. \tag{1}$$
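For illustration, the following Python sketch implements the state reconstruction of (1); the function name reconstruct_state and the use of numpy are our own choices, not part of the original study:

import numpy as np

def reconstruct_state(series, m):
    """Transform a series of N elements into the (N - m)-by-m input
    matrix X and the (N - m)-by-1 output vector Y of (1)."""
    series = np.asarray(series, dtype=float)
    N = len(series)
    # Each row of X holds m consecutive observations; the matching
    # entry of Y is the observation that immediately follows them.
    X = np.array([series[i:i + m] for i in range(N - m)])
    Y = series[m:]
    return X, Y

# Example with embedding dimension m = 3:
# X -> [[1 2 3], [2 3 4], [3 4 5]], Y -> [4, 5, 6]
X, Y = reconstruct_state([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], m=3)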

In time series analysis, the parameter $m$ is crucial because of its influence on the prediction performance. For each time series, the embedding dimension can be calculated using the "false nearest neighbor" (FNN) approach established by Kennel et al. [26]. However, from the perspective of machine learning, this parameter can play the role of a tuning parameter in the prediction model, and its optimal value can be searched for by an optimization technique [27].

2.2. Least Squares Support Vector Regression (LS-SVR)

This section of the paper describes the mathematical formulation of the LS-SVR. Consider the following model of interest, which infers the mapping between a response variable and one or more independent variables [11, 13, 28]:

$$y(x) = w^{T}\varphi(x) + b, \tag{2}$$

where $x \in \mathbb{R}^{n}$, $y \in \mathbb{R}$, and $\varphi(\cdot): \mathbb{R}^{n} \rightarrow \mathbb{R}^{n_h}$ is the mapping to the high dimensional feature space.

In the LS-SVR for regression analysis, given a training dataset $\{x_k, y_k\}_{k=1}^{N}$, the optimization problem is stated as follows:

$$\min_{w, b, e} J(w, e) = \frac{1}{2} w^{T} w + \gamma \frac{1}{2} \sum_{k=1}^{N} e_k^{2} \quad \text{subject to} \quad y_k = w^{T}\varphi(x_k) + b + e_k, \; k = 1, \ldots, N, \tag{3}$$

where $e_k$ are error variables and $\gamma$ denotes a regularization constant.

In (3), it is noticed that the objective function is composed of a sum of squared fitting errors and a regularization term. This cost function is similar to the standard procedure for training feedforward neural networks and is related to ridge regression. However, when $w$ becomes infinite dimensional, one cannot solve this primal problem. Hence, it is necessary to establish the Lagrangian and derive the corresponding dual problem.

The Lagrangian is given as follows:

$$L(w, b, e; \alpha) = J(w, e) - \sum_{k=1}^{N} \alpha_k \left( w^{T}\varphi(x_k) + b + e_k - y_k \right), \tag{4}$$

where $\alpha_k$ are Lagrange multipliers. The conditions for optimality are given by:

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{k=1}^{N} \alpha_k \varphi(x_k), \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{k=1}^{N} \alpha_k = 0,$$
$$\frac{\partial L}{\partial e_k} = 0 \;\Rightarrow\; \alpha_k = \gamma e_k, \qquad \frac{\partial L}{\partial \alpha_k} = 0 \;\Rightarrow\; w^{T}\varphi(x_k) + b + e_k - y_k = 0. \tag{5}$$

After the elimination of $w$ and $e$, (5) can be represented as the following linear system:

$$\begin{bmatrix} 0 & 1_N^{T} \\ 1_N & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \tag{6}$$

where $y = [y_1, \ldots, y_N]^{T}$, $1_N = [1, \ldots, 1]^{T}$, and $\alpha = [\alpha_1, \ldots, \alpha_N]^{T}$.

And the kernel trick is applied as follows:

$$\Omega_{kl} = \varphi(x_k)^{T} \varphi(x_l) = K(x_k, x_l), \quad k, l = 1, \ldots, N. \tag{7}$$

The resulting LS-SVR model for function estimation is expressed as:

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b, \tag{8}$$

where $\alpha_k$ and $b$ are the solution of the linear system (6). The kernel function most often utilized is the radial basis function (RBF) kernel, given as follows:

$$K(x_k, x_l) = \exp\left( -\frac{\lVert x_k - x_l \rVert^{2}}{2\sigma^{2}} \right), \tag{9}$$

where $\sigma$ is the kernel function parameter.

When the RBF kernel is used, two tuning parameters need to be determined in the LS-SVR. The regularization parameter $\gamma$ controls the penalty imposed on data points that deviate from the regression function. Meanwhile, the kernel parameter $\sigma$ influences the smoothness of the regression function. It is worth noticing that a proper setting of these tuning parameters is required to achieve a desirable performance of the prediction model.
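To make the formulation concrete, the following Python sketch trains an LS-SVR by directly solving the linear system (6) and predicts with (8) using the RBF kernel (9). It is a minimal illustration assuming numpy; the function and variable names (lssvr_train, gamma_reg, sigma) are ours, and a production implementation would use an iterative solver such as conjugate gradient for large N, as noted earlier:

import numpy as np

def rbf_kernel(A, B, sigma):
    """Kernel matrix of (9): K[k, l] = exp(-||a_k - b_l||^2 / (2 sigma^2))."""
    d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def lssvr_train(X, y, gamma_reg, sigma):
    """Solve the (N+1)-by-(N+1) linear system (6) for the bias b and
    the Lagrange multipliers alpha."""
    N = len(y)
    Omega = rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                      # top row: [0, 1^T]
    A[1:, 0] = 1.0                      # left column: 1
    A[1:, 1:] = Omega + np.eye(N) / gamma_reg
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]              # b, alpha

def lssvr_predict(X_new, X_train, alpha, b, sigma):
    """Evaluate the estimator (8): y(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b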

2.3. Firefly Algorithm (FA)

The FA is a stochastic, nature-inspired metaheuristic algorithm that can find both global and local optima simultaneously and effectively [21]. The flashing lights of fireflies are an amazing sight in the summer sky in tropical and temperate regions. The pattern of flashes is often unique to a particular species. In essence, each firefly is attracted to brighter ones as it randomly explores the search space.

The FA uses the following three idealized rules: (1) all fireflies are unisex, so a firefly is attracted to other fireflies regardless of their sex; (2) the attractiveness of a firefly is proportional to its brightness and decreases as the distance increases, and a firefly moves randomly if no other firefly is brighter; and (3) the brightness of a firefly is determined by the landscape of the objective function [22, 29]. The FA is illustrated in Pseudocode 1.

Begin FA
  Define the objective function f(x), where x = (x_1, ..., x_d)^T
  Generate an initial population of n fireflies x_i (i = 1, 2, ..., n)
  Formulate the light intensity I_i at x_i via f(x_i)
  Define the light absorption coefficient gamma
  While (t < Max_Generation)
    For i = 1 to n (all n fireflies)
      For j = 1 to n (all n fireflies)
        If (I_j > I_i), move firefly i towards firefly j
        End if
        Evaluate new solutions and update light intensity
      End for
    End for
    Rank the fireflies and find the current best
  End while
End FA

The brightness of an individual firefly can be defined similarly to the fitness value in the genetic algorithm [30]. The light intensity $I(r)$ varies according to the inverse square law as follows:

$$I(r) = \frac{I_s}{r^{2}}, \tag{10}$$

where $I_s$ is the light intensity at the source. For a given medium with a fixed light absorption coefficient $\gamma$, the light intensity $I$ varies with the distance $r$. Thus, the light intensity can be computed in the following way:

$$I = I_0 e^{-\gamma r}, \tag{11}$$

where $I_0$ is the original light intensity.

The combined effect of both the inverse square law and absorption can be approximated by the following Gaussian form:

$$I(r) = I_0 e^{-\gamma r^{2}}. \tag{12}$$

As the attractiveness of a firefly is proportional to the light intensity seen by adjacent fireflies, the attractiveness $\beta$ of a firefly is defined as:

$$\beta = \beta_0 e^{-\gamma r^{2}}, \tag{13}$$

where $\beta_0$ is the attractiveness at $r = 0$.

The distance between any two fireflies $i$ and $j$, at positions $x_i$ and $x_j$, respectively, is the Cartesian distance:

$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{ \sum_{k=1}^{d} \left( x_{i,k} - x_{j,k} \right)^{2} }, \tag{14}$$

where $d$ is the number of dimensions.

The movement of the $i$th firefly when attracted to another, more attractive (brighter) $j$th firefly is given as follows:

$$x_i^{t+1} = x_i^{t} + \beta_0 e^{-\gamma r_{ij}^{2}} \left( x_j^{t} - x_i^{t} \right) + \alpha \, \text{rand}, \tag{15}$$

where $x_i^{t}$ and $x_i^{t+1}$ represent the positions of firefly $i$ at generations $t$ and $t+1$; $\gamma$ is the absorption coefficient, which typically varies from 0.1 to 10 in most applications; $\beta_0$ is the attractiveness at $r = 0$; $\alpha$ is a trade-off constant that determines the random behavior of the movement; and rand represents a random number drawn from a Gaussian distribution. In essence, (15) describes the mechanism for updating a firefly in the current population. The movement of a firefly towards another firefly depends on the attractiveness and on a quantity that reflects the randomness of animal behavior.
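As an illustration of (13)–(15), the following Python sketch performs one generation of firefly movement; the names and default constants (beta0, gamma_abs, alpha) are illustrative placeholders rather than values from the paper:

import numpy as np

def firefly_step(pop, intensity, beta0=1.0, gamma_abs=1.0, alpha=0.2, rng=None):
    """One generation of movement. pop is an n-by-d array of positions;
    intensity holds the brightness of each firefly (higher is better)."""
    if rng is None:
        rng = np.random.default_rng()
    n, d = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        for j in range(n):
            if intensity[j] > intensity[i]:            # firefly j is brighter
                r2 = np.sum((new_pop[i] - pop[j])**2)  # squared distance, (14)
                beta = beta0 * np.exp(-gamma_abs * r2) # attractiveness, (13)
                # Movement rule (15): attraction plus Gaussian randomness.
                new_pop[i] += (beta * (pop[j] - new_pop[i])
                               + alpha * rng.standard_normal(d))
    return new_pop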

3. The Proposed Firefly-Tuned Least Squares Support Vector Regression for Time Series Prediction (FLSVRTSP)

This section describes the proposed prediction model, named the FLSVRTSP, in detail. The model (see Figure 2) is established by a fusion of the LS-SVR and FA algorithms. The FLSVRTSP employs the LS-SVR as the supervised learning algorithm for mining the implicit patterns in the series. Furthermore, the FA, a swarm-based optimization algorithm, is utilized to automatically identify the optimal values of the tuning parameters. The construction of the prediction model depends on a set of tuning parameters: the embedding dimension $m$ is needed in the state reconstruction process, while the regularization parameter $\gamma$ and the kernel function parameter $\sigma$ are required for the LS-SVR.

(1) Input Data. The FLSVRTSP takes a univariate time series as its input. The data can be recorded at regular time intervals, for example, daily, monthly, or quarterly. The whole dataset is divided into a training set, a validating set, and a testing set. In our study, the ratio of the validating set to the training set is 1/5.

(2) Tuning Parameter Initialization. The aforementioned tuning parameters of the model are randomly generated within the range of lower and upper boundaries (see Table 1) in the following manner:

$$x_1 = LB + rand(0, 1) \times (UB - LB), \tag{16}$$

where $x_1$ is the tuning parameter vector at the first generation; $rand(0, 1)$ denotes a uniformly distributed random number between 0 and 1; and $LB$ and $UB$ are the two vectors of lower bounds and upper bounds for the parameters.
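A minimal numpy sketch of (16) might look as follows; the bound values below are placeholders, as the actual ranges come from Table 1:

import numpy as np

rng = np.random.default_rng()

# Placeholder bounds for [m, gamma, sigma]; the actual ranges are
# those listed in Table 1.
lb = np.array([2.0, 1e-2, 1e-2])
ub = np.array([10.0, 1e3, 1e2])

# Equation (16), one firefly (parameter vector) per row.
n_fireflies = 25
pop = lb + rng.random((n_fireflies, lb.size)) * (ub - lb)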

(3) State Reconstruction. With the embedding dimension $m$ specified, the time series is transformed into the input matrix $X$ and the desired output vector $Y$ (see (1)). After being transformed, the data are used for the LS-SVR's training process.

(4) LS-SVR Training. In this step, the LS-SVR is deployed to learn the mapping function between the input ($X$) and the output ($Y$) derived in the previous step. The training process requires the two parameters $\gamma$ and $\sigma$ acquired from the FA searching process. As noted in Section 2.2, the regularization parameter $\gamma$ controls the penalty imposed on data points that deviate from the regression function, while the kernel parameter $\sigma$ affects the smoothness of the regression function; a proper setting of both is required to ensure a desirable performance of the prediction model.

(5) FA Searching. The FA automatically explores the various combinations of the tuning parameters ($m$, $\gamma$, and $\sigma$). At each generation, the optimizer carries out the attraction-based movement, evaluation, and ranking processes (see Pseudocode 1) to guide the population toward the optimal solution. By evaluating the fitness of each individual, the algorithm discards inferior combinations of the parameters and allows robust combinations to be passed on to the next generations.

(6) Fitness Evaluation. In the FLSVRTSP, in order to determine the optimal set of tuning parameters, the following objective function is used in the fitness evaluation step:

$$f = E_{train} + E_{validate}. \tag{17}$$

In (17), $E_{train}$ and $E_{validate}$ denote the training error and validating error, respectively. The training and validating errors herein are root mean squared errors (RMSE), calculated as follows:

$$RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^{2} }, \tag{18}$$

where $\hat{y}_i$ and $y_i$ denote the predicted and actual values for the $i$th output, and $N$ is the number of data points.

The fitness function, in essence, represents the trade-off between model generalization and model complexity. Fitting the training set well may reflect model complexity; however, a complex model tends to suffer from overfitting [31]. Thus, incorporating the error on the validating data helps identify the model that balances the minimization of the training error with the generalization property.
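Under the assumptions of the earlier sketches (the reconstruct_state, lssvr_train, and lssvr_predict helpers), the fitness evaluation of (17)–(18) could be expressed as follows; the 1/5 validating-to-training ratio follows step (1):

import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error of (18)."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred))**2))

def fitness(params, series):
    """Fitness of one firefly per (17): training RMSE plus validating RMSE.
    params = (m, gamma_reg, sigma); m is rounded to an integer lag."""
    m, gamma_reg, sigma = int(round(params[0])), params[1], params[2]
    X, Y = reconstruct_state(series, m)
    n_val = max(1, len(Y) // 6)         # validating : training = 1 : 5
    X_tr, Y_tr = X[:-n_val], Y[:-n_val]
    X_val, Y_val = X[-n_val:], Y[-n_val:]
    b, alpha = lssvr_train(X_tr, Y_tr, gamma_reg, sigma)
    e_train = rmse(Y_tr, lssvr_predict(X_tr, X_tr, alpha, b, sigma))
    e_val = rmse(Y_val, lssvr_predict(X_val, X_tr, alpha, b, sigma))
    return e_train + e_val              # equation (17)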

(7) Stopping Condition. The FA's optimization process terminates when the maximum number of generations is reached. If the stopping condition has not been met, the FA continues its searching process.

(8) Optimal Prediction Model. When the program terminates, the optimal set of tuning parameters has been successfully identified. The FLSVRTSP is ready to carry out forecasting tasks.
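To tie steps (2)–(8) together, the following highly simplified driver loop combines the earlier sketches; it is an illustration of the workflow, not the authors' implementation (in particular, lower fitness is treated as higher brightness since (17) is minimized):

import numpy as np

def flsvrtsp_search(series, lb, ub, n_fireflies=25, max_gen=100):
    """Steps (2)-(8): initialize per (16), iterate FA generations,
    and return the best (m, gamma, sigma) found."""
    rng = np.random.default_rng()
    pop = lb + rng.random((n_fireflies, lb.size)) * (ub - lb)
    fit = np.array([fitness(p, series) for p in pop])
    for _ in range(max_gen):            # stopping condition of step (7)
        # (17) is minimized, so negate fitness to obtain brightness.
        pop = firefly_step(pop, -fit, rng=rng)
        pop = np.clip(pop, lb, ub)      # keep parameters inside the bounds
        fit = np.array([fitness(p, series) for p in pop])
    return pop[np.argmin(fit)]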

4. Experimental Results

In this section, the newly developed FLSVRTSP is applied to forecast three time series: the Mackey-Glass series, the daily water discharge at the Palo Verde drain (http://waterdata.usgs.gov/), and the monthly USD/TWD exchange rate (http://fx.sauder.ubc.ca/data.html). The Mackey-Glass chaotic time series is defined by (19) [10]:

$$\frac{dx(t)}{dt} = \frac{0.2\, x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1\, x(t). \tag{19}$$

Herein, the parameter $\tau$ is set to 17. In our study, 500 data cases are generated, of which 400 are used for the training and validating process; the remaining data are used for testing the model.
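For reproducibility, the series can be generated, for example, by Euler integration of (19); the unit step size and the initial value x(0) = 1.2 below are common conventions in the literature, not specifics given in the text:

import numpy as np

def mackey_glass(n_points, tau=17, dt=1.0, x0=1.2):
    """Generate the Mackey-Glass series of (19) by Euler integration.
    The history before t = 0 is held constant at x0."""
    hist = int(tau / dt)
    x = np.full(n_points + hist, x0)
    for t in range(hist, n_points + hist - 1):
        x_tau = x[t - hist]             # the delayed term x(t - tau)
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1.0 + x_tau**10) - 0.1 * x[t])
    return x[hist:]

# 500 cases: 400 for training/validating, the rest for testing.
series = mackey_glass(500)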

The daily water flow dataset consists of 273 data cases of daily water discharge (cubic feet per second) at the Palo Verde outfall drain, from 1/1/2011 to 9/30/2011 (see Figure 3). The number of data cases used for testing is 30. The monthly USD/TWD exchange rate series includes 260 records from 1/1990 to 8/2011 (see Figure 4). In the experiment, 36 data cases are utilized for the testing process. For these two time series, one-step-ahead prediction is carried out.

Moreover, the back propagation neural network (BPNN), the adaptive network-based fuzzy inference system (ANFIS) [32], and the evolutionary support vector machine inference model (ESIM) [33] are used for result comparison. For the BPNN, the number of hidden layers and hidden neurons needs to be specified. For the ANFIS, the type of membership function and the number of membership functions for each input are required for constructing the prediction model. The determination of these parameters is often carried out by repetitive trial-and-error tuning processes. In this study, for each time series, we select the model configuration that yields the smallest prediction error on the validating data.

It is noticed that the embedding dimensions ($m$) for the BPNN, ANFIS, and ESIM models are calculated by the FNN approach [26]. Using this approach, the embedding dimensions for the Mackey-Glass series, the water flow series, and the USD/TWD exchange rate series are 3, 3, and 4, respectively. Meanwhile, in the FLSVRTSP, the FA automatically identifies the optimal embedding dimension. The optimal tuning parameters of the FLSVRTSP for these three time series are shown in Table 2.

For performance comparison, the root mean squared error (RMSE) and mean absolute error (MAE) for the training and testing datasets are calculated. The forecasting results obtained from the BPNN, ANFIS, and ESIM approaches and the proposed FLSVRTSP are provided in Table 3. It is observable that the FLSVRTSP has achieved a significant improvement in terms of prediction accuracy. The prediction errors on the testing datasets yielded by the newly developed model are smaller than those obtained by the other AI approaches. This means that the FLSVRTSP has a better generalization property and has successfully mitigated the problem of overfitting.

In the prediction of the Mackey-Glass series, the RMSE and MAE of the FLSVRTSP for the testing data are 0.005 and 0.004, respectively. The ANFIS shows a relatively good forecasting result, while the performance of the ESIM is poor. In the case of the water flow series, the FLSVRTSP and the ESIM outperform the BPNN and the ANFIS; however, the FLSVRTSP prediction is slightly better than that of the ESIM. Herein, the RMSE and MAE of the FLSVRTSP for the testing data of the water flow series are 10.33 and 8.35, respectively.

In the task of forecasting the USD/TWD exchange rate series, although the ANFIS model delivers the smallest error on the training dataset, its performance on the testing dataset is undesirable. The FLSVRTSP yields the best outcome, with an RMSE and MAE for the testing data of 0.36 and 0.29, respectively. The experimental results have shown that the hybridization of the LS-SVR and FA algorithms can deliver superior performance compared with the benchmark approaches. The FA has autonomously identified the most appropriate values of the LS-SVR's tuning parameters as well as the embedding dimension. This eliminates the tedious effort of setting the model parameters and also enhances the model's prediction performance.

5. Conclusion

This paper has presented a novel hybrid AI model, named the FLSVRTSP, to assist decision-makers in dealing with time series forecasting. The FLSVRTSP was developed by fusing the LS-SVR and FA techniques. The LS-SVR is employed to infer the input/output mapping function of time series data, while the FA is utilized to identify the most appropriate set of tuning parameters. This mechanism eliminates the need for expertise or trial-and-error processes in parameter setting. Moreover, simulations and performance comparisons on simulated and real-world time series data have demonstrated the capability of the FLSVRTSP. These results indicate the strong potential of the proposed model as an alternative for time series forecasting. Future directions for the current work include improving the model for multiple-step-ahead time series prediction and applying the hybrid intelligent model to forecast other real-world time series.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.