Abstract

This paper models the hydrogen content of bio-oil (H-BO) as a function of pyrolysis conditions and the composition of the biomass feedstock. A support vector machine (SVM) optimized by the grey wolf optimization (GWO) method is used for this purpose. A comprehensive dataset was compiled from previously published sources and reports. The calculated values of R2, MRE (%), MSE, and RMSE were 0.973, 1.98, 0.0568, and 0.241, respectively, which indicates that the model predicts the experimental values with high accuracy. A comparison with previously proposed models also showed that this model performs better in terms of accuracy. The model can therefore be a good alternative to costly and time-consuming laboratory measurements.

1. Introduction

Consumption of fossil fuel-based energy is increasing because of the growth of several developing economies and a rising population. This leads to higher greenhouse gas emissions, depletion of fossil fuel reserves in several countries, and higher fuel prices [1, 2]. Renewable energy resources can substitute for fossil fuel-based energy to address these issues and reduce geographical dependence on fossil fuels [3]. Sustainable energy resources such as wind, solar, geothermal, hydro, and biomass energy are possible alternatives [4, 5]. Among these, bioenergy (biomass energy) is regarded as the most sustainable and promising, and it could replace conventional fuels in chemical and energy applications. Biomass is mostly derived from plants and includes municipal solid waste, forestry and agricultural residues, sewage sludge, and food waste [6, 7]. It can be transformed into liquid, gaseous, and solid products through thermochemical and biochemical conversion processes. Owing to substantial progress in recent years, thermochemical processes now offer comparatively high conversion performance with simple pretreatment and low cost [8, 9].

Pyrolysis is a process in which materials are thermally decomposed in the absence of oxygen to produce noncondensable gas, biochar, and bio-oil (BO). BO, also called pyrolysis oil or liquid product, is a viscous dark brown fluid commonly composed of 350 highly oxygenated compounds [10, 11]. The yield of BO depends mainly on the pyrolysis conditions and the composition of the biomass feedstock [12]. The composition of biomass feedstock is generally described by ultimate and proximate analyses.

The ultimate analysis is used to obtain the elemental composition of the biomass, comprising its H, O, C, and N contents, while the proximate analysis is used to obtain the ash, volatile matter, fixed carbon, and moisture contents of the organic matter. Factors such as heating rate, pyrolysis temperature, residence time, and biomass particle size can affect the pyrolysis process [13, 14]. Several investigations have addressed the effect of feedstock composition and pyrolysis conditions on BO production. For example, Gholizadeh et al. studied the production of BO from twenty different biomass feedstocks and found that the mean BO yield was greater from woody biomass (52 percent wet wt.) than from herbaceous biomass (38 percent wet wt.) [15]. Sarkar and Wang used slow pyrolysis of waste coconut shells to investigate the effect of temperature on the BO yield and found that the highest BO production (48.7 wt.%) was obtained at 600°C [16]. Hao et al. found that, at 500°C, a combination of Ulva prolifera macroalgae (UPM) and rice straw (RS) generated the maximum amount of BO (46.68 wt.%) [17]. Hanif et al. also investigated the effect of reaction temperature on BO output and found that an average temperature of 300–350°C resulted in the highest BO output from algal biomass (48 wt.%) [18]. Traditional methods for determining the yield of BO and its relationship to influential parameters, such as pyrolysis conditions and biomass composition, need extensive testing, which is labor-intensive, costly, and time-consuming. Therefore, data mining, machine learning, and deep learning approaches are needed to analyze the behavior of biomass pyrolysis in terms of feedstock composition and process parameters and to assess their cumulative influence on the efficiency of BO production.

As a result of advances in artificial intelligence (AI), several advanced methods have been coupled with traditional methods to improve performance on both linear and nonlinear problems [19–23]. In comparison to traditional methods, machine learning (ML), a subset of artificial intelligence, and procedures such as random forest (RF), multilinear regression (MLR), decision tree (DT), and support vector machine (SVM) have shown significant performance in biomass pyrolysis due to their high predictive ability [24, 25]. Linear regression analysis is widely used on datasets that exhibit a linear relationship between the input variables and the target. Hussain and Mustafa developed a linear regression model for the production of BO from biomass by fast pyrolysis, correlating retention time, biomass content, and reaction temperature with BO output [26]. According to their findings, the determination coefficients of the different models were in the range of 0.81–0.99. A linear regression-based methodology was also used to investigate the relationship between 20 different biomass feedstock samples and the distribution of BO components [27]. The BO components and the biomass composition were found to be strongly related: woody biomass produced more phenols, straw yielded more ketones, algal biomass produced more fatty acids, and shell biomass yielded more furans. However, linear regression models consider only linear relationships between variables and are ineffective for complex processes that require nonlinear correlations. Furthermore, these linear regression models were typically developed from a limited number of empirical results and only a few influential factors, which reduces their applicability and reliability.

Thus, it is important to perform a comparative examination of various predictive machine learning models. In the current research, a machine learning model that hybridizes the support vector machine with the grey wolf optimizer is used to predict the hydrogen content of BO from the biomass composition (proximate and ultimate analyses) and the pyrolysis conditions. A wide range of experimental input data and several statistical analyses are used to evaluate the accuracy of this model. A distinctive trait of the proposed model is that its performance is independent of outliers.

2. Experimental Database

A total of 116 experimental data points were gathered to build a forecasting tool for the hydrogen content of bio-oil. Details of the database are available elsewhere [20]. The dataset is randomly divided into a training set of 82 points and a testing set of 34 points. The training set is used to fit the model, whereas the testing set is used to assess the model's generalization, that is, its ability to predict unseen data.
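The random 82/34 partition described above can be sketched as follows. This is a minimal illustration only: the function name `split_indices` and the fixed seed are assumptions introduced here, and the paper does not state how its split was generated.

```python
import random

def split_indices(n_samples, n_train, seed=0):
    """Randomly partition range(n_samples) into training and testing index lists."""
    if not 0 < n_train < n_samples:
        raise ValueError("n_train must be between 1 and n_samples - 1")
    indices = list(range(n_samples))
    # Seeded shuffle so the split is reproducible (the seed value is an assumption).
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

# 116 data points split into 82 training and 34 testing points, as in the text.
train_idx, test_idx = split_indices(116, 82)
print(len(train_idx), len(test_idx))
```

Keeping the testing indices completely separate from the training indices is what allows the testing metrics to be read as a measure of generalization.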

3. Model Statement

3.1. SVM

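SVM (support vector machine) can be used as a regression method, in which case it is referred to as support vector regression, an approach rooted in statistical learning theory. Its main feature is that, by using a suitable kernel (covariance) function, linear regression is performed after mapping the inputs from a low-dimensional space to a high-dimensional space. The input data are described as $\{(x_i, y_i)\}_{i=1}^{N}$, where $y_i \in \mathbb{R}$ and $x_i \in \mathbb{R}^{m}$ are the scalar output and the m-dimensional input, respectively. The support vector regression function is written as follows [28]:

$$f(x) = \lambda^{T}\varphi(x) + b,$$

where λ and b denote the weight vector and the bias term of the regression function, and φ is the mapping to the high-dimensional space. By minimizing the regularized risk function, the problem can be turned into an optimization problem based on the following loss:

$$L_{\varepsilon}\big(y, f(x)\big) = \begin{cases} 0, & |y - f(x)| \le \varepsilon, \\ |y - f(x)| - \varepsilon, & \text{otherwise.} \end{cases}$$

Vapnik (1995) established the above equation, which is known as the ε-insensitive loss function [29]. The role of ε is to define the tolerance band of the regression. If the deviation between the predicted and actual values is less than ε, the loss equals 0; otherwise, the loss equals the absolute deviation minus ε. The optimization objective is defined as follows:

$$\min_{\lambda,\, b,\, \xi,\, \xi^{*}} \; \frac{1}{2}\|\lambda\|^{2} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \quad \text{s.t.} \quad \begin{cases} y_i - \lambda^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \\ \lambda^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\ \xi_i,\ \xi_i^{*} \ge 0, \end{cases}$$

where C is a penalty factor (regularization parameter) and $\xi_i$, $\xi_i^{*}$ are slack variables that allow the training data to deviate beyond ε. The problem can be expressed in dual form:

$$\max_{\alpha,\, \alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\,\langle\varphi(x_i), \varphi(x_j)\rangle - \varepsilon\sum_{i=1}^{N}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{N} y_i(\alpha_i - \alpha_i^{*})$$

$$\text{s.t.} \quad \sum_{i=1}^{N}(\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C,$$

where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers. The optimal weight vector of the hyperplane, $\lambda^{*}$, is given as follows:

$$\lambda^{*} = \sum_{i=1}^{N}(\alpha_i - \alpha_i^{*})\,\varphi(x_i).$$

The final form of the regression function is as follows:

$$f(x) = \sum_{i=1}^{N}(\alpha_i - \alpha_i^{*})\,K(x_i, x) + b,$$

where $K(x_i, x) = \langle\varphi(x_i), \varphi(x)\rangle$ is the kernel function, defined by the scalar product of $\varphi(x_i)$ and $\varphi(x)$. The Gaussian radial basis function, which is employed in this work, is a prevalent type of kernel function:

$$K(x_i, x_j) = \exp\left(-\gamma\,\|x_i - x_j\|^{2}\right),$$

where γ is the kernel parameter [30]. Note that C, ε, and γ are the key tunable parameters of the SVM model.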
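To make the roles of ε and γ concrete, the ε-insensitive loss, the Gaussian radial basis kernel, and the kernel form of the regression function can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the function names and the toy numbers are assumptions, and `dual_coefs` is assumed to hold the differences of the Lagrange multipliers obtained from an already solved dual problem.

```python
import math

def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube, absolute deviation minus eps outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

def rbf_kernel(x, z, gamma):
    """Gaussian radial basis kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel expansion: sum_i coef_i * K(x_i, x) + bias.

    dual_coefs[i] stands for (alpha_i - alpha_i*) from a solved dual problem.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

# Hypothetical toy values, not parameters from the paper.
print(eps_insensitive_loss(1.0, 1.05, eps=0.1))
print(svr_predict([0.0, 0.0], [[0.0, 0.0]], [2.0], bias=0.5, gamma=1.0))
```

A larger γ makes the kernel decay faster with distance, so each support vector influences a smaller neighborhood; a larger ε widens the band within which errors are ignored.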

3.2. GWO

Mirjalili proposed GWO, a novel metaheuristic method [31]. This swarm intelligence approach is based on the hunting behavior of grey wolves and their natural social hierarchy. GWO has been reported to outperform other metaheuristic approaches such as Particle Swarm Optimization [32], Ant Colony Optimization [33], and Genetic Algorithm [34]. The algorithm is usually described in four parts: hierarchy, tracking, encircling, and attacking.

Grey wolves are gregarious animals at the top of the food chain. In the algorithm, α is considered the best solution, β the second best, and δ the third best, while ω denotes the remaining candidate solutions. The α, β, and δ wolves steer the optimization, and the other wolves follow them. The encircling behavior is described as follows:

$$\vec{D} = \left|\vec{C}\cdot\vec{X}_{p}(t) - \vec{X}(t)\right|, \qquad \vec{X}(t+1) = \vec{X}_{p}(t) - \vec{A}\cdot\vec{D},$$

where $\vec{X}$ denotes the current position vector of a wolf, $\vec{X}_{p}$ refers to the current position of the prey, and t is the iteration index. $\vec{A}$ and $\vec{C}$ represent the coefficient vectors, calculated as follows:

$$\vec{A} = 2\vec{a}\cdot\vec{r}_{1} - \vec{a}, \qquad \vec{C} = 2\vec{r}_{2},$$

where the components of $\vec{a}$ decrease from 2 to 0 over the iterations, $\vec{r}_{1}$ and $\vec{r}_{2}$ are random vectors with values between 0 and 1, and $\vec{A}$ varies randomly between −a and a. If $|\vec{A}|$ is less than 1, the wolves attack the prey and move toward its current position. The vector $\vec{C}$ can be regarded as representing the influence of obstacles around the prey in nature. Its random value assigns unpredictable weights to the prey, which helps avoid stagnation in local optima, particularly during the last iterations. Grey wolves are capable of locating and pursuing the prey. The α, β, and δ wolves lead this process in each iteration, and the ω agents update their positions according to the three current best positions. This step is defined as follows:

$$\vec{D}_{\alpha} = \left|\vec{C}_{1}\cdot\vec{X}_{\alpha} - \vec{X}\right|, \qquad \vec{D}_{\beta} = \left|\vec{C}_{2}\cdot\vec{X}_{\beta} - \vec{X}\right|, \qquad \vec{D}_{\delta} = \left|\vec{C}_{3}\cdot\vec{X}_{\delta} - \vec{X}\right|,$$

$$\vec{X}_{1} = \vec{X}_{\alpha} - \vec{A}_{1}\cdot\vec{D}_{\alpha}, \qquad \vec{X}_{2} = \vec{X}_{\beta} - \vec{A}_{2}\cdot\vec{D}_{\beta}, \qquad \vec{X}_{3} = \vec{X}_{\delta} - \vec{A}_{3}\cdot\vec{D}_{\delta},$$

$$\vec{X}(t+1) = \frac{\vec{X}_{1} + \vec{X}_{2} + \vec{X}_{3}}{3}.$$

In conclusion, the algorithm begins with a randomly generated population of grey wolves; the α, β, and δ wolves are then identified according to their fitness values and estimate the likely prey location. During the optimization, $\vec{a}$ and $\vec{A}$ govern the attack and exploration operations. Finally, the process terminates once the stopping criterion is reached.
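The loop described above (random initialization, ranking into α, β, and δ, and leader-guided position updates) can be sketched as a small optimizer. This is a minimal sketch under stated assumptions, not the authors' code: the function name, the population size, the iteration count, the linear decrease of a from 2 to 0, the clipping to the search bounds, and the sphere test function are all choices made here for illustration. In the setting of this paper, the objective would instead be a prediction error of the SVM as a function of (C, ε, γ).

```python
import random

def gwo_minimize(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimize `objective` over box `bounds` with a basic grey wolf optimizer."""
    if n_wolves < 3:
        raise ValueError("need at least three wolves for alpha, beta, and delta")
    rng = random.Random(seed)
    # Random initial pack inside the search bounds.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")
    for t in range(n_iter):
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]        # alpha, beta, delta
        if objective(leaders[0]) < best_fit:           # remember the best solution seen
            best_pos, best_fit = list(leaders[0]), objective(leaders[0])
        a = 2.0 * (1.0 - t / n_iter)                   # decreases from 2 toward 0
        next_wolves = []
        for wolf in wolves:
            new_pos = []
            for d, (lo, hi) in enumerate(bounds):
                moves = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a     # A in [-a, a]
                    C = 2.0 * rng.random()             # C in [0, 2]
                    D = abs(C * leader[d] - wolf[d])   # distance to this leader
                    moves.append(leader[d] - A * D)
                # Average of the three leader-guided moves, clipped to the bounds.
                new_pos.append(min(hi, max(lo, sum(moves) / 3.0)))
            next_wolves.append(new_pos)
        wolves = next_wolves
    final = min(wolves, key=objective)
    if objective(final) < best_fit:
        best_pos, best_fit = list(final), objective(final)
    return best_pos, best_fit

def sphere(x):
    """Toy objective with its minimum of 0 at the origin (illustration only)."""
    return sum(v * v for v in x)

best_pos, best_fit = gwo_minimize(sphere, [(-5.0, 5.0)] * 2)
print(best_pos, best_fit)
```

Early iterations allow |A| > 1, which pushes wolves away from the leaders and favors exploration; as a shrinks, |A| < 1 dominates and the pack contracts around the best positions found.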

4. Accuracy Evaluation of Dataset

The reliability of the data used is an important issue when building a forecasting tool; thus, evaluating the dataset's accuracy is vital. To this end, leverage analysis is conducted. The hat matrix, an important notion in this method, is defined as follows [35]:

$$H = X\left(X^{T}X\right)^{-1}X^{T},$$

where X is an m × n matrix, and m and n denote the number of data points and the number of model parameters, respectively.

The main diagonal of the hat matrix gives the hat value of each data point. To distinguish outliers from reliable points, the Williams plot is drawn with hat values on the x-axis and standardized residuals on the y-axis.

Figure 1 shows the suspicious points that lie outside the designated sound zone. The critical leverage value, denoted by $H^{*}$, is also shown in the figure and is defined as follows:

$$H^{*} = \frac{3(n+1)}{m}.$$

Based on the zone established for outlier detection, the majority of the data points are sufficiently reliable for constructing a forecasting tool.
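The leverage check can be illustrated with a short sketch that computes the diagonal of the hat matrix and flags suspicious points. Several things here are assumptions rather than details confirmed by the paper: the helper names, the toy design matrix, the commonly used critical leverage of 3(n + 1)/m, and the standardized-residual limit of ±3.

```python
def solve_linear(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def hat_values(X):
    """Diagonal of H = X (X^T X)^-1 X^T, one hat value per data point (row of X)."""
    n = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    return [sum(xi * zi for xi, zi in zip(row, solve_linear(xtx, row))) for row in X]

def flag_suspicious(hats, std_residuals, h_crit, r_limit=3.0):
    """Indices with leverage above h_crit or |standardized residual| above r_limit."""
    return [i for i, (h, r) in enumerate(zip(hats, std_residuals))
            if h > h_crit or abs(r) > r_limit]

# Hypothetical design matrix: m = 4 points, n = 2 parameters (intercept and slope).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
hats = hat_values(X)
h_crit = 3 * (len(X[0]) + 1) / len(X)   # assumed convention: 3(n + 1)/m
print(hats, h_crit)
```

The hat values sum to the number of parameters, which is a convenient sanity check on any implementation.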

5. Sensitivity Analysis

Sensitivity analysis was performed on the input data to determine the effect of each input on the target parameter. More details about this method are given elsewhere [36, 37]. Figure 2 shows the results of this analysis for the proposed model. Accordingly, H has the highest relevancy factor (+0.73) and O has the lowest (−0.63); that is, H has the strongest positive effect on the target parameter and O the strongest negative effect.
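The paper defers the details of the relevancy factor to its references. A common choice in this kind of sensitivity analysis is the Pearson correlation between one input and the output, and the sketch below assumes that definition; the function name and the toy values are hypothetical.

```python
import math

def relevancy_factor(inputs, outputs):
    """Pearson-type relevancy factor in [-1, 1] between one input and the output."""
    n = len(inputs)
    if n < 2 or n != len(outputs):
        raise ValueError("need two equally long sequences with at least two points")
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    var_y = sum((y - mean_y) ** 2 for y in outputs)
    if var_x == 0 or var_y == 0:
        raise ValueError("relevancy factor is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values, not data from the paper.
print(relevancy_factor([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(relevancy_factor([1.0, 2.0, 3.0], [6.0, 4.0, 2.0]))
```

Under this definition, a positive value means the target tends to increase with the input and a negative value means it tends to decrease, while the magnitude indicates the strength of the relationship.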

6. Parameters of Model Evaluation

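The following statistical parameters are used to assess the agreement between the estimated and actual output values:

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\text{act}} - y_i^{\text{pred}}\right)^{2}, \tag{13}$$

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\text{act}} - y_i^{\text{pred}}\right)^{2}}, \tag{14}$$

$$\text{MRE}\,(\%) = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i^{\text{act}} - y_i^{\text{pred}}}{y_i^{\text{act}}}\right|, \tag{15}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(y_i^{\text{act}} - y_i^{\text{pred}}\right)^{2}}{\sum_{i=1}^{N}\left(y_i^{\text{act}} - \bar{y}^{\text{act}}\right)^{2}}, \tag{16}$$

where $y_i^{\text{act}}$ and $y_i^{\text{pred}}$ are the actual and predicted output values, $\bar{y}^{\text{act}}$ is the mean of the actual values, and N is the number of data points.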
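The four statistical parameters can be computed together in a few lines. The sketch below assumes their standard definitions (mean squared error, its square root, mean absolute relative error in percent, and the coefficient of determination); the function name and the toy values are hypothetical and are not data from the paper.

```python
import math

def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MRE (%), and R2 for paired actual/predicted values."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    # Mean relative error in percent; assumes no actual value is zero.
    mre = 100.0 / n * sum(abs(r) / abs(a) for r, a in zip(residuals, actual))
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - sum(r * r for r in residuals) / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MRE": mre, "R2": r2}

# Hypothetical toy values, not data from the paper.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```

MSE and RMSE are expressed in the units of the target (squared and unsquared, respectively), whereas MRE and R2 are dimensionless, which makes them easier to compare across studies.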

7. Modeling Results

The key parameters of the support vector machine (C, ε, and γ) were tuned on the training data using grey wolf optimization.

Once the optimum SVM structure has been determined, the performance of the forecasting tool must be assessed. To that end, Figure 3 compares the predicted and actual output values for the training and testing datasets. Plotting the model outputs together with the real data is a common tool for model evaluation. As the figure shows, the predicted and actual target values overlap closely. The proximity of the predicted values to the actual ones supports the model's accuracy.

Figure 4 shows the cross-plot of actual versus predicted output values for both the training and testing stages. The closer the points lie to the bisector line, the more accurate the model. The fitted line through the points also aids the assessment. As shown, there is a high degree of agreement between the estimated and actual values, with R2 values of 0.9722 and 0.977 for the training and testing stages, respectively. These values indicate how well the fitted line describes the data; in other words, they quantify the correlation between the predicted and actual output values. The results in both stages show that the model is suitable for predicting the hydrogen content of bio-oil.

Furthermore, Figure 5 depicts the relative deviation of the predicted values from the actual ones. For the model output, the relative deviations are within 10%. This low deviation is further evidence of the accuracy of the proposed model.

A statistical comparison is helpful after the visual one. Table 1 summarizes the previously discussed parameters (MSE, RMSE, MRE, and R2, defined in equations (13)–(16)). These parameters demonstrate the model's ability to reproduce the actual output values. In the training stage, R2 = 0.972, MRE = 1.98, MSE = 5.64E-02, and RMSE = 2.37E-01; in the testing stage, R2 = 0.977, MRE = 1.97, MSE = 5.81E-02, and RMSE = 2.41E-01. The training results show that the GWO-SVM algorithm predicts the output values in the training dataset accurately, which indicates that the model is well trained.
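As a quick consistency check on the reported values, and assuming the usual relationship in which RMSE is the square root of MSE, the stage-wise figures in Table 1 agree with each other to the reported precision:

```latex
\sqrt{\text{MSE}_{\text{train}}} = \sqrt{5.64 \times 10^{-2}} \approx 0.2375 \approx 2.37 \times 10^{-1},
\qquad
\sqrt{\text{MSE}_{\text{test}}} = \sqrt{5.81 \times 10^{-2}} \approx 0.2410 \approx 2.41 \times 10^{-1}.
```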

Tang et al. used similar data to estimate the hydrogen content of bio-oil with two models, MLR and RF [20]. Their MLR model achieved R2 = 0.352 and RMSE = 1.41, and their RF model achieved R2 = 0.84 and RMSE = 0.56. By comparison, the GWO-SVM model achieves R2 = 0.977 and RMSE = 0.241 in the testing stage.

In addition to the training assessment, the model's effectiveness in predicting unseen output values must be investigated. The testing-stage results show that GWO-SVM generalizes well to data that were not used in training.

8. Conclusion

Because bio-oil is a clean form of fuel for energy production, research on its properties is of clear importance to researchers working in this field. A prediction technique based on GWO-SVM was developed in this study to estimate the hydrogen content of bio-oil as a function of pyrolysis conditions and the composition of the biomass feedstock. As previously stated, a distinctive trait of this model is that its performance is independent of outliers. To check the reliability of the databank, the leverage methodology was applied to the output data points for the first time in the literature, and this analysis confirmed the reliability of the databank. Comparing the model outputs with the 116 experimental target values yielded R2 = 0.973, MRE = 1.98, MSE = 5.68E-02, and RMSE = 2.41E-01, as well as good visual agreement between the experimental values and the GWO-SVM outputs. The analysis showed that the SVM-based model is a highly accurate forecasting tool for the target values under various operating conditions. Furthermore, the effects of the input parameters on the output were determined. The model and the sensitivity analysis results may be useful for scientists working on bio-oil and on environmentally friendly production challenges. The studied tools are useful for simulating various processes in the production of clean fuels and can therefore support efforts to address global warming.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by Major and Special Project of Taizhou Vocational and Technical College under Grant 2020HGZ03.