Abstract

An independence test based on symbolic time series analysis (STSA) is developed. For an independent symbolic time series, the proposed statistic is asymptotically distributed as a chi-square with degrees of freedom equal to the number of possible events minus one. Size and power experiments for small samples were conducted using Monte Carlo simulations, and the results were compared with the BDS and runs tests. The introduced test performs well in detecting dependence in nonlinear and chaotic systems.

1. Introduction

Economics presents a variety of dynamical processes. Linearity, nonlinearity, deterministic chaos, and stochastic models have been applied when modeling a complex reality. Moreover, as highlighted by Gabr and Fathey [1], the increasing use of time series data has initiated a great deal of research and development in the field of data mining. Classification of time series data has a wide range of applications and has attracted researchers from a wide range of disciplines. Detecting dependence in time series is an essential task for econometricians and applied economists. In particular, as highlighted by Brooks [2], testing for nonlinear dependence has become a relevant area of research due to its implications for model adequacy and predictability. Moreover, the importance of testing randomness at a microeconomic level was already asserted by Wald and Wolfowitz [3]. They designed a runs test and remarked that the problem of testing randomness arises frequently in quality control of manufactured products.

As mentioned by Shahwan and Said [4], obtaining accurate stock price forecasts is one of the main goals of financial and academic research institutions. Finding evidence of nonlinearity in a time series means that forecasting can be improved by switching from a linear to a nonlinear model. Furthermore, detecting dependence in the residuals of a linear model is considered evidence of an inaccurate representation of the data. As Granger et al. [5] assert, even though economics is rich in dynamical processes, the most commonly used test statistics are functions of correlation motivated by linear relations involving continuous variables and/or Gaussian processes. They remark that numerous diagnostics are used to examine model residuals for departure from “independence,” i.i.d., reversibility, martingale difference, and other properties.

There is extensive literature about testing independence and nonlinearity. Correlation tests are widely applied (see King [6] for a survey), but as highlighted by [7] correlation tests are not consistent against alternatives with zero autocorrelation. ARCH models, bilinear processes, nonlinear moving average (NLMA) processes, and iterative logistic maps are examples of serially dependent processes with zero correlation. The Durbin-Watson (DW) test, proposed by Durbin and Watson [8, 9] and extended by [10, 11], is by far the most common test for first-order autocorrelated AR errors. However, DW is a test specialized in linear dependence and has low sensitivity to nonlinear dependent processes (see Azzalini and Bowman [12]).

Reference [13] compiles a bibliography of nonparametric tests based on instruments such as runs, signs, ranks, permutations, frequency counts, records, and quotas. The well-known runs test was proposed by Wald and Wolfowitz [3] and is based on the repeated occurrence of the same value or category of a variable, such as the sign. The runs test of randomness assumes that the mean and variance are constant and the probability is independent.
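As a rough illustration of a runs test, the sketch below counts runs of observations above and below the sample median and applies the usual normal approximation for the number of runs. Dichotomizing about the median and the function name `runs_test` are assumptions made here for illustration; this is not necessarily the exact variant used in the experiments.

```python
import math
from statistics import median

def runs_test(x):
    """Runs test on signs about the median (normal approximation)."""
    med = median(x)
    signs = [v > med for v in x if v != med]   # drop ties with the median
    n1 = sum(signs)                # observations above the median
    n2 = len(signs) - n1           # observations below the median
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value
    return runs, z, p_value

# A perfectly alternating series has far too many runs to be random.
runs, z, p = runs_test([0, 1] * 10)
print(runs, round(z, 2), p)
```

Too many or too few runs relative to the expected number both count as evidence against randomness, which is why the p-value is two-sided.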

Reference [14] conducts a competition among the best of the available tests for nonlinearity and chaos. The one proposed by Hinich [15] has zero power against some forms of nonlinearity. The Lyapunov exponent test suggested by [16] is a test of chaos and it does not detect other types of nonlinearity. White test (see [17]) is a test of nonlinearity under the hypothesis of linearity in the mean. For instance, it correctly accepts linearity in the mean of the ARCH and GARCH processes, even if they are nonlinear processes. The null hypothesis of Kaplan’s test [18] is linearity of the process.

Reference [19] proposes the BDS test to detect chaotic processes. However, it also serves as an independence test or nonlinear test showing a high power against a vast class of nonlinear alternatives.

Entropy has also been applied to test independence. References [20–22] suggest measures of serial dependence based on entropy. A normalized smoothed nonparametric entropy measure of serial dependence is proposed by Granger and Liu [23]. Granger et al. [5] introduce a transformed metric entropy of dependence. A kernel-based nonparametric entropy estimator of serial dependence is suggested by Hong and White [24]. Reference [7] constructs a test of independence by using symbolic dynamics and permutation entropy, which was commented on and criticized by Elsinger [25].

Reference [26] presents a method based on observable ordinal patterns to discriminate white noise from deterministic time series corrupted with high levels of white noise. Reference [27] introduces a multiscale symbolic information-theory approach for discriminating nonlinear deterministic and stochastic dynamics from time series associated with complex systems. References [28–30] suggest a test for distinguishing regular from chaotic dynamics in deterministic dynamical systems. In [31] the authors present a theoretical justification of the 0-1 test for chaos. In [32] the authors apply the 0-1 test to detect chaos numerically in the proposed fractional order financial system.

Recently, Cánovas et al. [33] have made a comparative study of tests to detect whether a time series comes from an IID random variable or not. They have found that BDS and permutation type tests fail in detecting dependence. According to the authors testing independence is very complicated and, in their opinion, the combination of several tests is necessary.

In the present paper, a new simple and powerful test is proposed based on symbolic time series analysis (STSA). The STSA approach considered in the present work is extensively explained in [34–37]. The introduced test performs well in detecting dependence in nonlinear and chaotic systems.

The paper is organized as follows. The next section presents the symbolic time series analysis approach. Section 3 proposes and derives the symbolic randomness test. Size and power experiments are conducted in Section 4. Independence and nonlinearity in financial time series are tested in Section 5. Finally, Section 6 draws some conclusions and suggests further lines of research.

2. Symbolic Time Series Analysis

As mentioned by Finney et al. [38], the concept of symbolization has its roots in dynamical-systems theory, particularly in the study of nonlinear systems which can exhibit bifurcation and chaos. Besides their computational efficiency, symbolic methods are also robust when noise is present. Williams [39] highlights that symbolic dynamics is a method for studying nonlinear discrete-time systems by taking a previously codified trajectory using sequences of symbols from a finite set, also called the alphabet. However, as Piccardi [35] remarks, symbolic dynamics should be differentiated from symbolic analysis. The former denotes theoretical investigation of dynamical systems. The latter is suggested when data are characterized by a low degree of precision. The idea in symbolic analysis is that by discretizing the data with the right partition we obtain a symbolic sequence. This sequence is able to reveal the underlying dynamics of the process even when the data are highly affected by noise.

Data symbolization implies transforming an original series of measurements into a limited number of discrete symbols. The resulting symbolic series can be analyzed for nonrandom temporal patterns. That is, given a time series, we study its dependence by translating the problem into a symbolic time series.

Let us consider a time series of a given sample size. As a first step, the symbolic time series analysis (STSA) approach suggests taking a partition such that each symbol occurs with the same probability as all the others. The result is a symbolized time series. For instance, imagine a time series generated by Gaussian white noise; we can define a discretization into two regions by assigning the symbol 0 when an observation takes a value in the lower 50% of the distribution and the symbol 1 otherwise. The new discrete time series of events would be similar to a series generated by tossing a coin. Of course, different discretizations could be applied; for example, six equally likely symbols could be interpreted as rolling a die.
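A minimal sketch of an equiprobable partition is given below. It assigns symbols by rank, which amounts to cutting the empirical distribution at equally spaced quantiles; the function name `symbolize` and the rank-based construction are illustrative assumptions rather than the authors' implementation.

```python
import random

def symbolize(x, n_symbols):
    """Map each observation to a symbol 0..n_symbols-1 via its rank,
    so that every symbol occurs (almost) equally often."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    symbols = [0] * len(x)
    for rank, i in enumerate(order):
        symbols[i] = rank * n_symbols // len(x)
    return symbols

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(600)]
coin = symbolize(x, 2)   # two symbols: like tossing a coin
die = symbolize(x, 6)    # six symbols: like rolling a die
print([coin.count(s) for s in range(2)])   # [300, 300]
print([die.count(s) for s in range(6)])    # [100, 100, 100, 100, 100, 100]
```

Because the partition is built from ranks, the symbol frequencies are balanced by construction, whatever the distribution of the original data.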

Theory calls the set of symbols the alphabet and sequences of consecutive symbols words. As a second step, symbol sequences of different lengths should be computed, generating a new symbolic time series for each length. The first symbolization is the trivial one, the symbolization of the original time series, with words of length one. A second symbolization is applied for words of length two, and so on. Consider the last example of the Gaussian variable symbolized with the two symbols {0, 1}; the sequence of two consecutive symbols produces four possible events, the sequence of three consecutive symbols produces eight, and so on. Because the partition is equiprobable, the relative frequency of each possible sequence for truly random data will be equal. Following the example, for two symbols and words of length two each event from {(0,0), (0,1), (1,0), (1,1)} has probability 1/4, and for two symbols and words of length three we have {(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)} with probability 1/8 for each event. In general, for a symbolized random process the probability of each word of a given length is one over the total number of possible events, which is the alphabet size raised to the word length.
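The word-counting step can be sketched as follows, assuming overlapping windows of consecutive symbols. The helper `word_counts` is a hypothetical name introduced for illustration.

```python
from collections import Counter
from itertools import product

def word_counts(symbols, n_symbols, length):
    """Count every overlapping word of `length` consecutive symbols."""
    counts = Counter(tuple(symbols[t:t + length])
                     for t in range(len(symbols) - length + 1))
    # List words that never occur too, so all n_symbols**length events appear.
    return {w: counts.get(w, 0)
            for w in product(range(n_symbols), repeat=length)}

s = [0, 1, 1, 0, 1, 0, 0, 1]
pairs = word_counts(s, 2, 2)
print(pairs)                        # {(0, 0): 1, (0, 1): 3, (1, 0): 2, (1, 1): 1}
print(len(word_counts(s, 2, 3)))    # 8 possible words of length three
```

Eight symbols yield seven overlapping pairs, which is why the counts above sum to seven rather than eight.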

Once the symbolized time series is obtained, the third step consists of computing the relative frequencies of all the symbol sequences found in the data. Note that in the present case, since the partition is defined by dividing the distribution into equally probable regions, analyzing the frequency of single symbols (words of length one) is trivial. The interest of the present work lies in words of length two or more, that is, in subsequences of two or more symbols.

Since in practice we do not have infinite time series, we have to work with finite samples, and differences between relative frequencies and theoretical probabilities can be due to the finite sample. Therefore it is worthwhile to obtain a statistic in order to test the independence null hypothesis and to analyze the finite sample problem.

3. Symbolic Independence Test

The purpose of the present section is to derive a simple statistic and its asymptotic distribution when the process under study is independent. In particular, the statistic performance is analyzed for finite sample.

Let us consider a finite time series generated by an independent or random process sized . Define a partition in the series in “” equiprobable regions obtaining the symbolized time series where each symbol takes a symbolic value from the alphabet .

Since we want to derive a general statistic for different alphabet sizes and different subsequence lengths, we have to make two considerations. First, the quantity of possible events is the alphabet size raised to the subsequence length; in the simplest case of subsequences of length one, the quantity of events equals the symbol set size. Second, in practice we have a finite sample size; this poses no problem for subsequences of length one, but when we compute subsequences, or time windows, of consecutive symbols we lose observations. For example, when we compute the frequencies of 2 consecutive symbols, the effective sample size is the original sample size minus one. In general, the effective sample size is the original sample size minus the subsequence length plus one; in the trivial case of length one it equals the original sample size.
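The two considerations can be written compactly. The symbols used here are assumed notation introduced only for illustration: $a$ for the alphabet size, $w$ for the subsequence length, $n$ for the quantity of possible events, $T$ for the original sample size, and $T_w$ for the effective sample size when overlapping windows are counted.

```latex
n = a^{w}, \qquad T_w = T - w + 1, \qquad p_i = \frac{1}{n}, \quad i = 1, \dots, n .
```

For instance, with $a = 2$, $w = 2$, and $T = 50$ this gives $n = 4$ events, $T_w = 49$ windows, and a probability of $1/4$ for each event under independence.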

Note that, defining for as the sum of the total events in the time series, we can derive the multidimensional variable being distributed as a multinomial with , , and for all . In the example of tossing a coin we have two symbols for the two events and () and is a distributed binomial with mean and variance . However, for consecutive events of length we have a variable distributed multinomial with mean 1/4 and variance .

As we will see, frequencies of the events should be important in the statistic and the vector of the frequencies could be approximated by a multivariate normal distribution where is and is an idempotent matrix as For convenience we can define the vector variable having a multivariate normal distribution , being the null vector. Then the statistic can be defined as The term in brackets in (2) is a quadratic form in random normal variables. As Mathai and Provost [40] assert, the distribution of quadratic forms in normal variables has been extensively studied by many authors. Various representations of the distribution function have been derived and several different procedures have been given for computing the distribution and preparing appropriate tables. Approximate distributions have been proposed by [41–45]. In the present paper the following theorem in [40, page 197] is applied.

Consider the vector distributed as a multivariate normal distribution with mean vector and possibly singular covariance matrix . A quadratic form is distributed as a CHI-2 with degrees of freedom if the following necessary and sufficient conditions are satisfied:(i) and (ii) and .Note that the theorem can be applied in the present case where vector is distributed multivariate normal . In this case is the identity matrix and is symmetric, singular, and idempotent. Since , distributes CHI-2 with degrees of freedom.
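The role of idempotency can be checked directly under one concrete assumption: that the standardized frequencies of $n$ equiprobable events come from a multinomial with $T$ trials. The symbols $f_i$, $y_i$, $\Sigma$, and $\mathbf{1}$ (a column vector of ones) are illustrative notation, not necessarily the notation of the original equations.

```latex
\begin{align*}
y_i &= \frac{f_i - T/n}{\sqrt{T/n}}, \qquad
\operatorname{Var}(y_i) = 1 - \frac{1}{n}, \qquad
\operatorname{Cov}(y_i, y_j) = -\frac{1}{n} \quad (i \neq j), \\
\Sigma &= I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top}, \\
\Sigma^{2} &= I_n - \frac{2}{n}\mathbf{1}\mathbf{1}^{\top}
  + \frac{1}{n^{2}}\mathbf{1}\left(\mathbf{1}^{\top}\mathbf{1}\right)\mathbf{1}^{\top}
  = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top} = \Sigma, \\
\operatorname{tr}(\Sigma) &= n - 1 .
\end{align*}
```

Since the rank of an idempotent matrix equals its trace, the quadratic form $y^{\top}y$ has $n - 1$ degrees of freedom under this assumption, in line with the degrees of freedom used in the examples of this section.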

Considering that , we obtain that the distribution of the symbolic randomness statistic (SRS) as Note that computing the statistic is very simple. We just have to know the sample size , the symbols , and subsequences or length and compute the frequencies for each event in the time series .

In summary, the test works as follows.

Step 1. Given the time series, compute its empirical distribution and define equiprobable regions according to the quantity of symbols, that is, the alphabet size.

Step 2. According to the partition, translate the original series into the symbolic time series for subsequences of length one.

Step 3. Compute the symbolic time series for the different subsequence lengths; remember that the series obtained in Step 2 corresponds to length one.

Step 4. For each length, compute the frequency of the different events.

Step 5. For each length, compute the SRS as shown in (3).

Step 6. Compare the SRS with the critical value of a chi-square with degrees of freedom equal to the quantity of possible events minus one, at a significance level of 0.05, under the independence null hypothesis. When the SRS is larger than the critical value, we reject the null hypothesis.
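The six steps can be sketched in a few lines of Python. The sketch assumes that the statistic takes the Pearson form over overlapping words, which is consistent with the stated link to the Pearson statistic but is not a transcription of equation (3). The names `srs_test` and `chi2_sf` are hypothetical, and the chi-square tail probability is computed with closed-form expressions valid for integer degrees of freedom.

```python
import math
import random
from collections import Counter

def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square with integer dof."""
    h = x / 2.0
    if dof % 2 == 0:
        term, total = math.exp(-h), 0.0
        for j in range(dof // 2):
            total += term
            term *= h / (j + 1)
        return total
    total = math.erfc(math.sqrt(h))
    for j in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-h) * h ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def srs_test(x, n_symbols, length):
    """Return (statistic, p-value) under the assumed Pearson form."""
    T = len(x)
    order = sorted(range(T), key=lambda i: x[i])
    s = [0] * T
    for rank, i in enumerate(order):           # Steps 1-2: equiprobable symbols
        s[i] = rank * n_symbols // T
    T_w = T - length + 1                       # Step 3: overlapping words
    counts = Counter(tuple(s[t:t + length]) for t in range(T_w))  # Step 4
    n_events = n_symbols ** length
    expected = T_w / n_events
    stat = sum((c - expected) ** 2 for c in counts.values()) / expected
    stat += (n_events - len(counts)) * expected   # words never observed
    return stat, chi2_sf(stat, n_events - 1)      # Steps 5-6

random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
stat, p = srs_test(noise, 2, 2)
print(round(stat, 3), round(p, 3))
```

Independence is rejected at the 5% level when the returned p-value falls below 0.05, which is equivalent to the statistic exceeding the chi-square critical value in Step 6.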

As an example, consider a time series generated by tossing a coin (two events). The SRS is computed from the number of times event 1 and event 2 (the two faces of the coin) appear in the time series. In this case the SRS should be compared with the critical value 3.84 from a chi-square with 1 degree of freedom (alpha = 0.05). Since the process is random, the SRS should fall below the critical value in about 95% of samples. For consecutive symbols of length two, the SRS is compared with the critical value 7.81 from a chi-square with 3 degrees of freedom (alpha = 0.05).
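A worked version of the coin example is sketched below under the assumption that the statistic has the Pearson form. The symbols $T$ (number of tosses) and $f_1$, $f_2$ (counts of the two events) are illustrative notation, and the numbers are hypothetical.

```latex
\begin{align*}
\mathrm{SRS} &= \frac{(f_1 - T/2)^2}{T/2} + \frac{(f_2 - T/2)^2}{T/2}
             = \frac{(f_1 - f_2)^2}{T}, \\
T = 100,\ f_1 = 56,\ f_2 = 44 \;&\Longrightarrow\;
\mathrm{SRS} = \frac{12^2}{100} = 1.44 < 3.84 .
\end{align*}
```

With these counts the null hypothesis of independence would not be rejected at the 5% level.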

It could be shown that the SRS coincides with the Pearson independence statistic applied to the symbolic time series. Even more, this statistic is related to the Shannon entropy. In order to see the latter, let us consider as the normalized Shannon entropy where for a certain process and for a random process. Substituting the probabilities in for the frequencies corresponding to the symbolic series (), we have The distribution of is known, but the logarithmic function introduces some difficulty to obtain the exact distribution of . However, we can take a linear approximation when the variable is less than 1 in absolute value. In the present case, we have defined the multinomial distributed vector substituting for for all in , we have Note that , its mean is 0, and the variance is which is decreasing with the sample size and the number of events. Given that , a linear approximation for of   and then where . Equation (6) develops when the expressions mentioned before are introduced: Note that since and multiplying and dividing by the is expressed in
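The link between the entropy and a Pearson-type statistic can be checked numerically. The sketch below assumes the Pearson form for the statistic and uses a second-order Taylor expansion of the logarithm about the uniform distribution, which gives the approximation $2T\log(n)(1 - H)$; this is an independent illustration rather than a reproduction of the derivation above, and the counts are hypothetical.

```python
import math

def pearson_stat(counts):
    """Pearson statistic against equal expected frequencies."""
    total, n = sum(counts), len(counts)
    e = total / n
    return sum((c - e) ** 2 / e for c in counts)

def normalized_entropy(counts):
    """Shannon entropy of the relative frequencies divided by log(n)."""
    total, n = sum(counts), len(counts)
    h = -sum(c / total * math.log(c / total) for c in counts if c > 0)
    return h / math.log(n)

counts = [26, 24, 25, 25]          # hypothetical frequencies of four events
T, n = sum(counts), len(counts)
x2 = pearson_stat(counts)
approx = 2 * T * math.log(n) * (1 - normalized_entropy(counts))
print(x2, approx)                  # the two values are close near uniformity
```

The normalized entropy equals 1 when all events are equally frequent and 0 when a single event absorbs all the observations, so a large statistic corresponds to an entropy well below 1.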

4. Size and Power of the Symbolic Randomness Test

Once the statistic and its asymptotic distribution were obtained, we can proceed to study the performance in finite small samples and the power detecting different forms of dependency (stochastic and deterministic).

As shown, the SRS is asymptotically distributed as a chi-square with degrees of freedom equal to the quantity of possible events minus one, where the quantity of events is determined by the number of symbols and the length of the subsequences of consecutive symbols in the time series.

In practice, the SRS can be applied to the residual series of a fitted model to detect dependence or nonlinearity, similarly to the BDS test. For instance, to test nonlinearity it can be applied to the residuals of a fitted linear model such as an ARMA model.

Experiments were conducted in order to show the size and power of the introduced SRS test. The BDS test was applied in order to compare the results. As known, BDS test depends on two parameters: embedding dimension and epsilon . We applied combinations of and suggested by Kanzler [46] who after conducting a series experiments obtains that and appear to give the best approximation. Liu et al. [47] conduct experiments of size and power obtaining a combination of and around 0.26 as the best size. In addition, critical values simulated for small sample by Kanzler [46] were applied.

The following experiment was conducted to study the size of the SRS. A total of 10,000 Monte Carlo simulations were run for pseudorandom Gaussian i.i.d. (0,1) time series of different sample sizes (50, 200, 500, and 2,000). Note that, at first, our interest is the performance in small samples, as is common in economic time series. The BDS test, the runs test, and the present SRS test were applied considering critical values at significance levels of 1%, 5%, and 10%. We then computed the percentage of rejections of the independence null hypothesis over the 10,000 Monte Carlo simulations for each test, sample size, and significance level. If the critical values are unbiased, the percentage of rejections of the null hypothesis of independence when the process is truly independent should be near the significance level. For example, at a significance level of 5% the percentage of null hypothesis rejections should be near 5% for a truly random process.
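A scaled-down version of such a size experiment might look as follows. It assumes the Pearson form of the statistic over overlapping words, uses 2 symbols and a length of 2 (3 degrees of freedom, critical value 7.81 at the 5% level), and runs far fewer replications than the 10,000 of the actual experiment. The function names are hypothetical.

```python
import random
from collections import Counter

def srs(x, n_symbols, length):
    """Pearson-type statistic on overlapping words of an equiprobable symbolization."""
    T = len(x)
    order = sorted(range(T), key=lambda i: x[i])
    s = [0] * T
    for rank, i in enumerate(order):
        s[i] = rank * n_symbols // T
    T_w = T - length + 1
    counts = Counter(tuple(s[t:t + length]) for t in range(T_w))
    n_events = n_symbols ** length
    expected = T_w / n_events
    stat = sum((c - expected) ** 2 for c in counts.values()) / expected
    return stat + (n_events - len(counts)) * expected

def rejection_rate(n_sims, T, crit=7.81, seed=0):
    """Share of i.i.d. Gaussian samples for which SRS(2,2) exceeds `crit`."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(T)]
        if srs(x, 2, 2) > crit:
            rejections += 1
    return rejections / n_sims

rate = rejection_rate(n_sims=1000, T=200)
print(rate)   # empirical size, to be compared with the nominal 5%
```

An empirical size below the nominal level indicates a conservative test, while a value above it indicates over-rejection.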

Table 1 shows the percentage of null hypothesis rejections for a random process. The columns indicate the number of symbols applied (2, 3, 4, and 5) and the significance levels (α = 1%, 5%, and 10%). The rows show the different sample sizes (50, 200, 500, and 2,000) and lengths (2, 3, 4, and 5). Two general comments can be made: for a time series of size less than 2,000 it is not advisable to apply more than 4 symbols; and the smaller the sample size, the smaller the number of symbols and the length that should be applied.

In general the test seems to be conservative, rejecting the null hypothesis fewer times than expected. Since the test is conservative, its size should be weighed against its power in detecting dependence in nonlinear and chaotic systems, as will be shown later.

Considering the four sample sizes, selecting 2 symbols and length of 4 presents decent results in most of the cases.

Selecting 3 symbols seems to be a relatively good option for a sample size of 200 or larger, and 4 symbols for a sample size of 500 or larger.

There is no single best measure of the deviation from alpha in practice. However, from the set of results presented in Table 1, the best results are obtained for a sample of 2,000 applying 3 symbols and a length of 4 (SRS(3,4)) and 2 symbols and a length of 5 (SRS(2,5)). In the case of SRS(3,4) the results are conservative, since the percentages of rejection (1.14%, 4.19%, and 6.94%) show the smallest deviation from below the nominal levels of 1%, 5%, and 10%. On the other hand, SRS(2,5) shows the smallest deviation (2.41%, 5.79%, and 8.64%) from above. We consider that the combination showing the best results from below (SRS(3,4)) is preferable because it reduces the risk of classifying a process as nonrandom when it is truly random. This will be important when the power experiments on different processes are conducted.

To obtain comparable results, the same Monte Carlo simulations applied to the SRS test were used when computing the BDS test and the runs test. Table 2 shows the percentage of null hypothesis rejections for the runs and BDS tests. The columns indicate the different significance levels (α = 1%, 5%, and 10%) for the different sample sizes (50, 200, 500, and 2,000). The rows indicate the different tests: three versions of the BDS test and the runs test. The results suggest that both tests are also conservative.

In the case of the BDS test, three suggested combinations of embedding dimension and epsilon were applied, as mentioned before. For a sample size of 50, the critical values simulated for small samples by Kanzler [46] give the best results. For a sample size of 200, the simulated critical values suggested by Kanzler [46] are again the best. The parameters suggested by Liu et al. [47] also give acceptable results. For samples of 500 and 2,000, the parameters suggested by Kanzler [46] and his simulated critical values for small samples give results near the significance levels.

Although runs test is also conservative, the rejection percentages are nearer to the expected. Note also that as the sample size increases, rejection percentages are better.

The next experiment studies how powerful these tests are in detecting dependence in nonlinear and chaotic systems.

At this point some generating processes suggested by the literature were considered (see, e.g., [7, 12, 14, 25, 47, 48]). The following 20 generating processes were applied.
(1) Normal. Random process generated by a Normal(0,1).
(2) CHI-2(4). Random process generated by a chi-square with 4 degrees of freedom.
(3) t-Student(4). Random process generated by a t-Student.
(4) Truncated Normal Distribution. Random process generated by a truncated normal distribution (0,1).
(5) Beta(1/2, 1/2). Random process generated by a Beta(1/2, 1/2) distribution.
(6) Uniform(0,1). Random process generated by a Uniform(0,1).
(7) AR(1).
(8) MA(2).
(9) Logistic.
(10) Henon, with initial conditions generated randomly. One of the resulting time series is considered in the study.
(11) Anosov, with initial conditions generated randomly. One of the resulting time series is considered in the study.
(12) Lorenz, with initial conditions generated randomly. This is a discrete version of the Lorenz process, and one of the resulting time series is considered in the study.
(13) TAR (Threshold Autoregressive).
(14) NLSIGN (Nonlinear Sign).
(15) Bilinear.
(16) NLAR (Nonlinear Autoregressive).
(17) NLMA (Nonlinear Moving Average).
(18) BLMA (Bilinear Moving Average).
(19) Modular.
(20) EGARCH. An exponential GARCH process.
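To illustrate how a deterministic chaotic process is rejected as random, the sketch below simulates a logistic map and applies a Pearson-type statistic on overlapping words with 3 symbols and a length of 2. The map $x_{t+1} = 4x_t(1 - x_t)$, the initial value, and the burn-in are assumptions made here, since the exact specification is not reproduced above; 15.507 is the 5% critical value of a chi-square with 8 degrees of freedom.

```python
from collections import Counter

def logistic_series(T, x0=0.3, burn_in=100):
    """Iterate the assumed map x -> 4x(1-x), discarding a burn-in period."""
    x, out = x0, []
    for t in range(T + burn_in):
        x = 4.0 * x * (1.0 - x)
        if t >= burn_in:
            out.append(x)
    return out

def srs(x, n_symbols, length):
    """Pearson-type statistic on overlapping words of an equiprobable symbolization."""
    T = len(x)
    order = sorted(range(T), key=lambda i: x[i])
    s = [0] * T
    for rank, i in enumerate(order):
        s[i] = rank * n_symbols // T
    T_w = T - length + 1
    counts = Counter(tuple(s[t:t + length]) for t in range(T_w))
    n_events = n_symbols ** length
    expected = T_w / n_events
    stat = sum((c - expected) ** 2 for c in counts.values()) / expected
    return stat + (n_events - len(counts)) * expected

CRIT_8_DOF = 15.507   # chi-square critical value, 8 dof, alpha = 0.05
stat = srs(logistic_series(200), n_symbols=3, length=2)
print(stat, stat > CRIT_8_DOF)
```

With three equiprobable regions several two-symbol words can never follow one another under this map, so their observed frequency is zero and the statistic becomes very large.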

Processes (1)–(6) are random processes with different distribution functions; (7)-(8) are two linear stationary processes; (9)–(12) are deterministic chaotic processes; (13)–(20) are nonlinear stochastic processes. As suggested, the tests were applied to the raw series in the cases of (1)–(6) and the chaotic processes (9)–(12). However, the tests were conducted on the residuals of fitted autoregressive processes in the cases of (7)-(8) and (13)–(20) to eliminate the linear dependence. These processes were produced using MATLAB R2010a.

Tables 3 to 6 show the percentage of null hypothesis rejections for the different sample sizes. The first five rows are the results for the random processes. Since the results of these five models work as a size test for α = 5% and a conservative approach is adopted, a percentage of rejection equal to or less than 5% will be considered a good result. For the nonlinear models we follow the approach of Liu et al. [47]: a rejection percentage larger than 90% when the process is not independent suggests very good power, and a percentage less than 50% means that the test does not easily detect nonlinearity.

Table 3 shows the results of the power experiment when a very small sample size is considered and computing 5,000 Monte Carlo simulations. SRS test presents the best results considering 3 symbols and a length of 2, SRS(3,2). The five processes generated by the random distribution functions show a percentage of rejection less than 5% showing the conservative character of the test. However, consider that it is able to detect three of the four deterministic chaotic processes with a percentage of rejection larger than 95% for Lorenz, Logistic, and Henon. The BDS test performance is worse than SRS. Note that simulated critical values for small samples have good performance for three of the five random distributions, but it is not able to detect truncated normal distribution, Uniform, and Beta(1/2,1/2) for this sample size. BDS test is able to detect two of the chaotic processes. On the other hand, runs test has the best performance detecting the five random distributions, but it just detects one of the four chaotic processes.

In summary, no test with a good size is able to detect the stochastic nonlinear processes when the sample size is 50 with a percentage larger than 50%. SRS(3,2) detects 11/20 processes, BDS 5/20, and runs test 8/20.

Table 4 shows the results for the experiment when the sample size is 200. As expected, with a sample larger than 50 the results improve. For the SRS the best results are obtained with 3 symbols and a length of 3. The six random processes and the residuals of AR and MA are correctly identified by the test. Four of five chaotic processes are correctly rejected at 100% as random. The Anosov process seems to be the most difficult process to detect. In the case of the nonlinear stochastic processes, detection of dependence improves: SRS is able to detect nonlinearity in Bilinear, BLMA, Modular, and exponential GARCH at a rate larger than 60%. The BDS still has problems with some random processes; truncated normal, Uniform, and Beta(1/2,1/2) are rejected at a percentage larger than 7%. As in the case of SRS, it is difficult for BDS to detect the Anosov process. The Logistic process is detected at a rate of 64% in the case of BDS, whereas SRS detects it at 100%. The BDS detects the same nonlinear stochastic processes as SRS but at a rate larger than 80%. The runs test presents the worst performance; even though it correctly identifies the six random processes, it is not able to detect chaotic processes such as Logistic and Anosov. Moreover, the runs test is able to detect just one nonlinear stochastic process.

In summary, for a sample size of 200 with a good size, SRS detects 15/20 processes, BDS 12/20, and runs test 11/20.

Table 5 shows the power results for a sample size of 500. Even though SRS with 3 symbols and a length of 3 still presents good results, applying 4 symbols and a length of 3 improves them. Remember that a larger symbolization gives acceptable results if the sample is large. In this case more processes are detected. Note that the chaotic Anosov process is rejected as a random process at a rate of 93.46%, and six of the eight nonlinear stochastic processes are detected. The BDS test applying the parameters suggested by Liu et al. [47] is finally able to identify all the random processes but still has problems detecting nonlinearity in the chaotic Anosov process. In addition, it is able to detect dependence in four of the eight nonlinear stochastic processes. The runs test still has problems detecting the chaotic Logistic model and is able to detect three of the eight nonlinear stochastic processes. NLSIGN and NLAR are the most difficult processes for the three tests to detect at a sample size of 500. Note also that Anosov is only detected by SRS, applying a symbolization of four symbols, at a rate of 93.46%. The other tests detect the process less than 12% of the time.

In summary, for a sample size of 500 with a good size, SRS is able to detect 18/20 processes, BDS 15/20, and runs test 13/20.

Finally, Table 6 shows the results for a sample size of 2,000. The SRS test presents the best results when considering 4 symbols and a length of 3. However, other symbolizations also present good results, such as 3 symbols and a length of 3 or 4. In the present case the only process that is not detected by SRS is the nonlinearity generated by the NLAR; this process may require a larger sample size. The BDS test cannot detect nonlinearity in 4 processes: the chaotic Anosov and the stochastic processes NLSIGN, NLAR, and NLMA. The runs test cannot detect five nonlinear processes: Logistic, Anosov, Bilinear, NLAR, and NLMA. However, note that it is able to detect NLSIGN, which is not detected by BDS. The proposed SRS is the only test able to detect the chaotic Anosov and the nonlinear process NLMA when the sample size is 2,000. In summary, for a sample size of 2,000 with a good size, the SRS test is able to detect 19/20 processes, the BDS test 15/20, and the runs test 12/20.

5. Detecting Independence and Nonlinearity in Financial Time Series

As asserted by Brooks [2] testing for nonlinear dependence is important in financial econometrics due to its profound implications for model adequacy, market efficiency, and predictability. For instance, Shahwan and Said [4] find that modelization with artificial neural networks (ANN) is more relevant to fit a high-dimensional chaotic process than the Bayesian and ARIMA methods. They mention that most recent empirical work implies that the presence of low-dimensional deterministic chaos increases the complexity of the financial time series behavior. Therefore it is important in financial time series to determine the existence of dependence.

The test is applied to financial time series studying the performance in practice. Four asset prices from the New York Stock Exchange, six stock indices, and five exchange rates were considered at different frequencies (daily, weekly, and monthly). Table 7 summarizes some statistics related to the returns of the series.

The test of normality proposed by Jarque and Bera [49] is applied to the returns, and the null hypothesis of normality is rejected for almost all the series. The exception is the exchange rate between the US dollar and the euro at a monthly frequency, for which non-normality is not detected. The latter could be related to the small sample size.

The next step is to apply the different tests to the raw returns of financial time series and to the residuals of a GARCH model. Considering the results obtained in the size and power experiment and the sample size of the financial series, the SRS(2,4), SRS(3,3), SRS(3,4), and SRS(4,3) were selected. In addition, BDS test and runs test are applied in order to compare the results.

Table 8 shows the results of the tests for the different financial returns. Note that the BDS test rejects randomness in almost all cases, the exceptions being the NIKKEI index at a monthly frequency and the exchange rate between the US dollar and the euro at a monthly frequency. The SRS rejects randomness in fewer cases than the BDS test, in particular at the monthly frequency, perhaps because of the small number of observations. This is clear for Coca Cola, IBM, Caterpillar, S&P 500, FTSE, DAX, and NIKKEI. Note that Coca Cola at a daily frequency and the exchange rate between the dollar and the yuan show strong dependence. In the case of the exchange rate between the US dollar and the Chinese yuan, monetary policy should be considered: China has controlled the exchange rate and only in recent years has introduced some flexibility. In the same way, the SHANGHAI index presents the largest degree of dependence, which may be due to the controls and constraints present in the Chinese stock market. The runs test rejects independence for only 16 of the 40 time series.

Table 9 shows the results for the different tests after applying a GARCH(1,1) model to the financial returns. The runs test does not reject independence in 28 of the 40 cases. The BDS test does not reject independence of the residuals in 26 of the 40 cases. However, note that the conservative SRS test in this case only rejects independence in 13 of the 40 cases. This indicates that after applying a GARCH model to the financial series the SRS still detects some dependence or nonlinearity. As an example, after applying GARCH(1,1) to the S&P 500 at different frequencies (daily, weekly, and monthly), the BDS test does not reject the hypothesis that the residuals are independent, which would suggest that GARCH(1,1) is a good model. However, the SRS test still detects some dependence in the residuals, which suggests that GARCH(1,1) is not the best model.

6. Conclusions

In the present study a new independence test is derived; it can also be applied to detect nonlinearity in time series. The test is easy to compute and is powerful in detecting dependence generated by different nonlinear and chaotic systems. It is based on symbolic time series analysis, which is usually considered when studying series highly contaminated by noise.

There are many tests of independence, randomness, and nonlinearity in the literature, and the topic is a growing area of investigation driven by the needs of applied economics and statistics.

The present paper suggests that the test can be applied to detect nonlinear dependence in time series by transforming the series into a symbolic one and computing a statistic that is asymptotically distributed as a Chi-square whose degrees of freedom are determined by the quantity of possible events. Moreover, the test is related to the Pearson independence test and is connected with the Shannon entropy, which is widely studied in information theory as a measure of uncertainty in data.
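The general idea of symbolizing a series and comparing the observed frequencies of symbol sequences with those expected under independence can be sketched as follows. This is not the SRS statistic itself, whose exact form is given earlier in the paper; it is a generic Pearson-type illustration under stated assumptions: symbols are assigned by equiprobable empirical quantile bins, words are taken without overlap, and every possible word is assumed equally likely under independence. All function names are hypothetical.

```python
import math
import random
from bisect import bisect_right
from collections import Counter


def symbolize(series, n_symbols):
    """Map each value to a symbol 0..n_symbols-1 using equiprobable bins."""
    ordered = sorted(series)
    n = len(ordered)
    # Empirical quantile thresholds separating the bins.
    cuts = [ordered[(k * n) // n_symbols] for k in range(1, n_symbols)]
    return [bisect_right(cuts, x) for x in series]


def word_counts(symbols, word_len):
    """Count non-overlapping words (tuples) of length word_len."""
    usable = len(symbols) - len(symbols) % word_len
    words = (tuple(symbols[i:i + word_len]) for i in range(0, usable, word_len))
    return Counter(words)


def pearson_statistic(counts, n_events):
    """Pearson chi-square of word counts against equal expected counts."""
    total = sum(counts.values())
    expected = total / n_events
    seen = sum((c - expected) ** 2 / expected for c in counts.values())
    # Each word that never occurs contributes (0 - expected)^2 / expected.
    unseen = (n_events - len(counts)) * expected
    return seen + unseen


def shannon_entropy(counts):
    """Shannon entropy (in nats) of the empirical word distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Illustration on simulated i.i.d. data with 3 symbols and words of length 2.
rng = random.Random(1)
series = [rng.gauss(0.0, 1.0) for _ in range(1800)]
counts = word_counts(symbolize(series, 3), 2)
stat = pearson_statistic(counts, n_events=3 ** 2)
entropy = shannon_entropy(counts)
```

When all words are equally frequent the entropy reaches its maximum, the logarithm of the number of possible events, and the Pearson-type statistic is zero; departures from equal frequencies lower the entropy and raise the statistic.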

Size and power experiments were conducted for small samples, and the results were compared with the well-known BDS test and runs test. The experiments indicate that for a sample size smaller than 2,000 it is not advisable to use more than four symbols. All three tests are conservative, with the SRS test appearing to be the most conservative. The power of the test against nonlinear and chaotic systems was also studied. The results suggest that, even though the test is conservative, it presented the best performance in detecting the 20 different processes. In very small samples (about 50 observations) the SRS test was able to detect dependence in most of the chaotic processes; in particular, it was the only test that identified the nonlinearity generated by the very simple logistic process. The SRS test is able to detect the chaotic Anosov process and the stochastic NLMA model for a sample size of 2,000, whereas no other test can. It is also observed that the BDS test and the runs test perform differently from each other, even if the latter is better. On the one hand, NLSING was detected by the runs test but not by the BDS test; on the other hand, Bilinear was detected by the BDS test but not by the runs test. The SRS test detected both models. It should be mentioned that for a sample size of 2,000 no test detected the NLAR process. The objective was to study performance in small samples, which are common in economics; note, however, that the performance of all tests improves as the sample size increases.
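The logic of a size and power experiment can be illustrated with a small Monte Carlo sketch. For simplicity it uses the runs test of Wald and Wolfowitz (signs taken about the sample median, normal approximation) rather than the SRS or BDS tests. The AR(1) alternative, the sample size, the number of replications, and all function names are assumptions chosen for illustration and are not the settings of the experiments reported in the paper.

```python
import math
import random


def runs_test_z(series):
    """Wald-Wolfowitz runs test z-score using signs about the median."""
    ordered = sorted(series)
    n = len(ordered)
    median = 0.5 * (ordered[(n - 1) // 2] + ordered[n // 2])
    # Observations equal to the median are dropped.
    signs = [x > median for x in series if x != median]
    n1 = sum(signs)
    n2 = len(signs) - n1
    m = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean = 2.0 * n1 * n2 / m + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - m) / (m * m * (m - 1.0))
    return (runs - mean) / math.sqrt(var)


def rejection_rate(generator, n_obs, n_reps, critical=1.96):
    """Share of replications with |z| above the two-sided 5% critical value."""
    rejections = 0
    for _ in range(n_reps):
        if abs(runs_test_z(generator(n_obs))) > critical:
            rejections += 1
    return rejections / n_reps


rng = random.Random(12345)


def iid_normal(n):
    """Independent standard normal draws (the null hypothesis holds)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def ar1(n, phi=0.5):
    """Hypothetical dependent alternative: a Gaussian AR(1) process."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


# Empirical size: rejection rate when the series is truly independent.
size = rejection_rate(iid_normal, n_obs=200, n_reps=500)
# Empirical power: rejection rate under the dependent alternative.
power = rejection_rate(ar1, n_obs=200, n_reps=500)
```

A test is called conservative when its empirical size falls below the nominal level; power is then judged by how often the same procedure rejects when the data are generated by a dependent process.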

The SRS test was applied to financial series such as stocks, stock indices, and exchange rates, at different frequencies (daily, weekly, and monthly). The test was applied to the raw returns and to the residuals of a GARCH(1,1) model. Compared with the other tests, the SRS test rejected independence fewer times than the BDS test for the raw returns, in particular for monthly data or when few observations are available. However, when the tests are applied to the residuals of a GARCH(1,1) model, the results change: the BDS test rejected independence few times, whereas the SRS test still detected nonlinearity in the residuals. This suggests that the BDS test regards GARCH(1,1) as a good model most of the time, whereas the SRS test suggests that GARCH(1,1) is not a good model once all the nonlinear components are considered.

Applied economics generally works with finite samples. For this reason it is necessary to design more powerful tests for detecting nonlinearity or dependence in small samples. In this sense, symbolic time series analysis can be an approach to go further. A future line of research is to extend the test to multidimensional time series. Symbolization permits transforming a time series with many dimensions into a one-dimensional series, which simplifies the analysis.
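One possible way to reduce a multidimensional series to a single symbolic series is sketched below. It is only an illustration of the idea and not a method proposed in the paper: each dimension is symbolized separately with equiprobable quantile bins (an assumption), and the symbols observed at each time point are encoded as one integer in base `n_symbols`. The function names are hypothetical.

```python
from bisect import bisect_right


def symbolize(series, n_symbols):
    """Map each value to a symbol 0..n_symbols-1 using equiprobable bins."""
    ordered = sorted(series)
    n = len(ordered)
    cuts = [ordered[(k * n) // n_symbols] for k in range(1, n_symbols)]
    return [bisect_right(cuts, x) for x in series]


def combine_dimensions(columns, n_symbols):
    """Encode one symbol per dimension as a single base-n_symbols integer."""
    symbol_columns = [symbolize(col, n_symbols) for col in columns]
    combined = []
    for row in zip(*symbol_columns):
        code = 0
        for s in row:
            # Shift previous digits and append the symbol of this dimension.
            code = code * n_symbols + s
        combined.append(code)
    return combined


# Two toy series of equal length, each symbolized with 2 symbols.
x = [1.0, 2.0, 3.0, 4.0]
y = [4.0, 3.0, 2.0, 1.0]
joint = combine_dimensions([x, y], n_symbols=2)
```

With d dimensions and a symbols per dimension the combined alphabet has a^d symbols, so the number of possible events grows quickly and larger samples would be needed.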

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.