Abstract

Method comparison studies mainly focus on determining whether two methods of measuring a continuous variable agree well enough to be used interchangeably. Typically, a standard mixed-effects model that assumes normality for both random effects and errors is used to model method comparison data. However, these assumptions are frequently violated in practice because of skewness and heavy tails. In addition, the biases of the methods may vary with the magnitude of the measurement. Thus, we propose a methodology for method comparison data that deals with these issues in the context of the measurement error model (MEM), assuming a skew-t (ST) distribution for the true covariates and a centered Student's t (cT) distribution for the errors with known error variances, named STcT-MEM. An expectation conditional maximization (ECM) algorithm is used to compute the maximum likelihood (ML) estimates. A simulation study is performed to validate the proposed methodology. The methodology is illustrated by analyzing gold particle data and is compared with the standard measurement error model (SMEM). The likelihood ratio (LR) test is used to identify the more appropriate of the two models. In addition, the total deviation index (TDI) and concordance correlation coefficient (CCC) are used to assess the agreement between the methods. The findings suggest that our proposed framework for analyzing unreplicated method comparison data with asymmetry and heavy tails works effectively for moderate and large samples.

1. Introduction

The evaluation of two methods for measuring a continuous response variable attracts considerable attention in health sciences such as biomedical engineering, clinical research, and medical imaging. The methods may include an assay, a medical device, a clinical observer, or a measurement technique, and the variables of interest include, e.g., blood pressure, heart rate, cholesterol level, and the concentration of a chemical. Generally, new methods are compared with already established methods to determine whether they agree sufficiently. With so many advances in the medical sciences, new measurement methods and techniques are available that may be cheaper, faster, easier to use, and less invasive. Before these new measurement methods are used, their accuracy and precision must be confirmed. Therefore, detailed research in this area will enable health professionals to choose the most appropriate and effective treatment method. If a study reveals satisfactory agreement between the methods, they can either be used interchangeably or the more appropriate method can be selected. Such studies are widespread in health sciences research. As a testimony to this, the Web of Knowledge citation database now has over 50,000 citations for Bland and Altman [1], who proposed the limits of agreement methodology for evaluating the agreement between two methods.

In method comparison studies, each method generally takes a measurement on each subject. At times, the measurements may be replicated. Data from the same subject are considered dependent, but data from different subjects are considered independent. A two-step technique may be used to analyze these data. The first step is modeling the method comparison data. For this purpose, the mixed-effects model [2] is commonly used; it assumes independent normal distributions for both random effects and errors when the variability of the measurements remains constant over the entire measurement range [3–9]. The second step is the evaluation of agreement between the methods. This evaluation is based on one or more measures of agreement that indicate how closely the methods agree with each other. Small differences in measurements indicate good agreement between the two methods. In the literature, several agreement measures are available for evaluating the agreement between two methods, including limits of agreement [1, 3, 4, 10], CCC [11–14], TDI [9], coverage probability (CP) or tolerance interval [12–16], mean squared deviation (MSD) [17, 18], and coefficient of individual agreement [11, 17].

A linear mixed-effects model is typically employed when comparing a novel method with an existing reference or standard. However, this model cannot be used when the biases of the methods vary with the magnitude of measurement [3, 19–23]. For data of this nature, a MEM [24] should be used instead of a mixed-effects model. In the literature, the majority of the studies mentioned above assume a normal distribution. However, method comparison data frequently exhibit skewness and heavy tails in practice, meaning tails that are longer than those of a normal distribution. This is illustrated by a real dataset from a method comparison study by Tomaya and de Castro [25], which is discussed later in this article. In this scenario, a data transformation may be used to ensure that the normality assumption is met. Nonetheless, a transformation may make it difficult to interpret the differences in measurements between the two methods, which is a common issue in method comparison data analysis. To overcome this problem, some alternative approaches have been considered [26–34].

Recently, Choudhary et al. [35] developed a general skew-t (GST) mixed model that assumes an ST distribution for the random effects and an independent multivariate t distribution for the errors. Later, Sengupta et al. [36] extended this methodology to analyze method comparison data with skewness and heavy tails when the error variances are unknown. Their methodology assesses how well the methods agree when they measure in the same nominal unit, which means that the true (error-free) values of the methods may differ only by a constant. These models cannot be used when the methods have different measurement scales. When data are collected, different measuring scales might lead to measurement errors in both covariates and response variables, and statistical inferences may change if these errors are not taken into account. As a result, compared with a mixed-effects model, the MEM provides a more adaptable framework for modeling method comparison data.

The study of method comparison under heavy-tailed distributions has not received much attention in the literature because of the complexity of the likelihood function. Recently, Cao et al. [37] proposed a MEM for replicated data under asymmetric and heavy-tailed distributions with the same degrees of freedom for the true covariate and the error terms. However, this model cannot account for different degrees of heaviness in the tails of the true covariate and error distributions. Tomaya and de Castro [25] developed the STcT-MEM, which assumes an ST distribution for the true covariate and a cT distribution [38] for the error terms with known error variances and allows different degrees of freedom for the true covariate and the errors. In this paper, our main goal is to adapt this model to unreplicated method comparison data with different levels of heaviness in the tails of the true covariates and errors when the error variances are known. This approach enables us to model method comparison data with greater flexibility and accuracy, accommodating skewness and heavy tails.

The rest of the paper is set out as follows. In Section 2, we present the STcT-MEM for method comparison data. Section 3 deals with the proposed methodology for the evaluation of the agreement under STcT-MEM. Section 4 explores the performance of the proposed model using simulation studies. Section 5 provides an application utilizing data on gold particles to illustrate our methodology, and the final section discusses the findings and conclusions. All calculations presented in this paper were carried out using the R programming language [39].

2. Modeling of Method Comparison Data

This section outlines an approach for analyzing studies that compare two methods using a single measurement on each subject, meaning that the measurements are not replicated. The measurement of the $j$th method on the $i$th subject is denoted by $Y_{ij}$, $i = 1, \ldots, n$, $j = 1, 2$. Here, $n$ is the number of subjects in the study. Method 1 is assumed to be the standard method, and Method 2 is the test method.

2.1. An Overview of STcT-MEM

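This section briefly describes the STcT-MEM in general terms before it is presented for method comparison data. The details can be found in Tomaya and de Castro [25]. In this article, we use boldface letters to refer to vectors and matrices. Let $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, $SN_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\lambda})$, and $ST_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\lambda}, \nu)$, respectively, denote $p$-dimensional normal, skew-normal, and ST distributions. Here, $\boldsymbol{\mu}$ is a location vector, $\boldsymbol{\Sigma}$ is a positive-definite scale matrix, $\boldsymbol{\lambda}$ is a vector of skewness parameters, and $\nu$ is the degrees of freedom. Let $G(a, b)$ denote the gamma distribution with parameters $a$ and $b$. Furthermore, we use $\boldsymbol{\Sigma}^{1/2}$ to denote the square root of a symmetric, positive-definite matrix $\boldsymbol{\Sigma}$ so that $\boldsymbol{\Sigma}^{1/2}\boldsymbol{\Sigma}^{1/2} = \boldsymbol{\Sigma}$. For a matrix $\mathbf{A}$, $\mathbf{A}^{\top}$ is the transpose of $\mathbf{A}$, $\mathbf{A}^{-1}$ is the inverse of $\mathbf{A}$, and $|\mathbf{A}|$ is the determinant of $\mathbf{A}$.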

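Let $\mathbf{Y} \sim cT_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu)$ denote a $p$-variate cT distribution if its probability density function (pdf) is given by
$$f(\mathbf{y}) = c(\nu, p)\,|\boldsymbol{\Sigma}|^{-1/2}\left[1 + \frac{(\mathbf{y}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu})}{\nu - 2}\right]^{-(\nu+p)/2}, \quad \nu > 2, \tag{1}$$
where $\boldsymbol{\mu}$ is a mean vector, $\boldsymbol{\Sigma}$ is a covariance matrix, $c(\nu, p)$ is the normalizing constant given by $c(\nu, p) = \Gamma((\nu+p)/2) / \{\Gamma(\nu/2)\,[\pi(\nu-2)]^{p/2}\}$, and $\Gamma(\cdot)$ denotes the gamma function. It is a centered parameterization of the Student's t distribution, in which the parameters are the mean vector and covariance matrix, whereas in the usual parameterization they are the location vector and the scale matrix. If the model assumes that the error variances are known, the centered version is the natural choice because the covariance matrix is itself a parameter. A brief introduction to the skew-normal and ST distributions can be found in Appendix A, and detailed information about these distributions can be found in Azzalini and Capitanio [40] and Azzalini and Capitanio [33].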
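As a quick numerical check of the centered parameterization, the following sketch evaluates the univariate ($p = 1$) centered Student's t density and verifies by a simple trapezoidal rule that it integrates to one and that its second central moment equals the stated variance. The function names and the parameter values are illustrative assumptions, not part of the original analysis, which was carried out in R.

```python
import math


def centered_t_pdf(y, mean, var, nu):
    """Density of a univariate centered Student's t with the given mean and variance."""
    if nu <= 2:
        raise ValueError("nu must exceed 2 so that the variance exists")
    # log c(nu, 1), where c(nu, p) = Gamma((nu+p)/2) / (Gamma(nu/2) * (pi*(nu-2))**(p/2))
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * (nu - 2)))
    d = (y - mean) ** 2 / var  # squared standardized distance
    return math.exp(log_c - 0.5 * math.log(var)
                    - 0.5 * (nu + 1) * math.log1p(d / (nu - 2)))


def trapezoid(f, lo, hi, steps=200_000):
    """Plain trapezoidal rule, used here only to sanity-check the density."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not taken from the paper).
    pdf = lambda y: centered_t_pdf(y, mean=1.0, var=0.5, nu=5.0)
    print("integral of pdf:", trapezoid(pdf, -200.0, 200.0))
    print("second central moment:",
          trapezoid(lambda y: (y - 1.0) ** 2 * pdf(y), -200.0, 200.0))
```

The point of the centered form is visible in the second check: the `var` argument is the actual variance, so a known error variance can be plugged in directly without rescaling by $\nu/(\nu-2)$.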

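The SMEM can be written as
$$y_i = \beta_0 + \beta_1 x_i, \quad X_i = x_i + \delta_i, \quad Y_i = y_i + e_i, \quad i = 1, \ldots, n, \tag{2}$$
where $\beta_0$ and $\beta_1$ are the intercept and slope parameters, respectively, $x_i$ and $y_i$ are the unobserved true covariate and the unobserved true response variable, respectively, $X_i$ and $Y_i$ are the observed variables, $\delta_i$ and $e_i$ are the error terms, and $n$ is the sample size. Model (2) can be written as
$$\mathbf{Z}_i = \mathbf{a} + \mathbf{b}\,x_i + \boldsymbol{\varepsilon}_i, \tag{3}$$
where $\mathbf{Z}_i = (X_i, Y_i)^{\top}$, $\mathbf{a} = (0, \beta_0)^{\top}$, $\mathbf{b} = (1, \beta_1)^{\top}$, and $\boldsymbol{\varepsilon}_i = (\delta_i, e_i)^{\top}$. It is standard to assume that $x_i$ and $\boldsymbol{\varepsilon}_i$ are independent and
$$x_i \sim N_1(\mu, \sigma^2), \quad \boldsymbol{\varepsilon}_i \sim N_2(\mathbf{0}, \boldsymbol{\Omega}), \tag{4}$$
where $\boldsymbol{\Omega}$ is the covariance matrix of the errors.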

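The normality assumption is sometimes untenable because of skewness, heavy-tailedness, and outliers. Thus, Tomaya and de Castro [25] developed the STcT-MEM with more general distributions as follows:
$$x_i \sim ST_1(\mu, \sigma^2, \lambda, \nu_x), \quad \boldsymbol{\varepsilon}_i \sim cT_2(\mathbf{0}, \boldsymbol{\Omega}, \nu_e), \tag{5}$$
where the true covariate $x_i$ and the error vector $\boldsymbol{\varepsilon}_i$ are mutually independent, and $\boldsymbol{\Omega}$ is the known covariance matrix of the errors. Inverse transformations have been considered for the degrees of freedom to enhance the inference process. The hierarchical representations of $x_i$ and $\boldsymbol{\varepsilon}_i$ are defined in Appendix B. Next, is reparameterized as , where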

Then, the mean vector and covariance matrix of arewhere with

Since the pdf of $\mathbf{Z}_i$ is not available in closed form, one-dimensional numerical integration is used to evaluate it, as explained in Tomaya and de Castro [25] and Choudhary et al. [35]. This can be carried out by using the numDeriv package [41] in R. Furthermore, because of the complexity of the log-likelihood function, Tomaya and de Castro [25] used the ECM algorithm [42], a variant of the expectation-maximization (EM) algorithm, to estimate the parameters.

2.2. STcT-MEM for Method Comparison Data

It follows from (3) that the model for the paired measurements $(Y_{i1}, Y_{i2})$ can be written as
$$Y_{i1} = b_i + e_{i1}, \quad Y_{i2} = \beta_0 + \beta_1 b_i + e_{i2}, \quad i = 1, \ldots, n, \tag{8}$$
where $\beta_0$ and $\beta_1$ are fixed regression coefficients known as the fixed bias and proportional bias of method 2, respectively, $b_i$ denotes the true unobservable measurement for the $i$th subject, and $e_{ij}$ is the random error of the $j$th method on the $i$th subject. The methods are scaled differently in this case; they have the same scale if $\beta_1 = 1$, in which case the model reduces to the mixed-effects model discussed by Sengupta et al. [36].

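Further, the true measurement $b_i$ and the error vector $(e_{i1}, e_{i2})^{\top}$ are mutually independent, and we assume that
$$b_i \sim ST_1(\mu, \sigma^2, \lambda, \nu_b), \quad (e_{i1}, e_{i2})^{\top} \sim cT_2(\mathbf{0}, \boldsymbol{\Omega}, \nu_e), \tag{9}$$
where $\boldsymbol{\Omega} = \mathrm{diag}(\sigma^2_{e1}, \sigma^2_{e2})$ is assumed to be known.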

The SMEM becomes a special case of the STcT-MEM (5) when the skewness parameter $\lambda = 0$ and the degrees of freedom parameters tend to infinity.
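To make the data-generating mechanism concrete, the sketch below simulates paired measurements from a model of this form using the usual scale-mixture constructions: a skew-t true value is a skew-normal draw divided by the square root of a gamma weight, and centered t errors are normal draws rescaled so that their variances equal the stated (known) values. All names and parameter values are hypothetical, and the use of a single mixing weight shared by both errors is an assumption made here to mimic a bivariate t error vector; the exact hierarchical representation used by the authors is the one in their Appendix B.

```python
import math
import random


def simulate_stct_mem(n, beta0, beta1, mu, sigma2, lam, nu_b,
                      var_e1, var_e2, nu_e, seed=0):
    """Return a list of (b, y1, y2): true value and the two observed measurements."""
    rng = random.Random(seed)
    delta = lam / math.sqrt(1.0 + lam * lam)
    out = []
    for _ in range(n):
        # Skew-normal draw via delta*|U0| + sqrt(1 - delta^2)*U1, U0, U1 ~ N(0, 1).
        sn = (delta * abs(rng.gauss(0.0, 1.0))
              + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0))
        # Gamma(shape=nu/2, rate=nu/2) weight; gammavariate takes (shape, scale).
        w = rng.gammavariate(nu_b / 2.0, 2.0 / nu_b)
        b = mu + math.sqrt(sigma2) * sn / math.sqrt(w)
        # Centered t errors: shrink by (nu-2)/nu so var_e1, var_e2 are true variances.
        # ASSUMPTION: one mixing weight v is shared by both errors.
        v = rng.gammavariate(nu_e / 2.0, 2.0 / nu_e)
        shrink = math.sqrt((nu_e - 2.0) / (nu_e * v))
        e1 = math.sqrt(var_e1) * shrink * rng.gauss(0.0, 1.0)
        e2 = math.sqrt(var_e2) * shrink * rng.gauss(0.0, 1.0)
        out.append((b, b + e1, beta0 + beta1 * b + e2))
    return out


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    data = simulate_stct_mem(n=5, beta0=0.05, beta1=0.9, mu=0.5, sigma2=1.0,
                             lam=5.0, nu_b=5.0, var_e1=0.006, var_e2=0.002,
                             nu_e=6.0, seed=1)
    for b, y1, y2 in data:
        print(round(b, 3), round(y1, 3), round(y2, 3))
```

Setting `lam=0` and very large `nu_b` and `nu_e` makes the draws approximately normal, which mirrors how the SMEM arises as a special case.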

3. Evaluation of Agreement

The evaluation of agreement in a method comparison study examines the joint distribution of the two methods' measurements, whereas the evaluation of similarity compares marginal characteristics of the measurement methods, such as their biases and precisions. Let $(Y_1, Y_2)$ denote a pair of observations taken using the two methods on a subject selected at random from a target population. Agreement refers to the closeness of the two methods' measurements. The methods agree perfectly when they have equal means, equal variances, and a correlation of one. In that case, the bivariate distribution of $Y_1$ and $Y_2$ is concentrated on the 45° line.

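To quantify the extent of agreement, we first determine how far the paired measurements are from the line of equality. This is done through measures of agreement. Several agreement measures can be found in the literature, such as limits of agreement, CCC, TDI, and MSD. Here, we consider only the CCC and TDI to evaluate the agreement between the methods. The CCC was proposed by Lin [12], and it is defined as
$$\mathrm{CCC} = \frac{2\sigma_{12}}{\sigma_1^2 + \sigma_2^2 + (\mu_1 - \mu_2)^2}, \tag{10}$$
where $\mu_j$ and $\sigma_j^2$ are the mean and variance of the measurement from method $j$, $j = 1, 2$, and $\sigma_{12}$ is the covariance between the two measurements.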

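The CCC ranges from −1 to +1. A large positive value of the CCC indicates good agreement. A value of 1 implies perfect positive agreement, and a value of −1 represents perfect negative agreement. Detailed information on this measure can be found in Barnhart et al. [11] and Carrasco and Jover [5]. The TDI was proposed by Lin [18], and, for a specified probability $p$, it is defined as
$$\mathrm{TDI}(p) = p\text{th quantile of } |D|, \tag{11}$$
where $D$ is the difference between the paired measurements from the two methods.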

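Generally, the probability $p$ varies from 0.80 to 0.95. The TDI is nonnegative, a small value indicates high agreement between the methods, and the agreement is perfect when TDI = 0. It has been used by Lin [18] and Choudhary [9, 15]. The TDI can be calculated by solving the following equation:
$$P(|D| \le \mathrm{TDI}(p)) = p, \tag{12}$$
where $D$ is the difference between the paired measurements.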
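For intuition, the two measures can be computed directly from paired data using sample moments and an empirical quantile. The sketch below is a simple plug-in illustration with hypothetical function names; it is not the model-based ML estimation used in this article, which derives both measures from the fitted model parameters.

```python
import math


def sample_ccc(y1, y2):
    """Moment-based concordance correlation coefficient for paired measurements."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    s11 = sum((a - m1) ** 2 for a in y1) / n
    s22 = sum((b - m2) ** 2 for b in y2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n
    return 2.0 * s12 / (s11 + s22 + (m1 - m2) ** 2)


def empirical_tdi(y1, y2, p=0.90):
    """Empirical p-th quantile of the absolute paired differences."""
    abs_diff = sorted(abs(a - b) for a, b in zip(y1, y2))
    k = max(math.ceil(p * len(abs_diff)) - 1, 0)  # order-statistic index
    return abs_diff[k]


if __name__ == "__main__":
    # Toy data (hypothetical): method 2 reads exactly one unit higher than method 1.
    method1 = [1.0, 2.0, 3.0, 4.0]
    method2 = [2.0, 3.0, 4.0, 5.0]
    print("CCC:", sample_ccc(method1, method2))
    print("TDI(0.90):", empirical_tdi(method1, method2))
```

In the toy example the correlation is perfect, yet the CCC is below one because of the constant shift, which illustrates why the CCC penalizes differences in means as well as scatter.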

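We often employ one-sided confidence bounds for agreement measures to evaluate the agreement. Either a lower or an upper confidence bound may be chosen. For an agreement measure for which a small value (near zero) indicates good agreement, such as the TDI, we compute an upper confidence bound. Similarly, for a measure for which a large positive value (near one) indicates good agreement, such as the CCC, we compute a lower confidence bound [43]. Let $\mathbf{I}$ denote the observed information matrix [14, 35, 43] of $\boldsymbol{\theta}$ (the model parameter vector) evaluated at the ML estimate $\hat{\boldsymbol{\theta}}$, that is, $\mathbf{I} = -\partial^2 \log L(\boldsymbol{\theta}) / \partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}^{\top}$ evaluated at $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}$. Note that $L(\boldsymbol{\theta})$ is the likelihood function. The matrix $\mathbf{I}$ can be computed using numerical differentiation techniques. When $n$ is large, the distribution of $\hat{\boldsymbol{\theta}}$ can be approximated by a normal distribution with mean $\boldsymbol{\theta}$ and covariance matrix $\mathbf{I}^{-1}$, according to large-sample theory. Next, let $\phi$ be a scalar measure of agreement between the two methods, regarded as a function of $\boldsymbol{\theta}$. Its ML estimator $\hat{\phi}$ is obtained by substituting $\hat{\boldsymbol{\theta}}$ for $\boldsymbol{\theta}$. From large-sample theory, the sampling distribution of $\hat{\phi}$ can be approximated as
$$\hat{\phi} \approx N_1\left(\phi, \mathbf{G}^{\top}\mathbf{I}^{-1}\mathbf{G}\right), \tag{13}$$
where $\mathbf{G} = \partial\phi/\partial\boldsymbol{\theta}$ is the Jacobian matrix evaluated at $\hat{\boldsymbol{\theta}}$.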

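Then, approximate $100(1-\alpha)\%$ lower and upper confidence bounds for $\phi$ are
$$\hat{\phi} - z_{1-\alpha}\left(\mathbf{G}^{\top}\mathbf{I}^{-1}\mathbf{G}\right)^{1/2} \quad \text{and} \quad \hat{\phi} + z_{1-\alpha}\left(\mathbf{G}^{\top}\mathbf{I}^{-1}\mathbf{G}\right)^{1/2}, \tag{14}$$
respectively, where $z_{1-\alpha}$ is the critical point, namely the $100(1-\alpha)$th percentile of the standard normal distribution, and $\mathbf{G}$ and $\mathbf{I}$ are the Jacobian and the observed information matrix evaluated at the ML estimate. In practice, these confidence limits are computed by applying Fisher's z-transformation to the CCC and the log transformation to the TDI and then transforming the results back to the original scale, which improves the accuracy of the normal approximation.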
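The transformation step can be sketched as follows, assuming that an estimate and its asymptotic standard error on the original scale are already available (for example, from the delta method). The standard error on the transformed scale again follows from the delta method: $1/(1-\mathrm{CCC}^2)$ is the derivative of Fisher's z-transformation and $1/\mathrm{TDI}$ is the derivative of the logarithm. The function names and the numerical inputs are hypothetical and are not the values behind the tables in this article.

```python
import math
from statistics import NormalDist


def ccc_lower_bound(ccc_hat, se_ccc, level=0.95):
    """One-sided lower confidence bound for the CCC via Fisher's z-transformation."""
    z = NormalDist().inv_cdf(level)
    zeta = math.atanh(ccc_hat)                  # Fisher's z of the estimate
    se_zeta = se_ccc / (1.0 - ccc_hat ** 2)     # delta method on atanh
    return math.tanh(zeta - z * se_zeta)        # back to the original scale


def tdi_upper_bound(tdi_hat, se_tdi, level=0.95):
    """One-sided upper confidence bound for the TDI via the log transformation."""
    z = NormalDist().inv_cdf(level)
    se_log = se_tdi / tdi_hat                   # delta method on log
    return math.exp(math.log(tdi_hat) + z * se_log)


if __name__ == "__main__":
    # Hypothetical estimates and standard errors, for illustration only.
    print("CCC lower bound:", ccc_lower_bound(0.94, 0.005))
    print("TDI upper bound:", tdi_upper_bound(0.123, 0.01))
```

Working on the transformed scale keeps the CCC bound inside (−1, 1) and the TDI bound positive, which an untransformed normal interval does not guarantee.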

3.1. CCC and TDI under STcT-MEM

To define CCC and TDI, first, note that the hierarchical representation of from Appendix B iswhere , and are counterparts of and from Appendix B, and denotes gamma distribution with parameters .

The mean and variance of can be represented as where .where

Let and . As a result of Appendix C,where and .

Now, the STcT version of CCC can be defined from (10) aswhere .

Next, using (11), the TDI for STcT-MEM can be defined aswhere is the distribution function of the and is the joint density of appearing in (15).

3.2. CCC and TDI under SMEM

Under the SMEM (4),

Next, the difference can be represented as

Now, the SMEM version of CCC can be defined from (10) as

Moreover, TDI under SMEM defined by (11) can be determined aswhere denotes the cumulative distribution function (CDF) of a standard normal distribution.
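As an illustration of how the normal-theory versions can be evaluated, the sketch below assumes the paired model $Y_1 = b + e_1$, $Y_2 = \beta_0 + \beta_1 b + e_2$ with $b \sim N(\mu, \sigma^2)$ and independent normal errors with known variances. Under these assumptions the means are $\mu$ and $\beta_0 + \beta_1\mu$, the variances are $\sigma^2 + \sigma^2_{e1}$ and $\beta_1^2\sigma^2 + \sigma^2_{e2}$, the covariance is $\beta_1\sigma^2$, and the difference is normal with mean $\beta_0 + (\beta_1 - 1)\mu$ and variance $(\beta_1 - 1)^2\sigma^2 + \sigma^2_{e1} + \sigma^2_{e2}$. The notation, function names, and bisection solver are assumptions of this sketch and may differ from the authors' implementation.

```python
import math
from statistics import NormalDist


def smem_ccc(beta0, beta1, mu, sigma2, var_e1, var_e2):
    """CCC implied by the normal measurement error model described above."""
    mean1, mean2 = mu, beta0 + beta1 * mu
    var1 = sigma2 + var_e1
    var2 = beta1 ** 2 * sigma2 + var_e2
    cov12 = beta1 * sigma2
    return 2.0 * cov12 / (var1 + var2 + (mean1 - mean2) ** 2)


def smem_tdi(beta0, beta1, mu, sigma2, var_e1, var_e2, p=0.90):
    """Solve Phi((q - m)/s) - Phi((-q - m)/s) = p for q by bisection."""
    m = beta0 + (beta1 - 1.0) * mu                      # mean of the difference
    s = math.sqrt((beta1 - 1.0) ** 2 * sigma2 + var_e1 + var_e2)
    cdf = NormalDist().cdf

    def coverage(q):
        return cdf((q - m) / s) - cdf((-q - m) / s)

    lo, hi = 0.0, abs(m) + 10.0 * s                     # coverage(hi) is essentially 1
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if coverage(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    args = (0.05, 0.9, 0.5, 1.0, 0.006, 0.002)
    print("CCC:", smem_ccc(*args))
    print("TDI(0.90):", smem_tdi(*args, p=0.90))
```

When the difference has mean zero, the TDI reduces to the standard deviation of the difference multiplied by the $(1+p)/2$ standard normal quantile, which is a convenient check on the solver.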

4. Simulation Study

In this section, a Monte Carlo simulation study is conducted to examine the behaviour of the ML estimators obtained with the ECM algorithm under STcT-MEM and SMEM. We generate 500 datasets from the STcT-MEM with sample sizes 25, 50, and 100. The skewness parameter in STcT-MEM is set to λ = 2.5, 5, and 10. The other parameters in STcT-MEM are set to values derived from the gold particle data set. Here, we work with the inverse of the degrees of freedom to enhance the inference process, and we keep them constant throughout the simulation to save computing time.

For each sample size, the variances of the two measurement errors are drawn from uniform distributions on (0.004, 0.008) and (0.001, 0.003), respectively, and are then treated as known. Based on 500 random samples, we compute the ML estimates and the coverage probabilities (CPs) of the nominal 95% confidence intervals through the ECM algorithm under STcT-MEM. Then, we calculate the sample bias (BIAS), the standard deviation (SD), and the root mean squared error (RMSE) of the estimates based on STcT-MEM and SMEM for the datasets generated from STcT-MEM. The results are summarized in Tables 1 and 2, respectively. The R programming language was used for all calculations [39].
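The summary statistics used in the simulation can be written down compactly. The sketch below assumes that the estimates of a single parameter across replications, and the corresponding confidence intervals, have already been collected; the function names are hypothetical, and the use of the $n-1$ divisor for the SD is a convention chosen here.

```python
import math


def summarize_estimates(estimates, true_value):
    """Return (BIAS, SD, RMSE) of replicated estimates of one parameter."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n - 1))
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, sd, rmse


def coverage_probability(intervals, true_value):
    """Fraction of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


if __name__ == "__main__":
    # Toy replications (hypothetical numbers, not simulation output from the paper).
    est = [0.88, 0.91, 0.93, 0.89, 0.90]
    print(summarize_estimates(est, true_value=0.90))
    print(coverage_probability([(0.85, 0.95), (0.91, 0.99), (0.80, 0.92)], 0.90))
```

Because RMSE² equals BIAS² plus (approximately) SD², nearly equal SD and RMSE values indicate that the bias is negligible.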

Table 1 shows the ML estimates, asymptotic standard errors (SEs), and CPs of 95% confidence intervals. Concerning and , the CPs are fairly close to 99%, even for small, moderate, and large sample sizes. In the case of other parameters, most entries are close to 95%, and some even fall below 90% for moderate and large samples, and the values for the small samples are not accurate. However, the CPs increase when the sample size increases for all cases, and it can also be seen that the CPs have good performance when the skewness is moderate or heavy ( 5 or 10).

Table 2 shows that when the sample size and skewness () increase, the values of BIAS, SD, and RMSE decrease, as expected. The BIAS, SD, and RMSE values under STcT-MEM are small for all cases, revealing the efficiency and accuracy of the ML estimates, and for all settings, the values of SD and RMSE are nearly equal in STcT-MEM. In SMEM, the biases of the estimates of and are not negligible. Moreover, for all cases, the bias, SD, and RMSE of the ML estimates based on STcT-MEM are smaller than those of the SMEM estimates. Thus, the performance of the ST distribution is better than that of the normal distribution, which may be due to their heavy-tailed characteristics.

For the model comparison, we compute the relative efficiency by dividing the MSE of SMEM by the MSE of STcT-MEM; a value greater than one means that STcT-MEM is better. These values are displayed in Table 3, and the relative efficiencies increase with sample size in all situations. Moreover, the relative efficiencies improve as $\lambda$ increases. Furthermore, all entries are greater than 1, which indicates that STcT-MEM is better than SMEM for skewed and heavy-tailed data. Additionally, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) values based on STcT-MEM and SMEM when the data-generating model is STcT-MEM are displayed in Table 3. The AIC and BIC values under STcT-MEM are smaller than those under the standard model (SMEM), which means that STcT-MEM fits skewed data better than SMEM.
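The comparison criteria mentioned above follow standard formulas, sketched here with hypothetical function names and made-up inputs: the relative efficiency is the ratio of mean squared errors, AIC is $2k - 2\ell$, and BIC is $k\log n - 2\ell$, where $k$ is the number of free parameters, $n$ the sample size, and $\ell$ the maximized log-likelihood.

```python
import math


def relative_efficiency(mse_smem, mse_stct):
    """MSE of SMEM divided by MSE of STcT-MEM; values above 1 favour STcT-MEM."""
    return mse_smem / mse_stct


def aic(loglik, n_params):
    """Akaike information criterion; smaller is better."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion; smaller is better."""
    return n_params * math.log(n_obs) - 2.0 * loglik


if __name__ == "__main__":
    # Hypothetical log-likelihoods and MSEs, for illustration only.
    print("relative efficiency:", relative_efficiency(0.02, 0.01))
    print("AIC (5 parameters):", aic(-100.0, 5))
    print("BIC (5 parameters, n = 100):", bic(-100.0, 5, 100))
```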

5. Data Analysis

This section considers the gold particle data [25] as a numerical example. The data set records the concentration of gold particles (in g t$^{-1}$) measured using the Classical and Screen Fire Assay (SFA) methods. The measurements in this study are not replicated. There are 501 × 2 = 1002 paired observations, ranging from 0.038 to 4.523 g t$^{-1}$. Since these measurements were made in a chemical laboratory, where a variety of factors, including the operator and the subject's location, may affect the outcomes, they are prone to measurement errors, and these errors are inevitable. As a result, the proposed model is applicable. Furthermore, as required by our methodology, these measurement errors have known variances, which were calculated using the formulas given in [44].

Figure 1 shows the histograms and normal Q-Q plots of the gold particle data. The data are asymmetric and heavy-tailed. Figure 2 shows their trellis plot. It shows that the measurements from the two methods do not overlap and that the SFA measurements are often the larger ones. Some subjects have disproportionately large differences, implying a skewed distribution of the differences. Furthermore, the within-subject variation of both methods tends to increase with the concentration level, which means that the data are heteroscedastic.

Moreover, Figure 3 shows the scatter plot and Bland–Altman plot of these data. The scatter plot shows a modest correlation between the methods. The Bland–Altman plot shows that the vertical scatter appears to rise with average, which indicates heteroscedasticity. All the above plots show two extreme outliers. In this case, we performed the analysis by replacing the outlier with the mean value.

At the outset, we fit the mixed-effects model to the data. Figure 4 depicts the normal Q-Q plots of the standardized residuals and random effects. The box plot and histogram of the standardized residuals are also presented. These graphs show skewness and heavy-tailedness, suggesting that the assumption of normality is inadequate for the error terms and random effects. Thus, we use the proposed STcT-MEM to fit the data, where the true measurement follows the ST distribution and the error vector follows a cT distribution. The SMEM is also fitted for comparative purposes.

Firstly, we fit the STcT-MEM (9) by ML using the ECM algorithm, where $Y_{i1}$ and $Y_{i2}$ are the gold particle measurements taken using the Classical and SFA methods, respectively, on the $i$th subject, $i = 1, \ldots, 501$. Here, the degrees of freedom are treated as known, with values determined by the Schwarz information criterion [45], and the error variances are also assumed to be known. This model has five parameters. Secondly, we fit the SMEM (4), which corresponds to $\lambda = 0$ and degrees of freedom tending to infinity, by ML using the ECM algorithm; it has four parameters.

Table 4 shows the parameter estimates, SEs, and 95% confidence intervals for these parameters. Under STcT-MEM, the 95% confidence intervals for the intercept and slope are (0.051, 0.062) and (0.835, 0.905), respectively. Under the SMEM, they are (0.057, 0.067) and (0.816, 0.874), respectively. Neither intercept interval covers zero, indicating a considerable fixed bias. Likewise, neither slope interval covers one, although one lies close to the upper limit of both intervals. This provides evidence of a moderate proportional bias.

The next step is to evaluate the agreement between the methods. Table 5 shows the CCC and TDI (0.90) estimates and 95% one-sided confidence bounds for both models, computed as discussed in Section 3. The lower bound applies to the CCC, and the upper bound applies to the TDI. These bounds are first computed using Fisher's z-transformation of the CCC and the log transformation of the TDI, and the results are then transformed back to the original scale. Under STcT-MEM, the estimate of 0.940 and the lower bound of 0.932 for the CCC imply high agreement between the methods. Further, the estimate and upper bound for TDI (0.90) are 0.123 and 0.135, respectively. The TDI bound indicates that 90% of the differences between Classical and SFA measurements are within ±0.135, with 95% confidence. Since the readings range between 0.03 and 3, a difference of 0.135 is not acceptable when the true value is 0.03, but it may be acceptable when the true value is 3. Thus, we can conclude that the methods exhibit good agreement for large values but not for small values. Under the SMEM, the CCC estimate and lower bound are 0.938 and 0.930, respectively. Moreover, the estimate of TDI (0.90) is 0.174, and its 95% upper confidence bound is 0.180. Compared with the STcT counterpart, the CCC bound has not changed substantially, but the TDI bound has increased to 0.180. This value suggests that 90% of the differences in measurements from the methods fall within ±0.180, which is considerably wider than under STcT-MEM. From this, we conclude that STcT-MEM indicates closer agreement between the methods for large values than the SMEM does.

Additionally, because the two models are nested, we perform the LR test of the null hypothesis $H_0$: the SMEM is preferable against the alternative hypothesis $H_1$: the STcT-MEM is preferable. The LR test statistic is $LR = 2(\ell_1 - \ell_0)$, which approximately follows a chi-square distribution under the null hypothesis, where $\ell_1$ and $\ell_0$ are the log-likelihood functions evaluated at the ML estimates obtained with the ECM algorithm under STcT-MEM and SMEM, respectively. The LR test gave a p value of approximately 0, which is less than 0.05. Thus, STcT-MEM is better than SMEM for the gold particle data. Furthermore, the AIC and BIC values, also included in Table 5, indicate that STcT-MEM outperforms the SMEM.
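The LR computation can be sketched as follows. The sketch assumes one degree of freedom for the chi-square reference distribution, on the grounds that the fitted STcT-MEM has five free parameters and the SMEM has four; the degrees of freedom are not stated explicitly in the text, so this is an assumption. For one degree of freedom the upper tail probability has the closed form $\mathrm{erfc}(\sqrt{x/2})$, which avoids any dependency beyond the standard library. The function name and the log-likelihood values are hypothetical.

```python
import math


def lr_test_df1(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and chi-square(1) p value for nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)); clamp tiny negatives to 0.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods, for illustration only.
    stat, p_value = lr_test_df1(loglik_full=-95.0, loglik_reduced=-100.0)
    print("LR statistic:", stat)
    print("p value:", p_value)
```

A p value below the chosen significance level (0.05 in this article) leads to rejecting the simpler model in favour of the more general one.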

6. Conclusion

This article presents a methodology for method comparison data based on the ST and cT distributions, called STcT-MEM, which provides considerable flexibility in accommodating asymmetry and heavy tails in the data. The model can also be used for normally distributed data. The ECM algorithm is used to obtain the ML estimates of the parameters, and, with minor modifications, the same algorithm was used to fit the SMEM considered in this article. The simulation results show that the STcT-MEM-based ML estimates perform well for moderate and large sample sizes. We also demonstrated our approach using a real data set and showed that the STcT-MEM performed better than the SMEM. The proposed model is expected to give satisfactory results in analyzing method comparison data for moderate and large samples in the presence of measurement errors, skewness, and heavy tails, which are commonly found in many areas, especially in health-related fields. At present, the proposed methodology can be used only for unreplicated data. However, it can be extended to account for replicated measurements and multiple methods of measurement.

Appendix

A. Definition

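A random vector $\mathbf{Y}$ is said to follow the $SN_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\lambda})$ distribution, that is, $\mathbf{Y} \sim SN_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\lambda})$, if its density function is
$$f(\mathbf{y}) = 2\,\phi_p(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma})\,\Phi\left(\boldsymbol{\lambda}^{\top}\boldsymbol{\Sigma}^{-1/2}(\mathbf{y} - \boldsymbol{\mu})\right),$$
where $\mathbf{y} \in \mathbb{R}^p$, $\phi_p(\cdot; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ is the density function of the $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ distribution, and $\Phi$ denotes the CDF of a standard normal distribution.

A random vector $\mathbf{Y}$ is said to follow the $t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu)$ distribution, that is, $\mathbf{Y} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu)$, if its pdf is
$$t_p(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) = \frac{\Gamma((\nu+p)/2)}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}}\,|\boldsymbol{\Sigma}|^{-1/2}\left[1 + \frac{d(\mathbf{y})}{\nu}\right]^{-(\nu+p)/2}, \quad d(\mathbf{y}) = (\mathbf{y}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu}).$$

A random vector $\mathbf{Y}$ is said to follow the $ST_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\lambda}, \nu)$ distribution, that is, $\mathbf{Y} \sim ST_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\lambda}, \nu)$, if its pdf is
$$f(\mathbf{y}) = 2\,t_p(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu)\,T\left(\sqrt{\frac{\nu+p}{\nu+d(\mathbf{y})}}\;\boldsymbol{\lambda}^{\top}\boldsymbol{\Sigma}^{-1/2}(\mathbf{y}-\boldsymbol{\mu});\ \nu+p\right),$$
where $d(\mathbf{y}) = (\mathbf{y}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu})$ and $T(\cdot;\ \nu+p)$ denotes the CDF of the univariate Student's t distribution with $\nu+p$ degrees of freedom.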

B. Hierarchical Representation for STcT-MEM

Consider as defined in (3) where .

A hierarchical representation for is defined as follows:where and defined for .and is a gamma distribution with parameters .

C. Linear Combination of Skew-Normals

Let, , , and .

If with at least one nonzero element, then

Data Availability

The dataset [25] used in the analysis is available from the author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this article.

Acknowledgments

The authors would like to thank Professor L. C. Tomaya and Professor M. de Castro for providing the gold particle measurements dataset. This work was supported by the University of Peradeniya, Sri Lanka, under grant number URG/2022/59/S.