Meta-analytic structural equation modeling: a primer

2022-06-29 02:07:18 Zhen-Wei Dai
Medical Data Mining, 2022, Issue 2

Zhen-Wei Dai

Department of Epidemiology and Biostatistics, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China.

Abstract Meta-analytic structural equation modeling is a relatively new research field that helps explain the relationships among a group of variables across several studies by fitting and testing structural equation models. However, the methodology is still not widely used by researchers in many countries because it has not been well promoted. This report gives a primer on the principles of meta-analytic structural equation modeling and provides a reference for researchers who wish to conduct such analyses.

Keywords: meta-analysis; structural equation modeling; meta-analytic structural equation modeling; social science; psychology

Background

The integration of meta-analysis and structural equation modeling (SEM) is referred to as meta-analytic structural equation modeling (MASEM). MASEM is a relatively new research field that helps explain the relationships among a group of variables across several studies by testing a model. In 1995, Viswesvaran published the article "Theory Testing: Combining Psychometric Meta-analysis and Structural Equations Modeling", which put forward the method, process and criteria for integrating meta-analysis and SEM [1]. In 2015, Jak published the monograph Meta-analytic Structural Equation Modelling, which systematically expounds MASEM [2]. MASEM has clear advantages over meta-analysis or SEM used alone, and it has a wide range of applications in different fields, including pedagogy, management and medicine. In medicine, MASEM is mainly applied to questions in psychological and behavioral science. For example, using MASEM, Cikrikci examined the mediating effect of anxiety between depression, fear of COVID-19 and stress during the COVID-19 pandemic; Watson identified significant predictors of outcome following acquired brain injuries in adults; and Dai & Ma built a model to predict the turnover intention of nurses in China [3–5]. However, the methodology is still not widely used by researchers in many countries because it has not been well promoted, and a systematic body of MASEM research has not yet formed in many countries and regions. The present report aims to give a primer on the principles of MASEM and to provide a reference for researchers who wish to conduct MASEM analysis.

Introduction to meta-analysis

Meta-analysis is a research method that uses quantitative statistical techniques to analyze and evaluate the results of a number of completed studies on related research questions, identified through a systematic literature review, in order to reach an overall conclusion [6]. The term meta-analysis was put forward by Glass, who divided research into three types: primary analysis, secondary analysis and meta-analysis [7]. In primary analysis, the data are collected and analyzed for the first time; in secondary analysis, data that have already been analyzed by others are analyzed again; meta-analysis integrates a number of independent studies and combines them by statistical means to obtain a comprehensive result. The earliest meta-analysis in the social sciences was completed by the psychologists Smith and Glass in the USA. In response to the controversy over the effect of psychotherapy, Smith and Glass synthesized 375 research results on whether psychotherapy was effective for patients and analyzed them statistically. They concluded that psychotherapy is effective for patients and that almost no difference exists between the effects of different psychotherapy methods, a conclusion that gained great credibility and influence at the time [8]. Around the same time, other researchers developed similar methods for integrating research results, all of which are now called meta-analysis [9–11]. Meta-analysis has been widely used in basic research, biology, psychology, economics, sociology, pedagogy, medicine and other fields, which shows its broad applicability in scientific research.

Introduction to SEM

SEM is a statistical method that integrates regression analysis, factor analysis and path analysis, and it is usually classified as a second-generation statistical technique [12]. Path analysis, proposed by the geneticist Wright in 1920, was first used to study the inheritance of spotted coat patterns in guinea pigs [13]. This method has been widely used in many research fields in recent decades. Factor analysis appeared earlier: it was put forward by Spearman in 1904, who applied it to intelligence research to explain the correlations between different ability tests [14]. It is also one of the most widely employed statistical methods in behavioral science. SEM is usually a confirmatory method; that is, the model analyzed by researchers needs rigorous theoretical support, and it is evaluated by fitting it to the data [15]. When the research data are multivariate normal, the hypothesized model can be fitted as long as the sample size and the covariance matrix, which reflects the relationships between the variables, are provided. This is a great convenience of SEM analysis: when authors report the correlations and standard deviations of the variables in an article, other researchers can reproduce the analysis without the original data. These conveniences also apply to path analysis and factor analysis [2]. As mentioned before, SEM combines path analysis and factor analysis. Path analysis examines the complicated relationships between observed variables. However, if latent variables (also called "factors") exist in the study, the latent variables first have to be turned into observed ones. That is, researchers need to specify indicators to measure each latent variable, obtain observed values for the latent variable from the observed values of its indicators, and finally analyze the path relationships between the latent variables. Path analysis directly uses the arithmetic mean of a latent variable's measurement indicators to represent that latent variable, so path analysis can be regarded as a special case of SEM. Compared with path analysis, SEM can simultaneously estimate the measurement indicators, the latent variables, the errors of the measurement indicators and the validity of measurement. Therefore, SEM analysis is more rigorous than traditional path analysis [15,16].

Why combine meta-analysis and SEM

Most studies concern the relationships or differences among a set of variables. In this report, we take the Information-Motivation-Behavioral Skills (IMB) model as an example [17,18]. Four effects are assumed in this model: motivation and information are associated, motivation affects behavioral skills, information affects behavioral skills, and behavioral skills affect behavior. In a conventional meta-analysis of this model, only a single effect (such as the association between motivation and information) can be integrated at a time. Therefore, researchers cannot examine the effects among multiple variables through meta-analysis, nor can they analyze mediation effects such as that of behavioral skills between motivation and behavior [2]. Meta-analysis nevertheless has two advantages. On the one hand, if the sample size of a single study is too small, its statistical power is too low to reject a false null hypothesis; if several small studies that investigate the same relationship are integrated by meta-analysis, a result based on a large combined sample is obtained and the statistical power is higher. On the other hand, the studies included in a meta-analysis were usually conducted at different times, so the pooled result is more robust across time.

SEM analysis generally requires a large sample size; otherwise a wrong model may go undetected because of low statistical power [19]. Additionally, for the same set of variables, different researchers may propose different models to fit their own data, and comparing and integrating these models is an important issue. When researchers find that their model is consistent with their data, they rarely compare it with alternative models [20]. This kind of confirmation bias can seriously hinder the progress of the related research [21]. Furthermore, the data used in SEM analysis are usually cross-sectional, so SEM analysis lacks a mechanism for repeated verification and for establishing reliability and validity across time. However, the advantage of SEM is that it can deal with complicated relationships among multiple variables simultaneously. The advantages and disadvantages of meta-analysis and SEM are shown in Table 1.

Table 1 Comparison of meta-analysis and structural equation modeling

As described above, meta-analysis and SEM are two complementary statistical methods. By using MASEM, researchers can summarize information from multiple studies to analyze a single model containing the relationships among multiple variables, or compare several models supported by different studies or theories to find the best one [1,22]. It should be noted that MASEM can address research questions not covered in any primary study; that is, a model proposed in MASEM may contain a set of variables that no single primary study included in full [2]. For example, suppose the first study reported the correlation between variables A and B, the second study reported the correlation between variables B and C, and the third study reported the correlation between variables A and C. Although none of these three studies included all of A, B and C, the relationships among the three variables can be estimated by MASEM.

Approaches to MASEM

If an article reports the correlations between the variables of interest, or information from which these correlations can be estimated, this information can be used in a meta-analysis. An SEM can be fitted from the covariance matrix or correlation matrix of the input variables without the original data. MASEM is the process of integrating the correlations across studies through meta-analysis to obtain a pooled correlation matrix, and then using this matrix to fit the SEM. Different studies often use different scales to measure the same variable, so the measurement scale of a variable may differ between studies. Therefore, MASEM usually uses a pooled correlation matrix rather than a covariance matrix [2]. MASEM analysis generally includes two main steps. The first step is to test the heterogeneity of the correlation coefficients extracted from the literature and form a pooled correlation matrix. The second step is to fit the model using the pooled correlation matrix. The two main approaches to MASEM analysis are the univariate method and the multivariate method; the multivariate method mainly includes the generalized least squares (GLS) method and two-stage structural equation modeling (TSSEM). The approaches to conducting MASEM analysis are illustrated in Figure 1.

Figure 1 Approaches to meta-analytic structural equation modeling. GLS, generalized least squares; TSSEM, two-stage structural equation modeling; ML MASEM, maximum likelihood meta-analytic structural equation modeling.

Univariate method

Introduction. The univariate method includes the univariate-Z method proposed by Hedges and Olkin and the univariate-R method proposed by Hunter and Schmidt [23,24]. Generally, little difference exists between the estimates of the two methods [25,26]. We take the univariate-Z method as an example. The first step of the univariate method is to conduct a meta-analysis of the correlation between each pair of variables. If the SEM constructed by the researcher contains five variables, then 5 × 4/2 = 10 bivariate correlation coefficients are needed. Researchers need to conduct one meta-analysis for each pair of variables, 10 in total; the concrete mechanism is the R-Fisher's Z-R transformation displayed in Figure 2. The second step is to fit the SEM to the resulting pooled correlation matrix and obtain the relationships among the variables. The univariate method is the most popular MASEM analysis method at present, and 95% of published MASEM papers worldwide use it [27,28].

Figure 2 R-Fisher’s Z-R transformation
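The R-Fisher's Z-R procedure for a single pair of variables can be sketched in Python as follows. This is a minimal fixed-effect illustration, not necessarily the exact computation behind Figure 2: the function name `pool_correlation_fisher_z` and the input data are hypothetical, and inverse-variance weights of n - 3 are assumed (the usual large-sample variance of Fisher's z is 1/(n - 3)).

```python
import math


def pool_correlation_fisher_z(rs, ns):
    """Pool one bivariate correlation across studies (fixed effect, Fisher's z).

    rs: correlations reported by the studies; ns: their sample sizes.
    Returns (pooled r, Cochran's Q, degrees of freedom of Q).
    """
    # R -> Z: Fisher's z transformation of each study's correlation
    zs = [math.atanh(r) for r in rs]
    # Assumed weights: inverse of the sampling variance 1 / (n - 3)
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    # Cochran's Q: weighted squared deviations from the pooled z
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    # Z -> R: back-transform the pooled z to the correlation metric
    return math.tanh(z_bar), q, len(rs) - 1


# Hypothetical correlations between one pair of variables from three studies
rs = [0.30, 0.42, 0.36]
ns = [120, 85, 200]
r_pooled, q, df = pool_correlation_fisher_z(rs, ns)
print(f"pooled r = {r_pooled:.3f}, Q = {q:.3f}, df = {df}")
```

In a five-variable model this function would be called once for each of the 10 pairs. A Q value that is large relative to its degrees of freedom suggests heterogeneity, in which case a random-effects model would be the more appropriate choice.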

Steps of MASEM analysis by univariate method. Based on suggestions from previous research and the characteristics of MASEM analysis, this report proposes four steps for conducting MASEM analysis by the univariate method.

1. Determine the research variables and their operational definitions. Researchers should first determine which variables to include and the hypothesized relationships between them. This step is similar to traditional theory-testing research. After the variables are selected, researchers need to determine the operational definition of each variable. When integrating the results of different studies, researchers should take care to distinguish the concepts and operational definitions of different variables according to theory.

2. Literature retrieval and quality evaluation. Each included study should report the correlation coefficient of at least one pair of variables, and the correlation coefficient of each pair of variables should be reported in at least two studies. For literature retrieval, researchers should develop search queries according to the research questions, preferably with the guidance or participation of information retrieval experts so that the search is as comprehensive as possible. After retrieval, researchers need to screen the collected literature according to the inclusion and exclusion criteria and select qualified studies for meta-analysis. Generally, at least two researchers are required to screen and cross-check the literature, and disagreements should be resolved by discussion or arbitrated by the research leader.

3. Integrating the correlation matrix by meta-analysis. After literature screening, each correlation should be examined for publication bias and heterogeneity, and a sensitivity analysis should be conducted. The correlation coefficients should be integrated with a fixed-effect model if they are homogeneous and with a random-effects model if they are heterogeneous. The pooled coefficients are then assembled into a pooled correlation matrix. When a value is missing from the correlation matrix, that is, when the researcher cannot find the correlation between a pair of variables through the literature search, the missing value needs to be imputed. Five common imputation methods are: (1) conducting an original investigation of the two variables whose correlation is missing; (2) replacing the missing value with the arithmetic mean of all existing correlation coefficients in the matrix; (3) deleting the two variables concerned and retaining only the variables whose correlations are reported in the literature; (4) consulting authoritative experts in the field to estimate the missing correlation [11]; (5) if a regression coefficient $\beta$ between the two variables is reported in any study, imputing the correlation coefficient by the formula $r = 0.98\beta + 0.05\lambda$, where $\lambda$ equals 1 when $\beta$ is non-negative and 0 when $\beta$ is negative [29].

4. SEM analysis. Generally, each latent variable in SEM needs at least three measurement indicators [30]. In MASEM, however, there is usually only one measurement indicator for each latent variable, so before using the pooled correlation matrix to fit the SEM, researchers need to specify a single-indicator measurement model for each latent variable [31]. This requires researchers to record the reliability coefficients of each variable reported in the literature, such as Cronbach's $\alpha$. The reliability of each latent variable can be estimated as the arithmetic mean of all reliability values of that variable included in the meta-analysis. If the reliability of a latent variable is not reported in any study, it can be set to 0.8 as a conservative choice [32]. After the reliability of each latent variable is calculated, the model can be specified accordingly [31]. Specifically, for each latent variable, (1) the unstandardized factor loading of the single indicator is set to $\sqrt{\alpha}$, and (2) the residual variance of the single indicator is set to $1 - \alpha$. For example, if a latent variable's reliability is 0.84, the unstandardized factor loading and the residual variance should be set to 0.9165 and 0.16, respectively. After the model is specified, the sample size of the model must be determined. Since the pooled correlation matrix is used for analysis, it is also necessary to specify the mean and standard deviation of each variable. Generally, each variable in the pooled correlation matrix is standardized, with the mean set to 0 and the standard deviation set to 1. The model sample size is usually the harmonic mean of the total sample sizes behind the correlation coefficients in the pooled matrix, because the harmonic mean reduces the influence of excessively large samples [1]. The maximum likelihood (ML) method or the weighted least squares (WLS) method can be used to fit the model to the pooled correlation matrix [1,33]. After the model is fitted, the model fit indices and the path or correlation coefficients can be examined, and mediation tests or invariance results can be reported according to the research hypotheses.
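The numerical settings mentioned in steps 3 and 4 can be illustrated with a short Python sketch. The function names are hypothetical helpers introduced here for illustration; the formulas are the ones stated in the steps above (the $\beta$-to-$r$ imputation rule, the mean reliability with a fallback of 0.8, and the single-indicator loading and residual variance).

```python
import math


def impute_r_from_beta(beta):
    """Impute a missing correlation from a reported regression coefficient.

    Implements r = 0.98 * beta + 0.05 * lambda, with lambda = 1 for
    non-negative beta and 0 for negative beta.
    """
    lam = 1.0 if beta >= 0 else 0.0
    return 0.98 * beta + 0.05 * lam


def mean_reliability(alphas, default=0.8):
    """Average the reported reliabilities; fall back to 0.8 if none exist."""
    if not alphas:
        return default
    return sum(alphas) / len(alphas)


def single_indicator_settings(alpha):
    """Return (unstandardized loading, residual variance) for reliability alpha."""
    return math.sqrt(alpha), 1.0 - alpha


# Worked example from the text: a reliability of 0.84
loading, residual = single_indicator_settings(0.84)
print(round(loading, 4), round(residual, 2))  # 0.9165 0.16
```

The fixed loading and residual variance would then be entered as constraints in whichever SEM software is used to fit the model.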

Disadvantages of univariate methods. Although the univariate method is relatively simple and easy to understand, many researchers have pointed out that it has several problems [34]. First, the univariate method assumes that the correlations are independent of each other and ignores the dependence among the correlation coefficients, which may lead to wrong conclusions about their heterogeneity. Secondly, the univariate method fits the SEM directly to the correlation matrix obtained by meta-analysis without any adjustment. A covariance matrix is generally recommended instead, because the diagonal of a correlation matrix is always 1 rather than the variances of the variables, so the covariance matrix carries more information than the correlation matrix [35]. Directly using the correlation matrix without adjustment may therefore bias the model chi-square value, the model fit indices and the standard errors of the parameters [36]. Thirdly, the model sample size plays an important role in model estimation, but how to determine it in the univariate method is still controversial. Different researchers suggest different choices, including the arithmetic mean, harmonic mean, median and sum of the sample sizes [1,37–40]. Some model fit indices and standard errors of parameter estimates are sensitive to sample size, so different choices may lead to different results and conclusions. The harmonic mean might be the better choice because it reduces the influence of excessively large samples [1].
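How much the candidate model sample sizes can differ is easy to see with Python's standard `statistics` module. The sample sizes below are hypothetical and include one very large study to show why the harmonic mean is less affected by extreme values.

```python
import statistics

# Hypothetical total sample sizes behind four pooled correlations
ns = [150, 200, 250, 5000]

arithmetic = statistics.mean(ns)          # pulled upward by the large study
harmonic = statistics.harmonic_mean(ns)   # stays close to the typical study
median = statistics.median(ns)
total = sum(ns)

print(f"arithmetic mean = {arithmetic}")
print(f"harmonic mean   = {harmonic:.1f}")
print(f"median          = {median}")
print(f"sum             = {total}")
```

Because chi-square statistics and standard errors depend on the sample size, fitting the same pooled matrix with each of these values could lead to different conclusions about model fit.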

Multivariate method

Unlike the univariate method, the multivariate method takes the dependence among the correlation coefficients into account. The most commonly used multivariate methods in MASEM include the GLS method and TSSEM [22,34,41]. The analysis steps of the multivariate method are the same as those described in "Steps of MASEM analysis by univariate method", except for the integration of the correlation matrix and the SEM analysis, i.e., steps 3–4 of the univariate method.

GLS method. The GLS method first integrates the correlation matrices through multivariate meta-analysis. Unlike the univariate method, the GLS method considers the variances and covariances of the correlation coefficients simultaneously [2]. We use $R_i$ to represent the sample correlation matrix of study $i$ and $P_F$ to represent the common population correlation matrix. Generally, they need to be transformed into vectors of correlation coefficients: $r_i$ is called the sample correlation vector and $\rho_F$ the common population correlation vector. Because the diagonal elements of a correlation matrix are all 1 and the upper and lower triangles are identical, $r_i$ keeps only the elements of the lower triangle of the correlation matrix [42]. For example, a 5×5 correlation matrix retains only the 10 correlation coefficients of its lower triangle. The random-effects model of the multivariate meta-analysis is as follows:

$$r_i = X_i \rho_F + u_i + e_i$$

Where $X_i$ is a selection matrix containing only 0s and 1s that filters out the correlation coefficients missing from each study, $u_i$ is a vector of random effects, and $e_i$ is a vector of sampling errors. The heterogeneity matrix $T_\rho^2 = \mathrm{Var}(u_i)$ quantifies the heterogeneity of the correlation coefficients, and the variance-covariance matrix $V_i = \mathrm{Var}(e_i)$ is generally calculated before the analysis. $V_i$ measures the sampling error, which decreases as the sample size increases, whereas $T_\rho^2$ represents the true variation of the correlation coefficients among studies at the population level. It should be noted that the dimensions of $V_i$ may differ between the studies included in the meta-analysis. For example, if a researcher studies the relationships among five variables but a study contains only three of them, the $V_i$ of that study has only three rows and three columns, one for each of its three correlation coefficients. The block diagonal matrix formed from the $V_i$ of all $k$ studies is called $V$, with the $V_i$ on its diagonal:

$$V = \begin{bmatrix} V_1 & 0 & \cdots & 0 \\ 0 & V_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & V_k \end{bmatrix}$$

$V$ is a symmetric matrix, and its number of rows and columns equals the total number of correlation coefficients observed in all studies.
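The selection matrices and the block diagonal matrix described above can be built with a few lines of plain Python. This is a minimal sketch with hypothetical data: three variables (A, B, C), one study reporting all three correlations and one reporting only the A-B correlation. The helper names and the numerical sampling covariances are assumptions for illustration; in practice each $V_i$ is computed from the study's correlations and sample size.

```python
from itertools import combinations


def correlation_pairs(variables):
    """List the variable pairs of the lower triangle in a fixed order."""
    return list(combinations(variables, 2))


def selection_matrix(all_pairs, reported_pairs):
    """Build X_i: one row per reported correlation, one column per pair."""
    reported = set(reported_pairs)
    return [[1 if pair == col else 0 for col in all_pairs]
            for pair in all_pairs if pair in reported]


def block_diagonal(blocks):
    """Place the square matrices V_1, ..., V_k on the diagonal of V."""
    size = sum(len(block) for block in blocks)
    big = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                big[offset + i][offset + j] = value
        offset += len(block)
    return big


pairs = correlation_pairs(["A", "B", "C"])   # (A,B), (A,C), (B,C)
X1 = selection_matrix(pairs, pairs)          # study 1 reports all pairs
X2 = selection_matrix(pairs, [("A", "B")])   # study 2 reports only A-B

# Hypothetical sampling covariance matrices of the two studies
V1 = [[0.010, 0.002, 0.001],
      [0.002, 0.010, 0.003],
      [0.001, 0.003, 0.010]]
V2 = [[0.012]]
V = block_diagonal([V1, V2])
print(X2, len(V))
```

Here $V$ has four rows and columns because four correlation coefficients were observed in total, three in the first study and one in the second.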

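For convenience of demonstration, we assume that there is no heterogeneity among the $k$ included studies, that is, the fixed-effect model is adopted. Stacking the vectors and matrices of all studies then gives the following equation:

$$r = X \rho_F + e$$

where $r$, $X$ and $e$ stack the $r_i$, $X_i$ and $e_i$ of the $k$ studies, and $\mathrm{Var}(e) = V$.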

From this model we can obtain an average correlation vector $\hat{\rho}_F$ and its asymptotic covariance matrix $\hat{V}_F$. The $Q_{GLS}$ statistic is used to test the heterogeneity of the correlation matrices across the $k$ studies.
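The computation of the pooled vector, its covariance matrix and the heterogeneity statistic can be sketched in plain Python. The estimator formulas are not reproduced in the text above, so the sketch assumes the usual fixed-effect GLS expressions: $\hat{\rho}_F = (X^T V^{-1} X)^{-1} X^T V^{-1} r$, $\hat{V}_F = (X^T V^{-1} X)^{-1}$, and $Q_{GLS} = (r - X\hat{\rho}_F)^T V^{-1} (r - X\hat{\rho}_F)$. All function names and data are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda row: abs(aug[row][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for row in range(n):
            if row != c:
                f = aug[row][c]
                aug[row] = [vr - f * vc for vr, vc in zip(aug[row], aug[c])]
    return [row[n:] for row in aug]


def gls_pool(r, X, V):
    """Return (pooled correlations, their covariance matrix, Q statistic)."""
    V_inv = inverse(V)
    Xt_V_inv = matmul(transpose(X), V_inv)
    V_F = inverse(matmul(Xt_V_inv, X))
    rho = [row[0] for row in matmul(V_F, matmul(Xt_V_inv, [[v] for v in r]))]
    resid = [ri - sum(x * p for x, p in zip(x_row, rho))
             for ri, x_row in zip(r, X)]
    n = len(resid)
    q = sum(resid[i] * V_inv[i][j] * resid[j]
            for i in range(n) for j in range(n))
    return rho, V_F, q


# Hypothetical data: study 1 reports three correlations, study 2 only the first
r = [0.30, 0.20, 0.40, 0.36]
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
V = [[0.010, 0.002, 0.001, 0.0],
     [0.002, 0.010, 0.003, 0.0],
     [0.001, 0.003, 0.010, 0.0],
     [0.0, 0.0, 0.0, 0.012]]
rho_hat, V_F_hat, q_gls = gls_pool(r, X, V)
print([round(v, 3) for v in rho_hat], round(q_gls, 3))
```

Under the homogeneity hypothesis, $Q_{GLS}$ is usually referred to a chi-square distribution whose degrees of freedom equal the number of observed correlations minus the number of pooled correlations (4 - 3 = 1 in this example).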

Becker proposed that a regression model could be fitted using the pooled correlation matrix. The covariance matrix of the standardized regression coefficients, $\mathrm{Cov}(\hat{\beta})$, can be obtained through a series of matrix operations, including a decomposition of $P_F$. This matrix can be used for statistical inference and interval estimation of the standardized regression coefficients [22].

A limitation of the GLS method is that it can only fit the traditional regression model. If path analysis or factor analysis is to be carried out, the correlation matrix integrated by the GLS method must be entered into SEM software for analysis. Additionally, like the univariate method, this approach cannot provide an accurate sample size for the model, and it does not take the sampling variation between studies into account [2,43]. However, some researchers have pointed out that traditional SEM software can be used to fit a model to $\hat{\rho}_F$ by the WLS method, with $\hat{V}_F$ as the weighting matrix [34]. This approach is similar to the TSSEM introduced next and corrects some limitations of the traditional GLS method. Nevertheless, the GLS method can only fit regression models and some path or factor models and cannot analyze SEMs with latent variables, so it is not commonly used in MASEM.

TSSEM method. TSSEM was proposed by Cheung and Chan in 2005 [34]. The GLS method and the univariate method described above are also divided into two stages, namely integration of the correlation matrix and model fitting. However, both of them integrate the correlation matrix with traditional meta-analysis first and then analyze it with SEM, whereas TSSEM uses SEM in both stages; hence it is called "two-stage structural equation modeling". The two steps of TSSEM are introduced below, again taking the fixed-effect model as an example.

The first step of the TSSEM method is to integrate the correlation matrices and test their heterogeneity, which is similar to the GLS method except that estimation is by ML instead of GLS. ML estimation is more accurate and can handle missing values: whether the values are missing completely at random, missing at random or even missing not at random, the ML method yields relatively accurate estimates compared with other methods [44,45].

We use $R_i$ to represent the $p_i \times p_i$ sample correlation matrix of study $i$, where $p_i$ is the number of variables in study $i$. A study does not need to contain all the research variables. Like the GLS method, the TSSEM method uses selection matrices to deal with missing values, but the selection matrix of TSSEM filters out missing variables rather than missing correlations as in the GLS method. In addition, in the TSSEM method the selection matrices are not stacked into one large matrix; instead, each selection matrix is the identity matrix with the rows of the missing variables removed.
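The variable-level selection matrix described above can be sketched in plain Python. The helper names and the population correlation matrix are hypothetical; the example shows that pre- and post-multiplying a $p \times p$ matrix $R$ by $X_i$ and its transpose picks out the rows and columns of the variables that study $i$ actually measured.

```python
def variable_selection_matrix(all_vars, study_vars):
    """Keep the rows of the p x p identity matrix for the measured variables."""
    measured = set(study_vars)
    return [[1 if v == w else 0 for w in all_vars]
            for v in all_vars if v in measured]


def select_submatrix(X, R):
    """Compute X R X^T, the part of R that a study can inform."""
    XR = [[sum(x * r for x, r in zip(x_row, r_col)) for r_col in zip(*R)]
          for x_row in X]
    return [[sum(a * b for a, b in zip(xr_row, x_row)) for x_row in X]
            for xr_row in XR]


all_vars = ["A", "B", "C"]
# Hypothetical population correlation matrix of the three variables
R = [[1.0, 0.3, 0.4],
     [0.3, 1.0, 0.5],
     [0.4, 0.5, 1.0]]

X_i = variable_selection_matrix(all_vars, ["A", "C"])  # study i lacks B
R_sub = select_submatrix(X_i, R)
print(X_i)    # [[1, 0, 0], [0, 0, 1]]
print(R_sub)  # [[1.0, 0.4], [0.4, 1.0]]
```

A study that measured all variables would simply use the full identity matrix, in which case the selected matrix equals $R$ itself.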

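Next, multi-group structural equation modeling with ML estimation is used to estimate the population correlation matrix $R$ of the $p$ variables. Each study is regarded as a group, and the model for group $i$ is:

$$\Sigma_i = D_i \left( X_i R X_i^{T} \right) D_i \qquad (5.1)$$

In formula 5.1, $\Sigma_i$ is the model-implied covariance matrix of study $i$; $R$ is the $p \times p$ population correlation matrix, whose diagonal is fixed at 1; $X_i$ is the $p_i \times p$ selection matrix, where $p_i \le p$; and $D_i$ is a $p_i \times p_i$ diagonal matrix that allows for differences in the scaling of the variables across studies.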

The fit of the model in formula 5.1 is assessed by a chi-square difference test against the saturated model. The saturated model relaxes the restriction that all correlation coefficients are equal across studies; that is, a separate $R_i$ is estimated for each study, so $X_i$ is no longer needed. The model equation then becomes:

$$\Sigma_i = D_i R_i D_i \qquad (5.2)$$

The difference between the ML discrepancy functions $F_{ML}$ of formula 5.2 and formula 5.1, multiplied by the total sample size minus the number of studies, follows a chi-square distribution with degrees of freedom equal to the difference in the number of free parameters. If the chi-square likelihood ratio test is statistically significant (P < 0.05), the null hypothesis of homogeneity among studies is rejected and the studies are considered heterogeneous; otherwise, they are not considered heterogeneous.

The second step of TSSEM is to fit the structural equation model. Cheung and Chan suggested that WLS should be used in the second step, with the pooled correlation matrix $R$ estimated in the first step as input [34]. The first step of TSSEM yields the estimated population correlation matrix $R$ and its asymptotic variance-covariance matrix $V$. The second step fits the model to the vector $r$ by minimizing the weighted least squares fit function [46]:

$$F_{WLS} = (r - r_{MODEL})^{T} \, V^{-1} \, (r - r_{MODEL})$$

Where $r$ is the column vector of the unique elements of $R$, $r_{MODEL}$ is the column vector of the unique elements of the model-implied correlation matrix $R_{MODEL}$, and $V^{-1}$ is the inverse of the matrix $V$, which is used as the weight matrix. Researchers generally believe that estimating the model in this way yields correct parameter estimates and standard errors [34].
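Evaluating the weighted least squares fit function for one candidate set of parameters can be sketched as follows. The sketch assumes the quadratic form $(r - r_{MODEL})^T V^{-1} (r - r_{MODEL})$ and a hypothetical full-mediation path model X → M → Y with standardized paths `a` and `b`; the pooled correlations and the diagonal weight matrix are made-up values, whereas in practice $V$ comes from the first stage and is generally not diagonal.

```python
def implied_full_mediation(a, b):
    """Model-implied correlations for X -> M -> Y without a direct X -> Y path.

    Order of the unique elements: r_XM, r_XY, r_MY.
    """
    return [a, a * b, b]


def wls_fit(r, r_model, V_inv):
    """Evaluate (r - r_model)^T V^{-1} (r - r_model)."""
    d = [obs - imp for obs, imp in zip(r, r_model)]
    n = len(d)
    return sum(d[i] * V_inv[i][j] * d[j] for i in range(n) for j in range(n))


# Hypothetical pooled correlations (r_XM, r_XY, r_MY) and weight matrix
r_pooled = [0.50, 0.20, 0.40]
V_inv = [[100.0, 0.0, 0.0],
         [0.0, 100.0, 0.0],
         [0.0, 0.0, 100.0]]

fit_good = wls_fit(r_pooled, implied_full_mediation(0.5, 0.4), V_inv)
fit_poor = wls_fit(r_pooled, implied_full_mediation(0.5, 0.3), V_inv)
print(fit_good, fit_poor)
```

SEM software searches for the parameter values that minimize this function; the sketch only compares two fixed candidates, of which the first reproduces the pooled correlations and therefore gives the smaller value.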

ML method. Oort and Jak put forward another parameter estimation strategy for TSSEM. Both steps of this method use ML estimation, so it is called maximum likelihood meta-analytic structural equation modeling (ML MASEM) [33]. Its first stage is the same as that of TSSEM: the ML method is used to test heterogeneity and form the pooled correlation matrix. The difference lies in the model fitting of the second stage. In ML MASEM, a common $R_{MODEL}$ is fitted to the observed matrices of all studies, and $R_{MODEL}$ can have any SEM structure. For example, if a factor model is fitted in the second stage, the model for each study $i$ is:

$$\Sigma_i = D_i \left( X_i R_{MODEL} X_i^{T} \right) D_i, \qquad R_{MODEL} = \Lambda \Phi \Lambda^{T} + \Theta$$

Where $X_i$ represents the selection matrix and $D_i$ represents the diagonal matrix; in the factor model, $\Lambda$ contains the factor loadings, $\Phi$ the factor variances and covariances, and $\Theta$ the residual variances. Because $R_{MODEL}$ is nested in the population correlation matrix $R$, the chi-square difference between them follows a chi-square distribution with degrees of freedom equal to the difference in the number of free parameters. The results of ML analysis in the second stage of MASEM are very similar to those of WLS analysis [33].

The advantage of ML MASEM is that both stages are analyzed by the ML method, so the models of the first and second stages form a nested structure. The method also makes it easier to impose equality constraints on SEM parameters across studies [2]. The disadvantage of ML MASEM is that at present it can only be applied to the fixed-effect model.

Conclusion

This report gives a primer on MASEM analysis. Researchers can employ MASEM analysis in applicable fields based on the description and principles presented in this report.