L-Moments Based Calibrated Variance Estimators Using Double Stratified Sampling

2021-12-14 06:05 Usman Shahzad, Ishfaq Ahmad, Ibrahim Mufrah Almanjahie and Nadia Al-Noor

Computers, Materials & Continua, 2021, Issue 9

Usman Shahzad, Ishfaq Ahmad, Ibrahim Mufrah Almanjahie and Nadia H. Al-Noor

1 Department of Mathematics and Statistics, International Islamic University, Islamabad, 44000, Pakistan

2 Department of Mathematics and Statistics, PMAS-Arid Agriculture University, Rawalpindi, 46300, Pakistan

3 Department of Mathematics, College of Science, King Khalid University, Abha, 62529, Saudi Arabia

4 Statistical Research and Studies Support Unit, King Khalid University, Abha, 62529, Saudi Arabia

5 Department of Mathematics, College of Science, Mustansiriyah University, Baghdad, 10011, Iraq

Abstract: Variance is one of the most important measures of dispersion and is widely employed in practice. A commonly used approach for variance estimation is the traditional method of moments, which is strongly influenced by the presence of extreme values, so its results cannot be relied on in such cases. Taking motivation from Koyuncu's recent work, the present paper first proposes two classes of variance estimators based on linear moments (L-moments), and then employs them with auxiliary data under double stratified sampling to introduce a new class of calibration variance estimators using important properties of L-moments (L-location, L-cv, L-variance). Three populations are taken into account to assess the efficiency of the new estimators. The first population consists of artificial data, and the second and third populations consist of real data. The percentage relative efficiency of the proposed estimators over existing ones is evaluated. In the presence of extreme values, our findings show the superiority and high efficiency of the proposed classes over traditional classes. Hence, when auxiliary data are available along with extreme values, the proposed classes of estimators may be implemented in a wide variety of sampling surveys.

Keywords: Variance estimation; L-moments; calibration approach; double sampling; stratified random sampling

1 Introduction

Planning is an integral part of the administrative process for the development of any field. Among the most important outputs of the planning process are the plans and programs that institutions seek to execute. One of the most important pillars of planning success is the availability of data and information that enables the decision-maker to conduct scientific analysis. In the statistical literature, the additional information attached to each element is referred to as auxiliary (or ancillary, supplementary, supporting, concomitant) information. Whatever type of information is offered, it can be used to identify better sampling strategies. Auxiliary information has been used with sampling techniques for many years. The authors of [1,2] were pioneers in the use of auxiliary information to develop estimation techniques with high accuracy. Recently, there have been many interesting works using auxiliary information in different ways [3-12].

In all sample surveys, the major concern is the derivation of point estimators for various parameters of interest. Nevertheless, it is equally important to evaluate the performance of these estimators. The estimated variance of any estimator is a major component of its quality. Reference [13] pointed out that variance estimation offers an indicator of the quality of estimators: it can be used to calculate confidence intervals, to draw accurate conclusions, and to provide indicators of data quality. The sampling design that underlies a sample survey is one of the most important factors determining both the sample size and the procedure needed to estimate the variances. More specifically, many components of a sample design affect the estimation of variances, including the number of sampling stages. In single-stage designs, the situation is straightforward, and a closed formula can be derived for variance estimation. In designs with more than one stage, the situation becomes complicated since there is more than one source of variance. At each stage, the sampling of units (primary, secondary, etc.) contributes an additional component of variance. In cases where all other components of sampling and estimation are rather simple, a closed formula can be obtained by calculating the variance at each stage. However, common practice is to approximate the variance by estimating the variation among the primary sampling units, since this is the dominant component of the overall variance. For example, with double or two-stage sampling, there are two sources of variance: the variation resulting from the selection of the primary sampling units and the variation resulting from the selection of the secondary sampling units (for more details, see [14]). There are also many studies that have employed double sampling for real data [15-18]. In this paper, we consider double stratified random sampling. With stratified sampling, the population is split into non-overlapping subpopulations; these are known as strata and typically describe homogeneous subpopulations, resulting in reduced overall variability. A random sample is chosen from each stratum, independently of the other strata. The sampling scheme used in one stratum may be the same as or different from that used in the other strata.

Consider $X$ and $Y$ as the auxiliary and study variables associated with a finite population $\Omega = \{\nu_1, \nu_2, \ldots, \nu_N\}$ of size $N$, where $\Omega$ is stratified into $R$ strata with the $h$th stratum including $N_h$ units, $h = 1, 2, \ldots, R$, and $\sum_{h=1}^{R} N_h = N$. For the first stage, a simple random sample of size $n_h^*$ is chosen from stratum $h$ without replacement, such that $\sum_{h=1}^{R} n_h^* = n^*$. Then the second-stage sample of size $n_h$ is selected from the first-stage sample, $h = 1, 2, \ldots, R$. Here $(x_{hi}, y_{hi})$ represents the observed values of $X$ and $Y$ with $i = 1, 2, \ldots, N_h$, and [...] and [...] represent the variances of $X$ and $Y$ for the first- and second-stage samples, respectively. In view of this double stratified sampling design, the traditional variance estimator is

It is worth noting that $T_o$ is based on traditional moments and hence is highly affected by the presence of extreme values. Note also that $W_h = N_h/N$ is the stratum weight.
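The two-stage selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `double_stratified_sample` and its arguments are hypothetical, simple random sampling without replacement is used at both stages, and the second-stage sample is drawn as a subsample of the first-stage sample. The stratum and sample sizes in the usage lines are the ones reported later for the real data.

```python
import numpy as np


def double_stratified_sample(strata, first_sizes, second_sizes, seed=None):
    """Draw index sets for a two-stage (double) stratified sample.

    strata       : list of 1-D arrays, one per stratum
    first_sizes  : first-stage sample sizes n*_h
    second_sizes : second-stage sample sizes n_h (n_h <= n*_h <= N_h)
    Returns two lists of index arrays (first stage, second stage).
    """
    rng = np.random.default_rng(seed)
    first, second = [], []
    for values, n_star, n in zip(strata, first_sizes, second_sizes):
        N_h = len(values)
        if not (0 < n <= n_star <= N_h):
            raise ValueError("need 0 < n_h <= n*_h <= N_h in every stratum")
        # Stage 1: SRSWOR of n*_h units from the N_h units of the stratum.
        idx1 = rng.choice(N_h, size=n_star, replace=False)
        # Stage 2: SRSWOR of n_h units from the first-stage sample.
        idx2 = rng.choice(idx1, size=n, replace=False)
        first.append(idx1)
        second.append(idx2)
    return first, second


# Usage with placeholder unit labels 0..N_h-1 in each stratum.
strata = [np.arange(N_h) for N_h in (106, 106, 94, 171)]
first, second = double_stratified_sample(
    strata, first_sizes=[58, 58, 52, 94], second_sizes=[29, 29, 26, 47], seed=1
)
print([len(a) for a in first], [len(a) for a in second])
```

Returning indices rather than values keeps the auxiliary and study variables aligned, since the same index set can be applied to both $X$ and $Y$ within a stratum.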

The analysis of sample data is complex. The complexity of the analysis increases when the data contain unusual points (outliers or extreme values) that affect the robustness of the variance estimation under traditional central moments. One of the solutions to tackle this issue is to use L-moments instead of traditional central moments. L-moments provide a robust statistical framework for the analysis. L-moments [19] are determined by linear combinations of the expected values of the order statistics (O.S.). Furthermore, calibration estimation is another common statistical approach that relies on the use of auxiliary information to adjust the original weights of the design and improve the accuracy of estimators. The authors of [20] were pioneers in the use of calibration estimation with survey data, and several additional works on mean estimation have been published since (for example, see [21-23]).

In the present paper, our objective is to develop some new classes of variance estimators for a variable of interest, based on L-moments and the calibration approach under double stratified random sampling. The remainder of this article is organized as follows. In Section 2, the L-moments and proposed classes are presented in detail. Numerical illustrations of three populations are offered in Section 3 to evaluate the performance of the new estimators. Finally, Section 4 provides conclusions.

2 L-Moments and Proposed Classes

Reference [19] described the L-moments as expectations of certain linear combinations of order statistics. L-moments can be specified for any random variable for which a mean exists. They are used to describe probability distributions and estimate parameters, and their estimates are used for summarizing and describing samples of observed data. L-moments have several advantages over traditional moments: they are linear functions of the data, they suffer less from the effects of sampling variability, they are more robust to outliers and extreme values in the data, and they enable safer inferences from small samples about an underlying population parameter. The general population forms of the first four L-moments for the auxiliary variable $X$ in stratum $h$ are defined as follows:
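For reference, the standard definitions of the first four population L-moments (due to Hosking) are sketched below. The stratum index $h$ and the symbol $X_{h(k:m)}$, denoting the $k$th smallest value in a conceptual random sample of size $m$ from stratum $h$, are notational assumptions made here and may differ from the authors' own symbols.

```latex
\begin{align*}
\lambda_{1h} &= E\left[X_{h(1:1)}\right], \\
\lambda_{2h} &= \tfrac{1}{2}\,E\left[X_{h(2:2)} - X_{h(1:2)}\right], \\
\lambda_{3h} &= \tfrac{1}{3}\,E\left[X_{h(3:3)} - 2X_{h(2:3)} + X_{h(1:3)}\right], \\
\lambda_{4h} &= \tfrac{1}{4}\,E\left[X_{h(4:4)} - 3X_{h(3:4)} + 3X_{h(2:4)} - X_{h(1:4)}\right].
\end{align*}
```

In this formulation $\lambda_{1h}$ is the L-location (the mean), $\lambda_{2h}$ is the L-scale, and the ratio $\lambda_{2h}/\lambda_{1h}$ is the L-cv.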

Similarly, we can write the second-stage sample L-moments of the auxiliary variable as

where $x_{h(d)}$ represents the $d$th order statistic and the coefficients are binomial coefficients. Similarly, we can write the L-moments expressions for the first-stage sample as [...]. Furthermore, we can write the mathematical expressions of the L-moments for the study variable $Y$ by adapting the structure used for the auxiliary variable $X$.
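As a concrete illustration of sample L-moments, the sketch below computes the first four for one stratum sample using Hosking's unbiased estimators, written through the probability-weighted moments $b_0, \ldots, b_3$. This form is algebraically equivalent to the usual binomial-coefficient expression in the order statistics; whether it matches the authors' exact formula is an assumption, and the function name `sample_l_moments` is hypothetical.

```python
import numpy as np


def sample_l_moments(x):
    """Return (l1, l2, l3, l4), the first four sample L-moments of x.

    Uses the unbiased probability-weighted moments
        b_r = (1/n) * sum_j [(j-1)...(j-r) / ((n-1)...(n-r))] * x_(j),
    where x_(1) <= ... <= x_(n) are the order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 4:
        raise ValueError("at least four observations are required")
    j = np.arange(1, n + 1)  # ranks of the order statistics
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((j - 1) * (j - 2) * (j - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    l1 = b0                                  # L-location
    l2 = 2 * b1 - b0                         # L-scale
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4


# Small check on evenly spaced data, where l3 and l4 vanish.
l1, l2, l3, l4 = sample_l_moments([1, 2, 3, 4, 5])
l_cv = l2 / l1  # L-cv, defined when l1 is non-zero
print(l1, l2, l3, l4, l_cv)
```

Because each L-moment is a linear function of the ordered data, a single extreme observation shifts the estimate in proportion to its value rather than its square, which is the robustness property the text relies on.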

2.1 First Proposed Class of Estimators

The authors of [9,10] used robust regression and robust covariance matrix methodologies for improved estimation of the population mean. Their use of robust regression and robust covariance matrices motivates the use of robust moments (L-moments) instead of traditional moments. Hence, taking motivation from [21], we propose the following class of L-moments based calibration estimators of variance under double stratified sampling:

where the calibrated weights $\gamma_h$ are chosen to minimize the chi-square distance

subject to the following calibration constraints

The Lagrange function is given as

where $\mu_{11}$ and $\mu_{12}$ are the Lagrange multipliers. To obtain the optimum value of the calibration weight, we differentiate the Lagrange function with respect to $\gamma_h$ and set the derivative equal to zero. Thus the calibration weight can be obtained in the form

Now, $\mu_{11}$ and $\mu_{12}$ can be obtained by replacing $\gamma_h$ in Eqs. (4) and (5) with its value given by Eq. (7). Thus, we obtain the calibration weight

By substituting the value of $\gamma_h$ from Eq. (8) into Eq. (2), we obtain the proposed calibration estimator as follows:

The members of the first proposed class are provided in Tab. 1.

Table 1: First proposed class of estimators
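The Lagrange-multiplier step behind calibration weights of this kind can be sketched generically. Assuming a chi-square distance of the form $\sum_h (\gamma_h - W_h)^2 / (q_h W_h)$ and linear constraints $A\gamma = t$, setting the derivative of the Lagrange function to zero gives $\gamma = W + D A^\top \mu$ with $D = \mathrm{diag}(q_h W_h)$ and $\mu = (A D A^\top)^{-1}(t - AW)$. The Python sketch below implements only this generic solution; the function name, the tuning constants `q`, and the example constraint values are assumptions for illustration and are not the paper's Eqs. (4) and (5).

```python
import numpy as np


def chi_square_calibration(design_weights, constraint_rows, targets, q=None):
    """Minimise sum((g - w)**2 / (q * w)) subject to constraint_rows @ g == targets."""
    w = np.asarray(design_weights, dtype=float)
    A = np.atleast_2d(np.asarray(constraint_rows, dtype=float))  # shape (m, R)
    t = np.atleast_1d(np.asarray(targets, dtype=float))          # shape (m,)
    q = np.ones_like(w) if q is None else np.asarray(q, dtype=float)
    d = q * w                            # diagonal of D = diag(q_h * w_h)
    M = (A * d) @ A.T                    # A D A^T, an m x m matrix
    mu = np.linalg.solve(M, t - A @ w)   # Lagrange multipliers
    return w + d * (A.T @ mu)            # calibrated weights


# Illustrative inputs only: the stratum sizes are those of the real data in
# Section 3.2; the per-stratum summaries are made-up numbers, not paper values.
W = np.array([106, 106, 94, 171]) / 477.0          # design weights W_h = N_h / N
second_stage = np.array([10.2, 8.7, 12.1, 9.5])    # hypothetical second-stage summaries
first_stage = np.array([10.0, 9.0, 12.5, 9.3])     # hypothetical first-stage summaries

A = np.vstack([np.ones(4), second_stage])  # row 1: weights sum; row 2: weighted summary
t = np.array([1.0, W @ first_stage])       # targets for the two constraints
gamma = chi_square_calibration(W, A, t)
print(gamma, A @ gamma)
```

The same function handles any number of constraints, since only the size of the linear system for the multipliers changes. Note that chi-square calibration does not force the weights to be positive, so the output should be inspected when constraints are far from the design-weighted values.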

2.2 Second Proposed Class of Estimators

By extending the idea of $V_{ai}$, we propose the second class of estimators of variance under double stratified sampling as given below:

Using the chi-square distance,

which is subject to the following three calibration constraints:

The Lagrange function is given as

After taking the derivative of $T$ with respect to $\gamma$ and setting it equal to zero, we get

The following system of equations can be obtained by substituting Eq. (16) into Eqs. (13)-(15), respectively:

$[P_a]_{3\times 3}\,[P_b]_{3\times 1} = [P_c]_{3\times 1}$

where

Upon solving the system of equations for the $\mu$s, we get

where

Substituting these $\mu$s into Eq. (16) and then Eq. (11), we obtain the following:

The members of the second proposed class are listed in Tab. 2.

Table 2: Second proposed class of estimators

3 Numerical Illustrations

Here, we evaluate the performance of the proposed estimators through three populations.

3.1 Simulation Design (Population-1)

For the simulation, we consider a population of size $N = 1000$. Using equal allocation, a sample of size 100 is selected from the $h$th stratum, so the total sample size is 400. Furthermore, for stratum $h$, the random variables $X_h$ and $Y_h$ are defined as follows:

where $X_h$, for $h = 1, 2, 3, 4$, follows Gamma distributions with parameter values as given below:

$X_1 \sim G_h(2.6, 3.8)$, for $h = 1$

$X_2 \sim G_h(2.0, 3.1)$, for $h = 2$

$X_3 \sim G_h(1.5, 2.7)$, for $h = 3$

$X_4 \sim G_h(2.9, 3.1)$, for $h = 4$.

$\varepsilon$ follows a standard normal distribution, and $\delta = 5$, $p = 1.6$, and $K = 2$.

Figs. 1-4 show the scatter plots for each stratum. These figures clearly show the presence of extreme values, so the population is suitable for evaluating our proposed estimators.

Figure 1: Population-1, h = 1

The simulation steps are as follows:

Step 1: Select a random sample of size $n_h$ through SRSWOR from stratum $h$.

Step 2: Find the value of the variance estimate, say $\omega = V_{ai}, V_{bi}$, where $ai = 1, 2, \ldots, 15$ and $bi = 1, 2, \ldots, 10$.

Step 3: Repeat Steps 1 and 2 $L = 5000$ times. Obtain $\omega_1, \omega_2, \ldots, \omega_L$.

Step 4: Compute the mean square error (MSE) as

Step 5: Compute the percentage relative efficiency (PRE) as
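The five steps above can be sketched as a small Monte Carlo loop. The sketch assumes the usual forms $\mathrm{MSE}(\omega) = L^{-1}\sum_{l=1}^{L}(\omega_l - \theta)^2$, with $\theta$ the true value being estimated, and $\mathrm{PRE}(\omega) = 100 \times \mathrm{MSE}(\text{reference}) / \mathrm{MSE}(\omega)$; both are assumptions about the formulas in Steps 4 and 5. The two estimators in the usage lines are toy stand-ins (sample variances with different divisors on a single unstratified gamma population with an assumed shape/scale parametrization), not the paper's $T_o$, $V_{ai}$ or $V_{bi}$.

```python
import numpy as np


def mse(estimates, target):
    """Mean square error of replicated estimates around a known target."""
    e = np.asarray(estimates, dtype=float)
    return float(np.mean((e - target) ** 2))


def pre(mse_reference, mse_candidate):
    """Percentage relative efficiency of a candidate against a reference."""
    return 100.0 * mse_reference / mse_candidate


def simulate_pre(draw_sample, estimators, reference, target, reps=5000, seed=0):
    """Repeat sampling and estimation, then return the PRE of every estimator."""
    rng = np.random.default_rng(seed)
    draws = {name: np.empty(reps) for name in estimators}
    for l in range(reps):
        sample = draw_sample(rng)                  # Step 1: draw a sample
        for name, estimate in estimators.items():
            draws[name][l] = estimate(sample)      # Step 2: evaluate estimators
    mses = {name: mse(v, target) for name, v in draws.items()}          # Step 4
    return {name: pre(mses[reference], m) for name, m in mses.items()}  # Step 5


# Toy usage: one gamma population (shape 2.6, scale 3.8 assumed), SRSWOR of 50.
population = np.random.default_rng(123).gamma(2.6, 3.8, size=1000)
target = float(np.var(population, ddof=1))
estimators = {
    "reference": lambda s: np.var(s, ddof=1),    # toy reference estimator
    "alternative": lambda s: np.var(s, ddof=0),  # toy competitor
}
result = simulate_pre(
    lambda rng: rng.choice(population, size=50, replace=False),
    estimators, reference="reference", target=target, reps=500,
)
print(result)
```

By construction the reference estimator has a PRE of 100, and a value above 100 for another estimator indicates a smaller MSE than the reference under the simulated design.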

Figure 2: Population-1, h = 2

Figure 3: Population-1, h = 3

The PREs of the estimators obtained from the above five steps are provided in Tab. 3.

Figure 4: Population-1, h = 4

Table 3: PRE for Population-1

3.2 Real Life Data

The apple is one of the most common types of fruit. It is native to Central Asia, but today it grows worldwide in different colors and sizes. The apple is rich in fiber, vitamins, and antioxidants and has many health benefits.

In the present article, we use the apple production data used by [24], where

Population-2: X = number of apple trees in 1999, Y = level of apple production in 1999.

Population-3: X = level of apple production in 1998, Y = level of apple production in 1999.

It should be noted that we consider 477 villages in four strata in 1999, termed (1: Marmara), (2: Aegean), (3: Mediterranean), and (4: Central Anatolia). The scatter plots for each stratum, which show the extreme values, are given in Figs. 5-12. The PREs of the estimators are computed as defined in Subsection 3.1 and are presented in Tabs. 4 and 5. The first-stage samples with sizes $n_1^*$, $n_2^*$, $n_3^*$, and $n_4^*$ are selected, and then from these samples the second-stage samples with sizes $n_1$, $n_2$, $n_3$, and $n_4$ are selected:

$N_1 = 106$, $N_2 = 106$, $N_3 = 94$, $N_4 = 171$,

$n_1^* = 58$, $n_2^* = 58$, $n_3^* = 52$, $n_4^* = 94$,

$n_1 = 29$, $n_2 = 29$, $n_3 = 26$, $n_4 = 47$.
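As a quick arithmetic check on these sizes, the short Python sketch below computes the stratum weights $N_h/N$ and the sampling fractions at each stage. The variable names are chosen here for illustration, and $W_h = N_h/N$ is the usual definition of a stratum weight.

```python
# Stratum and sample sizes as listed above.
N_h = [106, 106, 94, 171]
n_star_h = [58, 58, 52, 94]
n_h = [29, 29, 26, 47]

N = sum(N_h)                      # total number of villages
weights = [Nh / N for Nh in N_h]  # stratum weights W_h = N_h / N
first_fraction = [ns / Nh for ns, Nh in zip(n_star_h, N_h)]   # n*_h / N_h
second_fraction = [n / ns for n, ns in zip(n_h, n_star_h)]    # n_h / n*_h

print(N, sum(n_star_h), sum(n_h))
print([round(w, 4) for w in weights])
print([round(f, 3) for f in first_fraction])
print(second_fraction)
```

The stratum sizes sum to the 477 villages mentioned above, and each second-stage sample is exactly half of its first-stage sample.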

Figure 5: Population-2, h = 1

Figure 6: Population-2, h = 2

Figure 7: Population-2, h = 3

Figure 8: Population-2, h = 4

3.3 Findings

1: From Tab. 3, the results ($V_{ai}$, $V_{bi}$) for Population-1 indicate that

PRE($V_{a11-15}$) > PRE($V_{a1-5}$) > PRE($V_{a6-10}$), w.r.t. $V_{ai}$

PRE [...] w.r.t. $V_{bi}$.

The proposed estimators $V_{a11}$ and $V_{b3}$ record the highest efficiency among the competing estimators.

Figure 9: Population-3, h = 1

Figure 10: Population-3, h = 2

2: Meanwhile, the results ($V_{ai}$, $V_{bi}$) for Population-2 in Tab. 4 indicate that

PRE($V_{a11-13}$, $V_{a15}$) > PRE($V_{a1-5}$, $V_{a14}$) > PRE($V_{a6-10}$), w.r.t. $V_{ai}$

PRE [...] w.r.t. $V_{bi}$.

The proposed estimators $V_{a11}$ and $V_{b5}$ record the highest efficiency among the competing estimators.

Figure 11: Population-3, h = 3

Figure 12: Population-3, h = 4

3: The results ($V_{ai}$, $V_{bi}$) for Population-3 (see Tab. 5) reveal that

PRE($V_{a11-15}$) > PRE($V_{a1-5}$) > PRE($V_{a6-10}$), w.r.t. $V_{ai}$

PRE [...] w.r.t. $V_{bi}$.

Hence, the proposed estimators $V_{a11}$ and $V_{b1}$ record the highest efficiency of all compared estimators.

4: Comparing the two proposed classes for each population leads us to the following findings:

5: Overall, all the members of the new classes have PRE > 100 with respect to $T_o$, which clearly indicates that the proposed estimators perform better than the traditional estimator.

6: Furthermore, the proposed variance estimator $V_{a11}$ is the best among all proposed estimators, having PREs of 478.67, 28051.41, and 77307.88 for Populations 1-3, respectively.

Table 4: PRE for Population-2

Table 5: PRE for Population-3

4 Conclusion

The difficulty of data analysis arises from the presence of extreme values that adversely affect variance estimation based on central moments. One way to address this issue is to use L-moments, which provide a robust statistical structure for analysis. Calibration estimation is a common statistical approach that relies on the use of auxiliary information to adjust the original design weights and to improve the accuracy of estimators. Motivated by [21], we propose new classes of estimators of the population variance based on L-moments and the calibration approach for double stratified random sampling. The percentage relative efficiency is adopted to compare the performance of the proposed estimators across three populations, covering a simulation study as well as an application to real-life data. Our numerical results show that the proposed estimators are more efficient than the existing estimators in all three populations considered.

Funding Statement: The authors thank the Deanship of Scientific Research at King Khalid University, Kingdom of Saudi Arabia, for funding this study through the research groups program under Project Number R.G.P.1/64/42. Ishfaq Ahmad and Ibrahim Mufrah Almanjahie received the grant.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
