Assessing the Performance of Some Ranked Set Sampling Designs Using Hybrid Approach

2021-12-14 06:06 Mohamed Sabry, Ehab Almetwally, Hisham Almongy and Gamal Ibrahim

Computers, Materials & Continua, 2021, Issue 9

Mohamed A. H. Sabry, Ehab M. Almetwally, Hisham M. Almongy and Gamal M. Ibrahim

1 Faculty of Graduate Studies for Statistical Research, Cairo University, Giza, 12613, Egypt

2 Faculty of Business Administration, Delta University of Science and Technology, Mansoura, 35511, Egypt

3 Department of Statistics, Delta University for Science and Technology, Mansoura, Egypt

4 High Institute for Management Sciences, Belqas, 35511, Egypt

Abstract: In this paper, a joint analysis consisting of goodness-of-fit tests and Markov chain Monte Carlo simulations is used to assess the performance of some ranked set sampling designs. The Markov chain Monte Carlo simulations are conducted using Bayesian methods with Jeffreys’ priors for the unknown parameters of the Weibull distribution, while the goodness-of-fit analysis is conducted using the likelihood estimators and the corresponding empirical distributions. The ranked set sampling designs considered in this research are the usual ranked set sampling, extreme ranked set sampling, median ranked set sampling, and neoteric ranked set sampling designs. An intensive Monte Carlo simulation study is conducted using Lindley’s approximation algorithm to compute the estimators based on the different designs. The study showed that the dependent design, the neoteric ranked set sampling design, is superior to the other ranked set designs and that its total relative efficiency is higher than that of the other designs.

Keywords: Goodness of fit; ranked set sampling; Weibull distribution; Bayesian estimation; Lindley’s approximation; neoteric; ranked set sampling design

1 Introduction

Ranked set sampling (RSS) designs were first established in [1] to find a more efficient method of estimating mean pasture yields. Since then, several modifications have been proposed to provide more efficient estimators and to reduce ranking errors, see [2], which in turn makes it possible to obtain better fits to the data under consideration. The extreme ranked set sampling (ERSS) design was introduced in [3] as the first modification of RSS, while [4] introduced another modification called the median ranked set sampling (MRSS) design. The moving extreme ranked set sampling (MERSS) design was proposed in [5], while [6] introduced the double ranked set sampling (DRSS) design and proved that the population mean estimated using DRSS samples is more accurate and precise than that estimated with RSS and simple random sampling (SRS) designs. Later on, [7] suggested the multistage ranked set sampling (MSRSS) design as a generalization of the DRSS design. In [8], Zamanzade investigated a new ranked set sampling design with a dependence structure, called the neoteric ranked set sampling (NRSS) design, and showed that NRSS-based estimators are superior to the independent RSS-based estimators. Moreover, two-stage NRSS designs were proposed in [9], where it was shown that five different sampling designs based on NRSS outperform the RSS and NRSS designs. Likelihood estimation of distribution parameters using the DRSS, NRSS, and DNRSS designs was proposed in [10,11], where it was shown that the proposed likelihood estimators provide results similar to those obtained when estimating population means and variances using these designs.

This paper aims to use goodness-of-fit (GOF) tests and indices together with Markov chain Monte Carlo (MCMC) simulations to assess the performance of four ranked set sampling designs: the RSS, ERSS, MRSS, and NRSS designs. The GOF analysis includes the Kolmogorov-Smirnov test and the Akaike information criterion (AIC), the corrected Akaike information criterion (CAIC), the Hannan-Quinn information criterion (HQIC), and the Schwarz Bayesian information criterion (BIC) indices.

Goodness-of-fit (GOF) tests are used in many areas of research to measure the distance between the theoretical distribution and the empirical distribution of a given data set. These tests determine how well the distribution under study fits the data set in use. They can be applied to test a simple hypothesis, which completely specifies the model, and a composite hypothesis, where only the name of the model or distribution is stated and its parameters are estimated from the data. When testing GOF using SRS samples, tests based on the empirical distribution function (EDF) are usually used. These tests include the Kolmogorov-Smirnov (KS) and Cramér-von Mises (CVM) GOF tests discussed in [12], which gives a practical guide to GOF tests using statistics based on the EDF. A comprehensive survey of GOF tests based on SRS can be found in [13]. When using RSS samples, these tests can be obtained simply by replacing the SRS EDF with the unbiased RSS EDF, see [14]. GOF indices such as AIC, CAIC, HQIC, and BIC are used for model selection and provide fair comparisons between different candidate distributions.
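To make the EDF-based distance concrete, the following is a minimal sketch of the two-sided KS statistic for a fully specified cdf. The function name `ks_statistic` and the callable `cdf` argument are illustrative choices, not part of the paper. For a balanced ranked set sample the EDF is again an average of indicator functions, so the same computation is assumed to apply once the RSS observations are pooled.

```python
def ks_statistic(sample, cdf):
    """Two-sided Kolmogorov-Smirnov distance between the EDF of `sample`
    and a fully specified cdf (a callable returning F(x))."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The EDF jumps from (i - 1)/n to i/n at the i-th ordered value,
        # so the largest gap is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

For example, `ks_statistic(data, lambda x: fitted_cdf(x))` would return the distance for a hypothetical fitted cdf `fitted_cdf`; turning that distance into a p-value requires the KS null distribution, which this sketch does not implement.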

The rest of the paper is organized as follows: Section 2 gives a brief introduction to the Weibull distribution, while Section 3 introduces the four RSS designs used in the research. In Section 4, Bayesian analysis is considered for all designs, including the SRS design, and in Section 5, the hybrid analysis and numerical study are investigated. Finally, the paper is concluded in Section 6.

2 The Weibull Distribution

The Weibull distribution, which is one of the most widely used lifetime distributions in reliability engineering, was introduced in [15]. It is a flexible distribution that can take on the characteristics of other types of distributions, depending on the value of the shape parameter. The cdf, pdf, and quantile function of the Weibull distribution are given, respectively, by Eqs. (1)-(3), where x > 0, α > 0, β > 0 and 0 < u < 1. Fig. 1 shows some pdf shapes of the Weibull distribution at selected values of the scale and shape parameters.

Figure 1:Weibull probability density function for several shape parameter values
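The three Weibull functions can be sketched as follows. The displayed equations are not available in this text, so the code assumes the common scale-shape parameterization with α as the scale and β as the shape, F(x) = 1 - exp(-(x/α)^β). This reading is consistent with the Weibull(10, 1.5), Weibull(10, 3.5) and Weibull(10, 20) settings of the simulation study, but it remains an assumption, and the function names are illustrative.

```python
import math

def weibull_cdf(x, alpha, beta):
    """Assumed cdf: F(x) = 1 - exp(-(x/alpha)^beta) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / alpha) ** beta))

def weibull_pdf(x, alpha, beta):
    """Assumed pdf: f(x) = (beta/alpha) (x/alpha)^(beta-1) exp(-(x/alpha)^beta)."""
    if x <= 0:
        return 0.0
    z = (x / alpha) ** beta
    return (beta / x) * z * math.exp(-z)   # algebraically equal to the form above

def weibull_quantile(u, alpha, beta):
    """Inverse of the assumed cdf for 0 < u < 1."""
    return alpha * (-math.log1p(-u)) ** (1.0 / beta)
```

Feeding uniform(0, 1) draws through `weibull_quantile` is the inverse-transform way of generating Weibull data from a quantile function.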

3 Different Ranked Set Sampling Designs

In this section, we discuss the ranked set sampling designs considered in this research. For simplicity, we assume that the derivations and computations are made for one cycle (c = 1).

3.1 RSS Design

The RSS algorithm according to [16] is as follows: (i) Select m² units randomly from the target population with cumulative distribution function (cdf) F(x; θ) and probability density function (pdf) f(x; θ). (ii) Allocate the m² selected units as randomly as possible into m sets, each of size m. (iii) Rank the units within each set without yet knowing any values of the variable of interest. The ranking can be based on personal or professional judgment or on a concomitant variable correlated with the variable of interest. (iv) Choose a sample for actual quantification by including the smallest ranked unit in the first set, the second smallest ranked unit in the second set, and so on until the largest ranked unit is selected from the last set. (v) Repeat steps (i) through (iv) for c cycles to obtain a sample of size n = mc.
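Steps (i) through (v) can be sketched as below, assuming perfect ranking, that is, units are ranked by their true values. The function `rss_sample` and its `draw` callback are hypothetical names introduced for illustration.

```python
import random

def rss_sample(draw, m, cycles=1, rng=None):
    """Balanced RSS under perfect ranking.

    draw   -- callable taking a random.Random and returning one population unit
    m      -- set size
    cycles -- number of cycles c, giving a sample of size n = m * c
    """
    rng = rng or random.Random()
    sample = []
    for _ in range(cycles):
        for i in range(m):
            # Steps (i)-(iii): draw one set of m units and rank it.
            ranked = sorted(draw(rng) for _ in range(m))
            # Step (iv): keep the (i + 1)-th smallest unit of the (i + 1)-th set.
            sample.append(ranked[i])
    return sample
```

A Weibull population could be supplied as `rss_sample(lambda rng: rng.weibullvariate(10, 1.5), m=5)`, where Python's `random.weibullvariate` takes the scale first and the shape second.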

3.2 ERSS Design

The first RSS modification, proposed in [3], estimates the population mean using only the minimum or maximum ranked units from each set. The process of selecting an ERSS sample is as follows: (a) Repeat steps (i) through (iii) of the RSS design. (b) The selection rule depends on whether the set size is even or odd. If the set size m is even, select the lowest-ranked unit of each set from the first m/2 sets and the largest ranked unit of each set from the other m/2 sets. If the set size is odd, select the lowest-ranked unit from the first (m-1)/2 sets, the median unit of the ((m+1)/2)th set, and the largest ranked unit from the remaining (m-1)/2 sets. (c) Repeat the above steps r times to obtain a sample of size n = mr.
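A minimal sketch of the ERSS selection rule under perfect ranking follows. It assumes the usual reading of the design: minima from the first half of the sets, maxima from the last half, and, for odd m, the median of the middle set. The name `erss_sample` and the `draw` callback are illustrative.

```python
import random

def erss_sample(draw, m, cycles=1, rng=None):
    """Extreme RSS under perfect ranking (assumed selection rule, see lead-in)."""
    rng = rng or random.Random()
    half = m // 2                      # m/2 for even m, (m - 1)/2 for odd m
    sample = []
    for _ in range(cycles):
        for i in range(m):
            ranked = sorted(draw(rng) for _ in range(m))
            if i < half:
                sample.append(ranked[0])        # smallest unit of an early set
            elif m % 2 == 1 and i == half:
                sample.append(ranked[m // 2])   # median of the middle set (odd m)
            else:
                sample.append(ranked[-1])       # largest unit of a late set
    return sample
```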

3.3 MRSS Design

The MRSS design was introduced by [4] to estimate the population mean efficiently. It was shown that MRSS provides an efficient and unbiased mean estimator when the underlying distribution is symmetric. The MRSS scheme begins in the same way as the usual RSS. The process is as follows: (a) Repeat steps (i) through (iii) of the RSS design. (b) If the set size m is odd, select the median element of each set; otherwise, select the (m/2)th ranked unit from the first m/2 sets and the ((m+2)/2)th ranked unit from the remaining m/2 sets. (c) Repeat the above steps r times to obtain a sample of size n = mr.
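The MRSS rule can be sketched in the same style, again under perfect ranking. The even-m ranks used here, (m/2)th for the first half of the sets and ((m+2)/2)th for the second half, follow the standard definition of the design and are an assumption about the ranks that are missing from this text; `mrss_sample` is an illustrative name.

```python
import random

def mrss_sample(draw, m, cycles=1, rng=None):
    """Median RSS under perfect ranking (assumed selection rule, see lead-in)."""
    rng = rng or random.Random()
    sample = []
    for _ in range(cycles):
        for i in range(m):
            ranked = sorted(draw(rng) for _ in range(m))
            if m % 2 == 1:
                pos = (m + 1) // 2          # median rank for odd m
            elif i < m // 2:
                pos = m // 2                # (m/2)th rank in the first m/2 sets
            else:
                pos = m // 2 + 1            # ((m+2)/2)th rank in the remaining sets
            sample.append(ranked[pos - 1])  # convert 1-based rank to 0-based index
    return sample
```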

3.4 NRSS Design

The following process describes the NRSS design proposed by [8]: (a) Select m² random units from the target population and rank the m² sample units based on some pre-established ordering criterion. (b) Select the sample unit ranked in position [(i-1)m + l] for the final sample, for i = 1, ..., m, where l = (m+1)/2 if m is odd and, if m is even, l = (m+2)/2 for odd i and l = m/2 for even i. (c) Steps (a) and (b) can be repeated r times to obtain a final sample of size n = mr.
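The NRSS selection differs from the previous designs in that all m² units are ranked jointly, which is what makes the selected units dependent. A sketch under perfect ranking is given below; the values of l for odd m and for odd i with even m are taken from the usual statement of the design and should be read as assumptions, and `nrss_sample` is an illustrative name.

```python
import random

def nrss_sample(draw, m, cycles=1, rng=None):
    """Neoteric RSS under perfect ranking (assumed offsets l, see lead-in)."""
    rng = rng or random.Random()
    sample = []
    for _ in range(cycles):
        # Step (a): rank all m^2 units together in a single set.
        ranked = sorted(draw(rng) for _ in range(m * m))
        for i in range(1, m + 1):
            if m % 2 == 1:
                l = (m + 1) // 2
            elif i % 2 == 1:
                l = (m + 2) // 2
            else:
                l = m // 2
            # Step (b): take the unit ranked in position (i - 1) m + l (1-based).
            sample.append(ranked[(i - 1) * m + l - 1])
    return sample
```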

4 Bayesian Estimation

In this section, Bayes estimators of the Weibull distribution parameters α and β are obtained under the assumption that α and β are independent random variables with Jeffreys’ non-informative prior distributions, whose densities are given, respectively, by

and

Note that in the current study, the squared error loss function is used to derive the Bayesian estimators of both α and β.
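As a hedged illustration of this setup: the displayed prior densities are not available in this text, so the block below assumes the scale-invariant forms that are commonly used as Jeffreys-type priors. These forms, and the symbols π₁, π₂ and L, are assumptions rather than the paper's confirmed equations. What does not depend on that assumption is that, under squared error loss, the Bayes estimator of a parameter is its posterior mean.

```latex
% Assumed non-informative priors (not confirmed by the source):
\pi_1(\alpha) \propto \frac{1}{\alpha}, \qquad
\pi_2(\beta) \propto \frac{1}{\beta}, \qquad \alpha > 0,\ \beta > 0 .

% Joint posterior for independent priors and a likelihood L(\alpha,\beta \mid \mathbf{x}):
\pi(\alpha,\beta \mid \mathbf{x}) =
  \frac{L(\alpha,\beta \mid \mathbf{x})\,\pi_1(\alpha)\,\pi_2(\beta)}
       {\int_0^\infty\!\!\int_0^\infty L(\alpha,\beta \mid \mathbf{x})\,
        \pi_1(\alpha)\,\pi_2(\beta)\,d\alpha\,d\beta} .

% Bayes estimators under squared error loss are the posterior means:
\hat{\alpha} = \int_0^\infty\!\!\int_0^\infty \alpha\,\pi(\alpha,\beta \mid \mathbf{x})\,d\alpha\,d\beta ,
\qquad
\hat{\beta} = \int_0^\infty\!\!\int_0^\infty \beta\,\pi(\alpha,\beta \mid \mathbf{x})\,d\alpha\,d\beta .
```

Only the likelihood L changes from one sampling design to the next; the prior and the loss function stay the same.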

4.1 Estimation Based on SRS Design

Assume that {x_i, i = 1, 2, ..., m} is a simple random sample (SRS) drawn from Weibull(α, β). The likelihood function for Weibull data is given by

The joint posterior distribution of α and β is given as

Substituting Eqs. (4) and (5) into Eq. (6), the posterior distribution of α and β becomes

The Bayesian estimators of α and β based on the squared error loss function are, respectively, given by

and

4.2 Estimation Based on RSS Design

Let {x_(i), i = 1, 2, ..., m}, where x_(i) ≡ x_(ii) and -∞ < x_(i) < ∞, be a ranked set sample drawn from a distribution with pdf f(x; θ) and cdf F(x; θ), where m is the set size and θ is the parameter vector. The likelihood function associated with this design is

The likelihood function of RSS samples drawn from Weibull(α, β) is given by
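For orientation, the textbook likelihood of a one-cycle ranked set sample under perfect ranking is sketched below. It follows from the fact that x_(i) is the i-th order statistic of an independent set of size m, and its notation may differ from the paper's own equations. The Weibull line additionally assumes the scale-shape parameterization F(x) = 1 - exp(-(x/α)^β), which is an assumption here.

```latex
% General RSS likelihood, one cycle, perfect ranking:
L(\theta \mid \mathbf{x}) = \prod_{i=1}^{m}
  \frac{m!}{(i-1)!\,(m-i)!}\;
  f\!\left(x_{(i)};\theta\right)
  \left[F\!\left(x_{(i)};\theta\right)\right]^{i-1}
  \left[1 - F\!\left(x_{(i)};\theta\right)\right]^{m-i} .

% Weibull case under the assumed parameterization, with z_i = (x_{(i)}/\alpha)^{\beta}:
L(\alpha,\beta \mid \mathbf{x}) = \prod_{i=1}^{m}
  \frac{m!}{(i-1)!\,(m-i)!}\;
  \frac{\beta}{\alpha}\left(\frac{x_{(i)}}{\alpha}\right)^{\beta-1}
  e^{-(m-i+1)\,z_i}\left(1 - e^{-z_i}\right)^{i-1} .
```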

After substituting Eqs. (4), (5) and (10) into Eq. (11), the posterior distribution of α and β can be derived directly as follows

The Bayes estimators of α and β are the expected values based on their marginal posterior distributions and are, respectively, given by

4.3 Estimation Based on ERSS Design

Let {y_(i), i = 1, 2, ..., m} be an extreme ranked set sample (ERSS) drawn from a distribution with pdf f(y; θ) and cdf F(y; θ), where m is the set size and θ is the parameter vector. The likelihood function of the ERSS sample drawn from Weibull(α, β) is given by

Case I: m odd

Case II: m even

By substituting Eqs. (4), (5), and (14) into Eq. (16) in the case of odd set size and Eqs. (4), (5) and (15) into Eq. (17) in the case of even set size, the Bayesian estimators of both α and β are directly derived as follows

and

respectively, in the case of odd set size, while in the case of even set size they are, respectively, given by

and

4.4 Estimation Based on MRSS Design

Case I: m odd

Case II: m even

By substituting Eqs. (4), (5), and (22) into Eq. (24) for odd set sizes and Eqs. (4), (5) and (23) into Eq. (25) for even set sizes, the Bayesian estimators of α and β are directly derived, respectively, as follows

and

in the case of odd set size, while in the case of even set size they are, respectively, given by

and

4.5 Estimation Based on NRSS Design

Let {u_(k(i)), i = 1, 2, ..., m} be a neoteric ranked set sample with set size m drawn from a distribution with pdf f(u; θ) and cdf F(u; θ), where θ is the parameter vector. Then, according to Sabry and Shabaan [11], the likelihood function of NRSS samples drawn from Weibull(α, β) is given by

where

and k_0 = 0, k_(m+1) = m² + 1, u_(k_0) = -∞, and u_(k_(m+1)) = ∞. Therefore, the joint posterior distribution of α and β is directly derived as follows

and therefore, substituting Eqs. (4), (5) and (30) into Eq. (31), the Bayesian estimators of α and β are derived, respectively, as

and

Because the Bayes estimators based on the above sampling designs involve complicated integrals, Lindley’s approximation is used to compute the approximate Bayes estimators of α and β associated with each sampling design.

4.6 Lindley’s Approximation

Lindley [17] proposed an approximation procedure to evaluate the ratio of two integrals such that, for u(φ, η),

where l(φ, η; x) is the log-likelihood function of the parameters φ and η. Several authors have used this approximation procedure to obtain approximate Bayes estimators for various distributions, for example, [18-21]. In the case of two-parameter distributions, using the notation u(θ) = u(θ1, θ2), the posterior mean can be approximated as follows

where

and τ_ij is the (i, j) entry of the inverse of the observed information matrix. All quantities depending on the unknown (θ1, θ2) in Eq. (34) are evaluated at the maximum likelihood estimators (MLEs). Assuming that θ1 = α and θ2 = β, the mean of the posterior distribution derived under each sampling design can be obtained, and thus the Bayes estimators of α and β are obtained for each sampling design. For applications, see [22-24].
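For reference, the form of Lindley's approximation that is usually quoted for two parameters is sketched below. It is given as the standard statement of the method, not as a transcription of the paper's Eq. (34), and the symbols ρ, u_i, u_ij and L_ijk are the conventional ones and may differ from the paper's notation.

```latex
% u = u(\theta_1,\theta_2); \rho = log of the joint prior; l = log-likelihood.
% u_i = \partial u/\partial\theta_i, u_{ij} = \partial^2 u/\partial\theta_i\partial\theta_j,
% \rho_j = \partial\rho/\partial\theta_j,
% L_{ijk} = \partial^3 l/\partial\theta_i\partial\theta_j\partial\theta_k,
% all evaluated at the MLE (\hat\theta_1,\hat\theta_2).
E\!\left[u(\theta)\mid \mathbf{x}\right] \approx
  u(\hat\theta)
  + \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{2}\left(u_{ij} + 2\,u_i\,\rho_j\right)\tau_{ij}
  + \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2}\sum_{l=1}^{2}
      L_{ijk}\,\tau_{ij}\,\tau_{kl}\,u_l .

% Special case u(\theta) = \theta_1 (so u_1 = 1 and all other derivatives of u vanish):
\hat\theta_1^{\,B} \approx \hat\theta_1
  + \rho_1\,\tau_{11} + \rho_2\,\tau_{12}
  + \frac{1}{2}\sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2} L_{ijk}\,\tau_{ij}\,\tau_{k1} .
```

The second line shows why only first derivatives of the log-prior and third derivatives of the log-likelihood are needed when the quantity of interest is a parameter itself.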

5 Simulation Study

In this section, we conduct a Monte Carlo simulation to compare the performance of the different ranked set sampling designs. The data were generated from the Weibull(10, 1.5), Weibull(10, 3.5), and Weibull(10, 20) distributions for different sample sizes (m = 9, 12, 15, 20, 25, 30 and 35). The simulation is conducted using R. The algorithm is as follows:

a. Generate m random samples from the Weibull distribution using the quantile function defined in Eq. (3), with number of replicates nsim = 10,000.

b. Use the SRS design and the different RSS designs discussed in Section 3 to simulate SRS samples and samples from the different RSS designs.

c. Obtain the Bayesian estimators under the squared error loss function using Jeffreys’ priors.

d. Calculate the root total mean squared error (RTMSE) for the different RSS estimators and the SRS estimators for each replicate, given by

where p is the number of parameters involved, and calculate the total relative efficiency of sampling design A relative to sampling design B, TRE(A, B), which is defined as:

e. Conduct a GOF analysis and compare the empirical distribution for each replicate based on the likelihood estimators for all designs, and compute the Kolmogorov-Smirnov (KS) statistic, Akaike information criterion (AIC), corrected Akaike information criterion (CAIC), Hannan-Quinn information criterion (HQIC) and Bayesian information criterion (BIC) for all fitted models. Compute the average KS statistic and the average AIC, CAIC, HQIC, and Schwarz BIC indices.
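The summary measures in steps d and e can be sketched as follows. The displayed definitions are not available in this text, so the code makes three labeled assumptions: the total MSE is the sum of the per-parameter MSEs over the p parameters, TRE(A, B) is the ratio of the total MSE under B to that under A (so values above 1 favour A), and CAIC is the small-sample corrected AIC. The function names are illustrative.

```python
import math

def rtmse(estimates, truth):
    """Root total MSE. `estimates` is a list of p-tuples (one per replicate),
    `truth` the p-tuple of true values. Assumes total MSE = sum of per-parameter MSEs."""
    n = len(estimates)
    total = 0.0
    for j, true_value in enumerate(truth):
        total += sum((est[j] - true_value) ** 2 for est in estimates) / n
    return math.sqrt(total)

def tre(estimates_a, estimates_b, truth):
    """Assumed definition: TRE(A, B) = total MSE under B / total MSE under A."""
    return (rtmse(estimates_b, truth) / rtmse(estimates_a, truth)) ** 2

def information_criteria(loglik, k, n):
    """Standard indices for a model with k parameters fitted to n observations."""
    aic = 2 * k - 2 * loglik
    return {
        "AIC": aic,
        "CAIC": aic + 2 * k * (k + 1) / (n - k - 1),   # assumed: corrected AIC
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * loglik,
        "BIC": k * math.log(n) - 2 * loglik,
    }
```

With the Weibull model, k = 2, and `loglik` would be the maximized log-likelihood of the design under consideration; smaller index values indicate a better fit.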

The results of the simulation study are reported in Tabs. 1-4. The results for the TRE, the RTMSE, and the p-values of the KS test are shown in Figs. 2-4. The following observations can be made from the results:

Table 1:Total relative efficiency and root total mean squared error for RSS-based estimators under perfect ranking and different designs

Table 2:GOF analysis for different sampling designs from Weibull (10,1.5)

· The total efficiency of all RSS-based designs increases as the sample size increases.

· It is clear that the NRSS design provides the most efficient estimators and is superior to other sampling designs.

· When the distribution shape is approximately symmetric, the RSS designs are more efficient than they are for asymmetric shapes.

Table 3:GOF analysis for different sampling designs from Weibull (10,3.5)

· Mean squared error decreases as the sample size increases, and NRSS has the smallest MSE.

· The GOF analysis showed that the NRSS design has the highest p-value when testing the empirical distributions using the KS test. The other GOF indices are smallest for the NRSS design relative to the other RSS designs, and they also decrease as the sample size increases.

Table 4:GOF analysis for different sampling designs from Weibull (10,20)

Figure 2:Total relative efficiency for different RSS sampling designs

Figure 3:Total root mean squared error for different RSS sampling designs

Figure 4:p-values for KS statistics for different RSS sampling designs

6 Conclusion

In this paper, based on numerical analysis, four RSS sampling designs were compared when estimating the parameters of the Weibull distribution. According to an extensive simulation study, under perfect ranking the NRSS design outperforms the one-stage RSS, ERSS, and MRSS designs. Furthermore, the RTMSEs decrease as the set size increases, especially in asymmetric cases, and the total relative efficiency increases as the set size increases. Moreover, the NRSS design has the smallest MSEs and the largest efficiencies among the sampling designs considered.

Acknowledgement: The authors are very grateful to the editorial board and the reviewers for their careful and thorough reading of the paper. The reviews were detailed and helpful in finalizing the manuscript.

Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
