Nonparametric Estimation Method for the Distribution Function Using Various Types of Ranked Set Sampling

Ghareeb, Ramy Saad; AL Khalidi, Rikan Abd AL Azeez

doi:10.33899/iqjoss.2025.187752

IRAQI JOURNAL OF STATISTICAL SCIENCES
Volume 22, Issue 1, May 2025, Pages 19-38
Document Type: Research Paper
Authors
Ramy Saad Ghareeb*¹; Rikan Abd AL Azeez AL Khalidi²
¹Postgraduate Student, Department of Statistics and Informatics, Mosul University, Iraq
²Mosul University / College of Computer Science and Mathematics / Department of Statistics and Informatics Science
Abstract
The purpose of this research is to estimate the cumulative distribution function (CDF) using local polynomial regression (LPR) and to compare it with parameter estimation by the method of moments (MOM) and the maximum likelihood method (MLE), in terms of both mean square error and bias, using the ranked set sample (RSS) and the median ranked set sample (MRSS). RSS frequently produces more exact estimates than simple random sampling (SRS) for the same sample size: by ranking units on some easily measurable characteristic, the variability within each set is decreased, resulting in more accurate estimates. We investigated three degrees of local polynomial regression: the first, second, and third. The simulation analysis demonstrated that the second degree outperforms the other degrees. Also, when applied to ranked set data, the regression takes advantage of the reduced variability within each ranked set, resulting in more precise and reliable regression function estimates. Following that, we investigated several bandwidths (0.1, 0.2, …, 0.9) and found, based on a simulation study, that a bandwidth of 0.8 is superior to the others. Finally, we analyzed the relative efficiency of the three approaches (MOM, MLE, and LPR) and found that LPR is more efficient than the other methods for estimating the CDF under different kernels (normal (Gaussian) and Epanechnikov). The numerical results indicate that the suggested estimator is more efficient than the other methods, as predicted by the simulation analysis.
Highlights
This article is concerned with estimating the cumulative distribution function (CDF) by local polynomial regression (LPR) under ranked set sampling (RSS) and median ranked set sampling (MRSS). A new estimator is derived. Three ways of estimating the CDF are then compared (the method of moments, the maximum likelihood method, and LPR), each based on RSS and MRSS. The method of moments and maximum likelihood estimators were suggested by Al-Saleh and Ahmad (2019). In this study the method of moments and the maximum likelihood method give the same estimate, and the proposed estimator can have some advantages over its competitors for fixed or non-fixed samples. On the empirical side, we use two kernels (normal and Epanechnikov) with bandwidths 0.1, 0.2, …, 0.9 and three polynomial degrees. We conclude that the Epanechnikov kernel is slightly better than the normal kernel, that a bandwidth of 0.8 gives the best result among the bandwidths considered, and that degree 2 gives the best result among the degrees considered. We also conclude that the estimator of the CDF based on MRSS is better than the estimator based on RSS, because MRSS data are more stable and less prone to ranking errors for fixed or non-fixed samples. In terms of relative efficiency there is no large difference between the two kernels; nonetheless, the relative efficiency comparison shows MRSS to be better and more efficient than RSS. We recommend using LPR estimators based on MRSS.
Keywords
Local polynomial regression; Ranked set sample; Median ranked set sample; Mean square error; Cumulative distribution function

Full Text
1. Introduction

The cumulative distribution function (CDF) is a strong tool for understanding and evaluating random variables, as well as for forecasting future occurrences; it is a foundational idea in probability theory and statistics. It gives the probability that a random variable takes a value no greater than a specific point. Using it raises several challenges, one of which is studying nonparametric analysis in depth and identifying population features such as odds, survival functions, and hazards. Estimating the CDF using a ranked set sample (RSS) is more efficient than using a simple random sample (SRS), because RSS frequently yields more precise estimates for the same sample size. RSS suits settings where measuring the variable of interest is difficult, expensive, or time-consuming, but ranking the units is easy or has negligible cost.

McIntyre (1952) was the first to introduce ranked set sampling, for estimating pasture yields in Australia, and he expressed expectations about how to develop an estimator that would be more effective for pasture yields. Halls and Dell (1966) conducted a field trial evaluating its applicability to the estimation of forage yields in a pine-hardwood forest; the term "ranked set sampling" was coined by Halls and Dell. Takahasi and Wakimoto (1968) proved the first theoretical result: when ranking is perfect, the ranked set sample mean is an unbiased estimator of the population mean, and its variance is no larger than the variance of the mean of a simple random sample of the same size. Research on ranked set sampling has continued since then. Muttlak (1997) suggested median ranked set sampling (MRSS) instead of ranked set sampling for estimating the population mean, as a strategy to minimize the error in ranking.
Gulati (2004) studied smooth estimators built on the empirical distribution function of Stokes and Sager (1988), examining their properties and using simulation to compare the smooth and empirical estimators. Frey (2012) derived a constrained estimator of the cumulative distribution function together with the population mean, in order to create a Woodruff-type confidence interval for a population quantile. Al-Saleh and Ahmad (2019) considered a variation of ranked set sampling called moving extreme ranked set sampling (MERSS) to estimate the cumulative distribution function, and compared the proposed estimator with the corresponding estimators based on two other sampling schemes. Zamanzade et al. (2020) established two estimators under MERSS, showed by simulation that they provide a substantial improvement over their competitors, and applied them to estimating the stress-strength probability. Abdallah and Al-Omari (2022) considered the problem of estimating the cumulative distribution function and the odds measure under MERSS.

The paper is structured as follows. Section 2 describes ranked set sampling and median ranked set sampling. Section 3 describes local polynomial regression. Section 4 covers estimation of the cumulative distribution function using the maximum likelihood method, the method of moments, and local polynomial regression. Section 5 presents the simulation study and conclusions.

2. Description of RSS and MRSS

2-1. Methodology of ranked set sampling (RSS)

McIntyre (1952) was the first to suggest ranked set sampling, as a strategy for estimating pasture yields. The technique suits situations where drawing sample units is much cheaper than measuring the variable of interest. A ranked set sample is selected in the following steps:

1. Randomly draw $m^2$ sample units from the population of interest.
2. Divide the $m^2$ units into $m$ sets, each of size $m$.
3. Rank the units within each set by judgment, without actually measuring them, either by eye or by some inexpensive method related to the variable of interest. From the first set, select the unit ranked smallest and discard the other units; from the second set, select the unit ranked second smallest and discard the other units. The process continues until the unit ranked largest is selected from the last set.

Steps 1-3 give the ranked set sample of one cycle. To obtain an RSS of size $n$, repeat steps 1-3 $r$ times, where $n = rm$. Let $\{X_{[i]j} : i = 1, \dots, m;\ j = 1, \dots, r\}$ be the resulting sample, where $X_{[i]j}$ is the $i$th judgment order statistic of a set of size $m$ in the $j$th cycle. Square brackets [.] are used when the ranking is imperfect, meaning that there is error in ranking; when there is no error in ranking (perfect ranking), round brackets (.) are used. It is important to note that, for each fixed $i$, the units $X_{[i]1}, \dots, X_{[i]r}$ are independent and identically distributed, whereas, for each fixed $j$, the units $X_{[1]j}, \dots, X_{[m]j}$ are only independent.

2-2. Methodology of median ranked set sampling (MRSS)

Muttlak (1997) suggested a new strategy of ranked set sampling, called median ranked set sampling (MRSS), to minimize errors in ranking units within sets, to increase the efficiency of the estimator in the presence of ranking errors, and also to increase the efficiency over RSS with perfect ranking. An MRSS of size $n$ is selected in the following steps:

1. Randomly draw $m^2$ sample units from the population of interest.
2. Divide the $m^2$ units into $m$ sets, each of size $m$, and rank the units within each set by judgment.
3. If the set size $m$ is odd, measure from every set the unit with rank $\frac{m+1}{2}$, that is, the median of the set. If the set size $m$ is even, measure the unit with rank $\frac{m}{2}$ from each of the first $\frac{m}{2}$ sets and the unit with rank $\frac{m+2}{2}$ from each of the remaining $\frac{m}{2}$ sets.

Steps 1-3 can be repeated $r$ times, if necessary, to get an MRSS of size $n = rm$.
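The two selection procedures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ranking is taken to be perfect (units are ranked by sorting their true values, which a real study would replace by judgment ranking), the population is a hypothetical standard normal, and the function names `ranked_set_sample`, `median_ranked_set_sample`, and `draw_normal`, as well as the symbols `m` (set size) and `r` (number of cycles), are chosen here for illustration.

```python
import numpy as np

def ranked_set_sample(draw, m, r, rng):
    """Return an (r, m) array; entry [j, i] is the (i+1)th smallest unit
    of its own set of size m in cycle j (perfect ranking assumed)."""
    sample = np.empty((r, m))
    for j in range(r):
        # m sets of size m; sorting each row stands in for judgment ranking
        sets = np.sort(draw(rng, (m, m)), axis=1)
        # from set i keep the unit with rank i+1, discard the rest
        sample[j] = sets[np.arange(m), np.arange(m)]
    return sample

def median_ranked_set_sample(draw, m, r, rng):
    """Return an (r, m) array of median-ranked units (perfect ranking assumed)."""
    sample = np.empty((r, m))
    for j in range(r):
        sets = np.sort(draw(rng, (m, m)), axis=1)
        if m % 2 == 1:
            # odd m: rank (m+1)/2 from every set (0-based index (m+1)/2 - 1)
            ranks = np.full(m, (m + 1) // 2 - 1)
        else:
            # even m: rank m/2 from the first half of the sets,
            # rank (m+2)/2 from the second half (0-based: m/2 - 1 and m/2)
            ranks = np.where(np.arange(m) < m // 2, m // 2 - 1, m // 2)
        sample[j] = sets[np.arange(m), ranks]
    return sample

def draw_normal(rng, shape):
    # hypothetical population: standard normal
    return rng.standard_normal(shape)

rng = np.random.default_rng(0)
rss = ranked_set_sample(draw_normal, m=4, r=5, rng=rng)
mrss = median_ranked_set_sample(draw_normal, m=4, r=5, rng=rng)
print(rss.shape, mrss.shape)  # both samples have r * m = 20 measured units
```

Each cycle draws $m^2$ units but measures only $m$ of them, which is where the cost saving of ranked set designs comes from. To mimic imperfect ranking, one could sort on a noisy copy of the values instead of the values themselves.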
If the set size $m$ is odd, the odd median ranked set sample for the variable $X$ is described as follows:

$$\left\{ X_{i\left[\frac{m+1}{2}\right]j} : i = 1, \dots, m;\ j = 1, \dots, r \right\}$$

where $X_{i\left[\frac{m+1}{2}\right]j}$ is the $\frac{m+1}{2}$th judgment order statistic of the $i$th set of size $m$ in the $j$th cycle. If the set size $m$ is even, the even median ranked set sample for the variable $X$ is described as follows:

$$\left\{ X_{i\left[\frac{m}{2}\right]j} : i = 1, \dots, \tfrac{m}{2} \right\} \cup \left\{ X_{i\left[\frac{m+2}{2}\right]j} : i = \tfrac{m}{2}+1, \dots, m \right\}, \quad j = 1, \dots, r$$

where $X_{i[k]j}$ is the $k$th judgment order statistic of the $i$th set of size $m$ in the $j$th cycle.

3. Local polynomial regression (LPR)

Local polynomial regression is a nonparametric technique that generalizes kernel regression. It is used to model functions and to smooth scatter plots. One of its most important uses is to find the relationship between a dependent variable and an independent variable, and it performs better near the boundary than other types of kernel regression. At each point $x$, a low-order weighted least squares regression is fitted. Following Fan and Gijbels (1996), the data $(X_i, Y_i)$, $i = 1, \dots, n$, are defined according to the fixed model in equation (1).

$$Y_i = m(X_i) + \sigma(X_i)\,\varepsilon_i \qquad (1)$$

where $m(x) = E(Y \mid X = x)$, $\sigma^2(x)$ is the variance of $Y$ at the point $X = x$, and $\varepsilon_i$ is a residual error that is normally distributed with mean zero and variance one. To estimate $m$, we use a Taylor series of degree $p$ around $x$:

$$m(z) \approx \sum_{k=0}^{p} \frac{m^{(k)}(x)}{k!}\,(z - x)^k = \sum_{k=0}^{p} \beta_k\,(z - x)^k \qquad (2)$$

Points $z$ in the neighborhood of $x$ receive a higher weight than the remaining points. The unknown parameters in equation (2) can be estimated by weighted least squares, according to the following formula:

$$\hat{\beta} = \left(Z^{\top} W Z\right)^{-1} Z^{\top} W Y, \qquad \hat{m}(x) = e_1^{\top} \hat{\beta}$$

where $Z$ is the $n \times (p+1)$ matrix whose $i$th row is $\left(1, (X_i - x), \dots, (X_i - x)^p\right)$, $K$ represents the kernel function, $h$ represents the bandwidth, $W = \mathrm{diag}\{K_h(X_i - x)\}$ with $K_h(u) = K(u/h)/h$ represents the diagonal matrix of weights, and $e_1$ is the $(p+1) \times 1$ vector with 1 in the first entry and 0 elsewhere.

4. Estimation of the cumulative distribution function (CDF)

4.1. Estimation of the CDF using the maximum likelihood method (MLE)

4.1.1. Based on ranked set sampling (RSS).
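Following Al-Saleh and Ahmad (2019), let $\{X_{[i]j} : i = 1, \dots, m;\ j = 1, \dots, r\}$ represent the RSS selected from a population with distribution function $F$ and density $f$. We use maximum likelihood estimation to estimate the CDF at a point $t$ based on the RSS. They assumed that the variable $Y = \sum_{j=1}^{r} \sum_{i=1}^{m} I\left(X_{[i]j} \le t\right)$ is distributed according to a binomial distribution with mass parameter $n = rm$ and success probability $p$. Therefore, the estimator of the distribution function is defined according to the following relationship:

$$\hat{F}_{\mathrm{MLE}}(t) = \hat{p} = \frac{Y}{n} = \frac{1}{rm} \sum_{j=1}^{r} \sum_{i=1}^{m} I\left(X_{[i]j} \le t\right)$$

which is the mean obtained by estimating the CDF based on RSS by using MLE.

4.1.2. Based on odd median ranked set sampling.

Let $\left\{ X_{i\left[\frac{m+1}{2}\right]j} : i = 1, \dots, m;\ j = 1, \dots, r \right\}$ be an odd median ranked set sample of size $n = rm$ selected from a population with distribution function $F$ and density $f$. We use maximum likelihood estimation to estimate the CDF at a point $t$ based on this sample. Note that the indicators $I\left(X_{i\left[\frac{m+1}{2}\right]j} \le t\right)$ are independent and identically distributed, each following a Bernoulli distribution with probability of success $p$, where $I(\cdot)$ represents the indicator function. Let $Y = \sum_{j=1}^{r} \sum_{i=1}^{m} I\left(X_{i\left[\frac{m+1}{2}\right]j} \le t\right)$; then $Y$ is distributed binomially with mass parameter $n$ and success probability $p$. The likelihood function is determined according to equation (4):

$$L(p) = \binom{n}{y}\, p^{y} (1 - p)^{n - y} \qquad (4)$$

Therefore, the estimator of the distribution function is defined according to the following relationship:

$$\hat{F}_{\mathrm{MLE}}(t) = \hat{p} = \frac{Y}{n} = \frac{1}{rm} \sum_{j=1}^{r} \sum_{i=1}^{m} I\left(X_{i\left[\frac{m+1}{2}\right]j} \le t\right)$$

which is the mean obtained by estimating the CDF based on the odd median ranked set sample by using MLE.

4.1.3. Based on even median ranked set sampling.

Let the even median ranked set sample, consisting of the units $X_{i\left[\frac{m}{2}\right]j}$ for $i = 1, \dots, \frac{m}{2}$ and $X_{i\left[\frac{m+2}{2}\right]j}$ for $i = \frac{m}{2}+1, \dots, m$, with $j = 1, \dots, r$, be of size $n = rm$ and selected from a population with distribution function $F$ and density $f$. We use maximum likelihood estimation to estimate the CDF at a point $t$ based on this sample. Given the way the even median ranked set sample is drawn, the indicators of the events that the sampled units do not exceed $t$ are taken to be independent and identically distributed, each following a Bernoulli distribution with probability of success $p$, where $I(\cdot)$ represents the indicator function. Let $Y$ be the sum of these $n$ indicators; then $Y$ is distributed binomially with mass parameter $n$ and success probability $p$.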
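The likelihood function of the success probability $p$, given that $Y = y$ of the $n = rm$ sampled units do not exceed $t$, is determined according to equation (6):

$$L(p) = \binom{n}{y}\, p^{y} (1 - p)^{n - y} \qquad (6)$$

Therefore, the estimator of the distribution function is defined according to the following relationship:

$$\hat{F}_{\mathrm{MLE}}(t) = \hat{p} = \frac{Y}{n}$$

which is the mean obtained by estimating the CDF based on the even median ranked set sample by using MLE.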
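As a small numerical check of the binomial argument in Section 4.1, the sketch below computes the proportion of sampled units not exceeding a point `t` and confirms on a grid that this proportion maximizes the binomial log-likelihood. The data values, the point `t`, and the function names `cdf_mle` and `binomial_log_likelihood` are hypothetical and chosen only for illustration; whether the binomial assumption holds for a given sampling design is taken from the text and not examined here.

```python
import numpy as np

def cdf_mle(sample, t):
    """Binomial MLE of the success probability: Y / n, where Y counts
    the sampled units that are <= t and n is the number of units."""
    x = np.asarray(sample, dtype=float).ravel()
    y = np.count_nonzero(x <= t)
    return y / x.size

def binomial_log_likelihood(p, y, n):
    # log of p^y (1 - p)^(n - y); the binomial coefficient is constant in p
    return y * np.log(p) + (n - y) * np.log(1.0 - p)

# hypothetical measured units: r = 2 cycles (rows), m = 3 sets (columns)
sample = np.array([[0.2, 1.1, 1.9],
                   [0.4, 0.9, 2.5]])
t = 1.0

p_hat = cdf_mle(sample, t)  # 3 of the 6 units are <= 1.0

# grid search over p to confirm that the likelihood peaks at Y / n
grid = np.linspace(0.01, 0.99, 99)
best = grid[np.argmax(binomial_log_likelihood(grid, 3, 6))]
print(p_hat, best)
```

Setting the derivative of the log-likelihood, $y/p - (n - y)/(1 - p)$, to zero gives $\hat{p} = y/n$, which is what the grid search recovers up to the grid resolution.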
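The weighted least squares fit described in the local polynomial regression section can be sketched as follows. This is a minimal generic illustration, assuming a scalar covariate and the normal (Gaussian) and Epanechnikov kernels named in the abstract; it is not the estimator of this paper, whose construction of the response for CDF estimation is not reproduced here. The function names and the test data are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def epanechnikov_kernel(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_poly_fit(x, y, x0, h, degree=2, kernel=gaussian_kernel):
    """Estimate the regression function at x0 by weighted least squares
    on powers of (x - x0), with kernel weights of bandwidth h."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x - x0
    # design matrix with columns 1, (x - x0), ..., (x - x0)^degree
    Z = np.vander(d, N=degree + 1, increasing=True)
    w = kernel(d / h) / h          # diagonal of the weight matrix
    ZtW = Z.T * w                  # same as Z.T @ diag(w)
    beta = np.linalg.solve(ZtW @ Z, ZtW @ y)
    return beta[0]                 # first coefficient = fitted value at x0

# hypothetical noiseless data from an exact quadratic
x = np.linspace(0.0, 1.0, 21)
y = 1.0 + 2.0 * x + 3.0 * x**2

fit_gauss = local_poly_fit(x, y, x0=0.3, h=0.8, degree=2)
fit_epan = local_poly_fit(x, y, x0=0.3, h=0.8, degree=2,
                          kernel=epanechnikov_kernel)
print(fit_gauss, fit_epan)  # both close to 1 + 0.6 + 0.27 = 1.87
```

Because the data follow an exact quadratic, a local fit of degree 2 reproduces the curve whatever the kernel or bandwidth, which makes this a convenient sanity check. With a compactly supported kernel such as Epanechnikov, a very small bandwidth can leave too few points with nonzero weight, in which case the linear system becomes singular.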
References
Al-Saleh, M. F., & Ahmad, D. M. R. (2019). Estimation of the distribution function using moving extreme ranked set sampling (MERSS). In Ranked Set Sampling (pp. 43-58). Academic Press.
Abdallah, M. S., & Al-Omari, A. I. (2022). On the nonparametric estimation of the odds and distribution function using moving extreme ranked set sampling. Investigación Operacional, 43(1), 90-102.
AL_Rahman, R., & Mohammad, S. (2022). Generalized ratio-cum-product type exponential estimation of the population mean in median ranked set sampling. Iraqi Journal of Statistical Sciences, 19(1), 54-66.
Frey, J. (2012). Constrained nonparametric estimation of the mean and the CDF using ranked-set sampling with a covariate. Annals of the Institute of Statistical Mathematics, 64(2), 439-456.
Fan, J., Gijbels, I., Hu, T. C., & Huang, L. S. (1996). A study of variable bandwidth selection for local polynomial regression. Statistica Sinica, 113-127.
Gulati, S. (2004). Smooth non-parametric estimation of the distribution function from balanced ranked set samples. Environmetrics, 15(5), 529-539.
Muttlak, H. A. (1997). Median ranked set sampling. J Appl Stat Sci, 6, 245-255.
Halls, L. K., & Dell, T. R. (1966). Trial of ranked-set sampling for forage yields. Forest Science, 12(1), 22-26.
McIntyre, G. A. (1952). A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research, 3(4), 385-390.
Stokes, S. L., & Sager, T. W. (1988). Characterization of a ranked-set sample with application to estimating distribution functions. Journal of the American Statistical Association, 83(402), 374-381.
Takahasi, K., & Wakimoto, K. (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics, 20(1), 1-31.
Zamanzade, E., Mahdizadeh, M., & Samawi, H. M. (2020). Efficient estimation of cumulative distribution function using moving extreme ranked set sampling with application to reliability. AStA Advances in Statistical Analysis, 104(3), 485-502.