- Introduction
The cumulative distribution function (CDF) is a foundational idea in probability theory and statistics, and a strong tool for understanding and evaluating random variables and for forecasting future occurrences. It gives the probability that a random variable takes a value up to a specified point. Using it in practice raises several obstacles. One challenge is that in-depth nonparametric analysis requires identifying several population features, such as odds, survival functions, and hazards. Estimating the CDF from a ranked set sample (RSS) is more efficient than from a simple random sample (SRS), because RSS frequently yields more precise estimates for the same sample size. RSS is also suited to settings where measuring the variable of interest is difficult, expensive, or time-consuming, while ranking the units is easy or has negligible cost. Ranked set sampling was first introduced by McIntyre (1952) for estimating pasture yields in Australia; he also expressed expectations about how to develop an estimator that would be more effective for pasture yields. Halls and Dell (1966) conducted a field trial evaluating its applicability to the estimation of forage yields in a pine-hardwood forest, and coined the term ranked set sampling. Takahasi and Wakimoto (1968) proved the first theoretical result: when ranking is perfect, the ranked set sample mean is an unbiased estimator of the population mean, and its variance is no larger than the variance of the mean of a simple random sample of the same size. Research on ranked set sampling has continued since then. Muttlak (1997) suggested median ranked set sampling (MRSS), instead of ranked set sampling, to estimate the population mean, as a strategy to reduce ranking errors.
Gulati (2004) studied the empirical distribution function of Stokes and Sager together with smooth estimators and their properties, using simulation to compare the smooth and empirical estimators. Frey (2012) derived a constrained estimator of the cumulative distribution function using the population mean, and used it to create a Woodruff-type confidence interval for the population quantile. Al-Saleh and Ahmad (2019) suggested a new ranked set sampling technique, called moving extreme ranked set sampling, to estimate the cumulative distribution function, and compared the proposed estimator with the corresponding estimators. Zamanzade (2020) established two estimators under moving extreme ranked set sampling, showed by simulation that they provide a substantial improvement over their competitors, and showed that they can be used to estimate the stress-strength probability. Abdallah and Al-Omari (2022) considered the problem of estimating the cumulative distribution function and the odds measure under moving extreme ranked set sampling.
The paper is structured as follows: Section 2 describes ranked set sampling and median ranked set sampling. Section 3 describes local polynomial regression. Section 4 presents the estimation of the cumulative distribution function using the maximum likelihood method, the method of moments, and local polynomial regression. Section 5 presents the simulation study and conclusions.
- Description RSS & MRSS
2-1. Methodology of ranked set sampling (RSS)
McIntyre (1952) was the first to suggest ranked set sampling, as a strategy for estimating pasture yields. The technique applies when drawing and ranking sample units is much cheaper than measuring the variable of interest. A ranked set sample is selected in the following steps:
- Randomly draw $m^2$ sample units from the population of interest and divide them into $m$ sets, each of size $m$.
- Rank the units within each set by judgment, for example by visual inspection or by another inexpensive method, without actually measuring the variable of interest.
- From the first set, select the unit ranked smallest and discard the other units; from the second set, select the unit ranked second smallest and discard the other units. Continue until the unit ranked largest is selected from the $m$-th set.
- Steps 1-3 yield the ranked set sample (RSS) of one cycle, of size $m$.
- To obtain an RSS of size $n$, repeat steps 1-3 $r$ times, where $n = mr$.
Let $\{X_{[i]j}: i = 1, \dots, m;\ j = 1, \dots, r\}$ be the ranked set sample, where $X_{[i]j}$ is the $i$-th judgment order statistic of a set of size $m$ in the $j$-th cycle. Square brackets [.] are used when ranking is imperfect, that is, when the ranking may contain errors; round brackets (.) are used when ranking is perfect, that is, free of error. It is important to note that, for each fixed $i$, the units $X_{[i]1}, \dots, X_{[i]r}$ are independent and identically distributed, whereas for each fixed $j$, the units $X_{[1]j}, \dots, X_{[m]j}$ are independent but not identically distributed.
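The selection steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `ranked_set_sample`, the `draw` callback, and the standard normal population are assumptions made for the example, and ranking is done by sorting the true values, which corresponds to perfect ranking.

```python
import random

def ranked_set_sample(draw, m, r, rng):
    """Return r cycles, each holding m measured units (perfect ranking assumed).

    draw(rng) is a hypothetical callback returning one unit from the population.
    """
    sample = []
    for _ in range(r):                      # repeat steps 1-3 for r cycles
        cycle = []
        for i in range(m):                  # i-th set of the cycle
            units = sorted(draw(rng) for _ in range(m))  # rank a set of size m
            cycle.append(units[i])          # keep only the (i+1)-th smallest unit
        sample.append(cycle)
    return sample

rng = random.Random(0)
# Hypothetical population: standard normal, set size m = 3, r = 4 cycles.
rss = ranked_set_sample(lambda g: g.gauss(0.0, 1.0), m=3, r=4, rng=rng)
n = sum(len(cycle) for cycle in rss)        # total sample size n = m * r
```

Each cycle uses $m^2$ drawn units but measures only $m$ of them, so the sketch returns $n = mr$ measured values in total.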
2-2. Methodology of median ranked set sampling (MRSS)
Muttlak (1997) suggested a new ranked set sampling strategy, called median ranked set sampling (MRSS), to reduce errors in ranking the units within sets, to increase the efficiency of the estimator in the presence of ranking errors, and to increase efficiency over RSS under perfect ranking. A median ranked set sample of size $n = mr$ is selected in the following steps:
- Randomly draw $m^2$ sample units from the population of interest.
- Divide the units into $m$ sets, each of size $m$, and rank the units within each set by judgment.
- If the set size $m$ is odd, measure from each set the unit ranked $\frac{m+1}{2}$, that is, the median of the set. If the set size $m$ is even, measure the unit ranked $\frac{m}{2}$ from each of the first $\frac{m}{2}$ sets and the unit ranked $\frac{m+2}{2}$ from each of the remaining $\frac{m}{2}$ sets.
- Steps 1-3 can be repeated $r$ times, if necessary, to obtain an MRSS of size $n = mr$.
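If the set size is odd, the sample is called the odd median ranked set sample, symbolized MRSSO, and its units for the variable $X$ are described as follows:
$$\left\{ X_{i\left[\frac{m+1}{2}\right]j} : i = 1, \dots, m;\ j = 1, \dots, r \right\},$$
where $X_{i\left[\frac{m+1}{2}\right]j}$ is the $\frac{m+1}{2}$-th judgment order statistic of the $i$-th set of size $m$ in the $j$-th cycle.
If the set size is even, the sample is called the even median ranked set sample, symbolized MRSSE, and its units for the variable $X$ are described as follows:
$$\left\{ X_{i\left[\frac{m}{2}\right]j} : i = 1, \dots, \tfrac{m}{2} \right\} \cup \left\{ X_{i\left[\frac{m+2}{2}\right]j} : i = \tfrac{m}{2}+1, \dots, m \right\}, \quad j = 1, \dots, r,$$
where $X_{i\left[\frac{m}{2}\right]j}$ and $X_{i\left[\frac{m+2}{2}\right]j}$ are the $\frac{m}{2}$-th and $\frac{m+2}{2}$-th judgment order statistics of the $i$-th set of size $m$ in the $j$-th cycle.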
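The odd and even selection rules can be illustrated with a short Python sketch. The function name `median_ranked_set_sample` and the `draw` callback are hypothetical, and sorting the true values stands in for judgment ranking, so perfect ranking is assumed.

```python
import random

def median_ranked_set_sample(draw, m, r, rng):
    """Return r cycles of m measured units selected by the median rule.

    draw(rng) is a hypothetical callback returning one unit from the population.
    """
    sample = []
    for _ in range(r):
        cycle = []
        for i in range(m):                  # i = 0, ..., m-1 indexes the sets
            units = sorted(draw(rng) for _ in range(m))
            if m % 2 == 1:
                rank = (m + 1) // 2         # odd m: the median of every set
            elif i < m // 2:
                rank = m // 2               # even m, first half of the sets
            else:
                rank = m // 2 + 1           # even m, second half: (m+2)/2
            cycle.append(units[rank - 1])   # ranks are 1-based
        sample.append(cycle)
    return sample

rng = random.Random(1)
mrsso = median_ranked_set_sample(lambda g: g.gauss(0.0, 1.0), m=3, r=2, rng=rng)
mrsse = median_ranked_set_sample(lambda g: g.gauss(0.0, 1.0), m=4, r=2, rng=rng)
```

With an odd set size every measured unit has the same rank, which is why the MRSSO units can be treated as identically distributed; with an even set size two different ranks are involved.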
- Local polynomial regression (LPR)
Local polynomial regression is a nonparametric technique that generalizes kernel regression. It is used to model functions and to smooth scatter plots. One of its most important uses is to estimate the relationship between a dependent variable and an independent variable, and it performs better than other regression smoothers near the boundary. At each point $z$ of interest, a low-order polynomial is fitted by weighted least squares. Following Fan and Gijbels (1996), the pairs $(X_i, Y_i)$, $i = 1, \dots, n$, are defined according to the fixed model in equation (1).
$$Y_i = m(X_i) + \sigma(X_i)\,\varepsilon_i \tag{1}$$
Where $m(z) = E(Y \mid X = z)$, $\sigma^2(z)$ is the variance of $Y$ at point $X = z$, and $\varepsilon_i$ is a residual error with normal distribution with mean zero and variance one. To estimate $m(z)$ we use a Taylor series of order $p$:
$$m(x) \approx \sum_{j=0}^{p} \frac{m^{(j)}(z)}{j!}\,(x - z)^j = \sum_{j=0}^{p} \beta_j (x - z)^j \tag{2}$$
Points in the neighborhood of $z$ receive a higher weight than the remaining points. The unknown parameters in equation (2) are estimated by weighted least squares, which gives the following formulas:
$$\hat{\beta} = (\mathbf{X}^{\top} W \mathbf{X})^{-1} \mathbf{X}^{\top} W \mathbf{Y}$$
and
$$\hat{m}(z) = e_1^{\top} \hat{\beta}.$$
Here $\mathbf{X}$ is the $n \times (p+1)$ design matrix whose $i$-th row is $(1, X_i - z, \dots, (X_i - z)^p)$, $K$ represents the kernel function, and $h$ represents the bandwidth. $W = \mathrm{diag}\{K_h(X_i - z)\}$, with $K_h(u) = K(u/h)/h$, represents the diagonal weight matrix, and $e_1$ is the $(p+1) \times 1$ vector with 1 in the first entry and 0 elsewhere.
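As a small illustration of the weighted least squares fit, the sketch below implements the local linear case ($p = 1$), for which the solution has a closed form in terms of weighted sums. The Gaussian kernel, the function name `local_linear`, and the toy data are assumptions made for the example; they are not prescribed by the text.

```python
import math

def local_linear(z, xs, ys, h):
    """Local linear (p = 1) estimate of m(z) by weighted least squares.

    Weights come from an unnormalised Gaussian kernel with bandwidth h
    (an assumed choice; the normalising constant cancels in the estimate).
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - z                              # centred covariate (x - z)
        w = math.exp(-0.5 * (d / h) ** 2)      # kernel weight K((x - z) / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1                    # determinant of X'WX for p = 1
    return (s2 * t0 - s1 * t1) / det           # first entry of beta-hat

# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = [i / 10 for i in range(21)]
ys = [2 * x + 1 for x in xs]
fit_mid = local_linear(1.0, xs, ys, h=0.3)     # interior point
fit_edge = local_linear(0.0, xs, ys, h=0.3)    # boundary point
```

Because the data are exactly linear, the local linear fit recovers the line both in the interior and at the boundary, which is one way to see the good boundary behaviour mentioned above.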
- Estimation of cumulative distribution function (CDF)
4.1. Estimation of the CDF using the maximum likelihood method (MLE)
4.1.1. Based on ranked set sampling (RSS)
Al-Saleh and Ahmad (2019) suggested estimating the CDF by MLE based on RSS. Let $\{X_{[i]j}: i = 1, \dots, m;\ j = 1, \dots, r\}$ represent the RSS selected from a population with probability density function $f(x)$ and CDF $F(x)$; maximum likelihood estimation is then used to estimate $F(t)$ from the RSS. They assumed that the variable $Y = \sum_{j=1}^{r}\sum_{i=1}^{m} I(X_{[i]j} \le t)$ is distributed according to a binomial distribution with mass parameter $mr$ and success probability $F(t)$. Therefore, the estimator of the CDF is defined according to the following relationship:
$$\hat{F}_{\mathrm{RSS}}(t) = \frac{1}{mr} \sum_{j=1}^{r} \sum_{i=1}^{m} I\left(X_{[i]j} \le t\right) \tag{3}$$
$\hat{F}_{\mathrm{RSS}}(t)$ is the mean obtained by estimating the CDF based on RSS by using MLE.
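4.1.2. Based on odd median ranked set sampling (MRSSO)
Let $\{X_{i\left[\frac{m+1}{2}\right]j}: i = 1, \dots, m;\ j = 1, \dots, r\}$ be an odd median ranked set sample of size $n = mr$, selected from a population with probability density function $f(x)$ and CDF $F(x)$; maximum likelihood estimation is then used to estimate the CDF from the MRSSO. Note that, for each fixed $t$, the indicators $I\left(X_{i\left[\frac{m+1}{2}\right]j} \le t\right)$ are independent and identically distributed, each following a Bernoulli distribution with success probability $p$, where $I(\cdot)$ represents the indicator function.
Let $Y = \sum_{j=1}^{r}\sum_{i=1}^{m} I\left(X_{i\left[\frac{m+1}{2}\right]j} \le t\right)$; then the variable $Y$ is binomially distributed with mass parameter $mr$ and success probability $p$.
The likelihood function is determined according to equation (4):
$$L(p) = \binom{mr}{y}\, p^{\,y} (1 - p)^{\,mr - y} \tag{4}$$
Therefore, the estimator of the CDF is defined according to the following relationship:
$$\hat{F}_{\mathrm{MRSSO}}(t) = \hat{p} = \frac{Y}{mr} \tag{5}$$
$\hat{F}_{\mathrm{MRSSO}}(t)$ is the mean obtained by estimating the CDF based on MRSSO by using MLE.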
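A short numerical check can make the binomial maximum likelihood step concrete. The sketch below counts the indicators for a fixed $t$, computes the sample proportion, and confirms on a grid that this proportion maximizes the binomial log-likelihood. The data values, the threshold `t`, and the helper names are hypothetical and serve only as an illustration.

```python
import math

def indicator_count(sample, t):
    """Number of measured units that are less than or equal to t."""
    return sum(1 for x in sample if x <= t)

def binomial_loglik(p, y, n):
    """Log-likelihood of a binomial(n, p) observation y, for 0 < p < 1."""
    return (math.log(math.comb(n, y))
            + y * math.log(p)
            + (n - y) * math.log(1.0 - p))

# Hypothetical measured units (flattened over sets and cycles), n = 9.
sample = [0.2, 1.4, -0.3, 0.9, 0.1, -1.1, 0.6, 0.4, 1.8]
t = 0.5
n = len(sample)
y = indicator_count(sample, t)       # observed number of "successes"
p_hat = y / n                        # closed-form maximiser: the sample mean

# Grid search over (0, 1) as an independent check of the closed form.
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=lambda p: binomial_loglik(p, y, n))
```

The grid maximizer agrees with `y / n` up to the grid resolution, which is the sense in which the estimator is a mean of indicators.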
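4.1.3. Based on even median ranked set sampling (MRSSE)
Let $\left\{ X_{i\left[\frac{m}{2}\right]j} : i = 1, \dots, \tfrac{m}{2} \right\} \cup \left\{ X_{i\left[\frac{m+2}{2}\right]j} : i = \tfrac{m}{2}+1, \dots, m \right\}$, $j = 1, \dots, r$, be an even median ranked set sample of size $n = mr$, selected from a population with probability density function $f(x)$ and CDF $F(x)$; maximum likelihood estimation is then used to estimate the CDF from the MRSSE. Following the selection method of the even median ranked set sample, note that, for each fixed $t$, the indicators of the events that the measured units do not exceed $t$ are treated as independent and identically distributed, each following a Bernoulli distribution with success probability $p$, where $I(\cdot)$ represents the indicator function.
Let $Y = \sum_{j=1}^{r}\left[\sum_{i=1}^{m/2} I\left(X_{i\left[\frac{m}{2}\right]j} \le t\right) + \sum_{i=m/2+1}^{m} I\left(X_{i\left[\frac{m+2}{2}\right]j} \le t\right)\right]$; then the variable $Y$ is binomially distributed with mass parameter $mr$ and success probability $p$.
The likelihood function is determined according to equation (6):
$$L(p) = \binom{mr}{y}\, p^{\,y} (1 - p)^{\,mr - y} \tag{6}$$
Therefore, the estimator of the CDF is defined according to the following relationship:
$$\hat{F}_{\mathrm{MRSSE}}(t) = \hat{p} = \frac{Y}{mr} \tag{7}$$
$\hat{F}_{\mathrm{MRSSE}}(t)$ is the mean obtained by estimating the CDF based on MRSSE by using MLE.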