A random variable is regarded as a draw from a probability distribution, which may or may not be a well-known one. Some random variables are drawn from a single distribution, such as the normal distribution, but this is not always the case: in real-life applications the observations may have been generated from a mixture of several distributions.
The density of a mixture distribution is difficult to work with directly, so iterative algorithms are used to find the estimators: the EM algorithm is used to find the maximum likelihood estimators, and the Metropolis-Hastings algorithm is used to find the Bayesian estimators. If the distribution belongs to an exponential family, then a conjugate prior distribution for its parameter exists, that is, a prior that is conjugate to the likelihood of the exponential family; see (Bernardo, 2009).
Many authors have considered estimating the parameters of mixture distributions. For example, (Newcomb, 1886) suggested an iterative reweighting scheme, which can be viewed as an application of the EM algorithm of (Dempster et al., 1977), to compute the common mean of a mixture, in known proportions, of a finite number of univariate normal distributions with known variances. (Jewell, 1982) provided maximum likelihood estimates for a mixture of exponential distributions using the EM algorithm. (Li L.A., 1983) described several features of mixture models and defined two types: if the component distributions of a mixture belong to the same family, the mixture is known as a type-I mixture model, whereas if the component distributions belong to different families, it is a type-II mixture model. (Upadhyay et al., 2002) proposed Bayesian inference in life testing and reliability using Markov Chain Monte Carlo (MCMC). (Pang et al., 2004) used MCMC techniques to carry out a Bayesian estimation procedure using Hirose's simulated data. (Chojogh, B. et al., 2019) presented models of the normal mixture distribution and the Poisson mixture distribution, for two components and for k components, and estimated the parameters of these models using the EM algorithm. The study "A mixture model for determining SARS-COV-2 variant composition in pooled samples" applied a mixture model to SARS-COV-2 variant data; the model is built from a pre-defined set of data, and the results showed that the model fits these data well.
Gamma Distribution
The gamma distribution is a continuous probability distribution used in many fields, such as statistics, economics, physics and computer science. It is determined by two parameters, the shape parameter (α) and the scale parameter (β), and its probability density function (pdf) is:
$$f(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta} \qquad (1)$$
where α > 0 , β > 0 and x > 0.
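The density above can be evaluated numerically. The following is a minimal Python sketch, assuming the shape-scale form of the gamma density stated in the text; the function name `gamma_pdf` is illustrative and not part of the original study. Working on the log scale avoids overflow in Γ(α) for large shape values.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density with shape alpha and scale beta (shape-scale form, assumed)."""
    if x <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("x, shape and scale must all be positive")
    # Evaluate on the log scale, then exponentiate.
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

# With shape 1 the gamma density reduces to an exponential density with mean `scale`.
print(gamma_pdf(2.0, 1.0, 2.0))
```

With α = 1 and β = 2 the value at x = 2 should agree with the exponential density exp(-1)/2, which gives a quick sanity check of the parameterization.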
1- Mixture Distribution Models
Mixture modelling is the process of analysing data to determine the mixture model that best describes the observed data. A mixture model combines several probability distributions and can represent the distribution of the data more accurately than a single-distribution model.
As noted above, many real-life random variables are generated from a mixture of several distributions rather than from a single distribution such as the normal or the gamma. The component distributions may belong to the same family with different parameters (for example, all of them normal), or they may belong to different families (for example, a gamma distribution together with a normal distribution).
Let 𝑋1, 𝑋2, 𝑋3, … , 𝑋𝑛 be independent random variables with observations 𝑥1, 𝑥2, 𝑥3, … , 𝑥𝑛. The probability density function (pdf) of a mixture distribution with k components can be expressed as follows:
$$f(x \mid \Theta) = \sum_{j=1}^{k} \lambda_j f_j(x \mid \theta_j) \qquad (2)$$
where 𝜆𝑗 represents the mixture weights, with 0 < 𝜆𝑗 < 1 and $\sum_{j=1}^{k} \lambda_j = 1$, 𝑓𝑗(𝑥|𝜃𝑗) represents the probability density function of the j-th component, and $\Theta$ = (𝜃1, 𝜃2, … , 𝜃𝑘) represents the parameter vector of the mixture distribution. In the Bayesian approach the parameter θ is treated as a random variable rather than a constant (Tahir et al., 2016). The mixture gamma distribution with k components is written as follows:
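$$f(x \mid \Theta) = \sum_{j=1}^{k} \lambda_j \frac{1}{\Gamma(a_j)\,\beta_j^{a_j}}\, x^{a_j-1} e^{-x/\beta_j} \qquad (3)$$
where $a_j > 0$ is the shape parameter and $\beta_j > 0$ is the scale parameter of component j, $x > 0$, and $\sum_{j=1}^{k} \lambda_j = 1$.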
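To make the mixture construction concrete, the sketch below evaluates a k-component gamma mixture density and draws a sample from it. It is a minimal Python illustration, assuming shape-scale gamma components; the function names and the parameter values are hypothetical and are not taken from the study.

```python
import math
import random

def gamma_pdf(x, shape, scale):
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def mixture_pdf(x, weights, shapes, scales):
    """Mixture density: weighted sum of the component gamma densities."""
    return sum(w * gamma_pdf(x, a, b)
               for w, a, b in zip(weights, shapes, scales))

def sample_mixture(n, weights, shapes, scales, rng):
    """Draw n values: pick a component by its weight, then draw from that gamma."""
    draws = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        draws.append(rng.gammavariate(shapes[j], scales[j]))
    return draws

rng = random.Random(1)
weights, shapes, scales = [0.5, 0.5], [2.0, 9.0], [1.0, 0.5]  # illustrative values
sample = sample_mixture(2000, weights, shapes, scales, rng)
# The mixture mean is the weighted sum of the component means a_j * beta_j.
mean_theory = sum(w * a * b for w, a, b in zip(weights, shapes, scales))
print(sum(sample) / len(sample), mean_theory)
```

The two-stage draw (component first, then observation) is the same latent-indicator view of the mixture that the EM algorithm relies on.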
2- SOME METHODS OF ESTIMATING THE PARAMETERS OF MIXTURE DISTRIBUTIONS
Mixture distributions are widely used in fields such as data analysis and machine learning. They combine several simple distributions to produce a more complex one, and they require estimating a set of parameters that determine the distribution of the mixture data.
When a sample of size n, (𝑥1, 𝑥2, 𝑥3, … , 𝑥𝑛), is drawn at random from a known distribution whose parameters are unknown, for example from a normal distribution with unknown mean and variance, the main objective is to estimate those parameters. In this study, we discuss two methods for estimating the parameters of a mixture distribution.
A- Maximum Likelihood Estimation (MLE):
This method is one of the most important methods of point estimation. It was developed by the statistician R. A. Fisher in the early 1920s, and it assumes that the parameter to be estimated for a particular population is an unknown fixed quantity, which is estimated from the sample data.
Assume we have a sample of size n, (𝑥1, 𝑥2, 𝑥3, … , 𝑥𝑛), and that we know the distribution from which this sample has been randomly drawn but not the parameters of that distribution. The principle of the method is to find an estimate 𝜃̂ of the parameter θ that maximizes the likelihood function. If 𝑥1, 𝑥2, 𝑥3, … , 𝑥𝑛 are independent and identically distributed (iid) observations drawn from a population with probability density function 𝑓(𝑥|𝜃), the estimator can be obtained by differentiating the likelihood function and setting the derivative equal to zero. The likelihood function is as follows:
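$$L(\theta \mid x) = \prod_{i=1}^{n} f(x_i \mid \theta) \qquad (4)$$
Using (2):
$$L(\Theta \mid x) = \prod_{i=1}^{n} \sum_{j=1}^{k} \lambda_j f_j(x_i \mid \theta_j) \qquad (5)$$
Taking the logarithm:
$$\log L(\Theta \mid x) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \lambda_j f_j(x_i \mid \theta_j) \qquad (6)$$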
We then take the partial derivatives with respect to 𝜃𝑗 and 𝜆𝑗 to obtain an equation for each parameter. These equations are difficult to solve directly because of the sum inside the logarithm, so it is necessary to rely on iterative numerical algorithms to reach the maximum likelihood estimator (Friedman et al., 2009).
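Although the mixture log-likelihood has no closed-form maximizer, it is straightforward to evaluate for given parameter values, which is what any iterative method needs. The following is a minimal Python sketch for gamma components, assuming the shape-scale form; the function names are illustrative. The inner sum is computed with the log-sum-exp trick, a standard numerical device that is an addition here and not something described in the study.

```python
import math

def log_gamma_pdf(x, shape, scale):
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def mixture_log_likelihood(data, weights, shapes, scales):
    """Sum over observations of log( sum over components of weight * density )."""
    total = 0.0
    for x in data:
        terms = [math.log(w) + log_gamma_pdf(x, a, b)
                 for w, a, b in zip(weights, shapes, scales)]
        m = max(terms)  # log-sum-exp: factor out the largest term for stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

data = [0.5, 1.0, 2.5]  # illustrative observations
print(mixture_log_likelihood(data, [0.3, 0.7], [2.0, 5.0], [1.0, 0.5]))
```

Because the logarithm is applied to a sum over components, the derivative with respect to any one parameter involves all components at once, which is the reason the score equations cannot be solved separately.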
Expectation Maximization Algorithm (EM):
The expectation maximization (EM) algorithm was proposed by (Dempster, Laird & Rubin, 1977) and remains in use to this day. It is one of the most important methods for finding maximum likelihood estimators in the presence of latent variables or missing values. The algorithm is used in statistics and machine learning to solve data analysis problems such as classification, clustering and factor analysis (Filho, 2008).
For example, suppose data are collected about a particular disease, where only the presence or absence of the disease was recorded and not its severity: absence is recorded as zero and presence corresponds to some x > 0. In this case we do not know the value of x (is it 100 or 5?), so the maximum likelihood method cannot be applied directly because there are missing values.
The expectation maximization algorithm consists of two steps (Chris & Raftery, 2017):
a-Step One: E-Step
This step takes the expectation of the logarithm of the likelihood function, given the observed data and the current parameter estimates. Here the missing values are treated as constants and not variables (Chojogh et al, 2019).
b-Step Two: M-Step
This step determines the optimal values of the parameters by maximizing the expectation computed in the first step.
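To estimate the mixture gamma distribution, we start from the pdf of the mixture gamma distribution in (3). The likelihood of the sample is
$$L(\Theta \mid x) = \prod_{i=1}^{n} \sum_{j=1}^{k} \lambda_j f(x_i \mid a_j, \beta_j)$$
where $f(x \mid a_j, \beta_j)$ is the gamma density (1) with shape $a_j$ and scale $\beta_j$.
Taking the logarithm of the above equation we get
$$\log L(\Theta \mid x) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \lambda_j f(x_i \mid a_j, \beta_j) \qquad (7)$$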
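Optimizing this log-likelihood is difficult because of the summation within the logarithm. However, we can introduce an indicator variable for each observation as follows (Corduneanu and Bishop, 2001):
$$z_{ij} = \begin{cases} 1, & \text{if } x_i \text{ comes from component } j \\ 0, & \text{otherwise} \end{cases}$$
and the probability is:
$$P(z_{ij} = 1) = \lambda_j$$
For fixed i, the probability function of $z_i = (z_{i1}, \ldots, z_{ik})$ has the following form (Saeed, 2005):
$$p(z_i \mid \lambda) = \prod_{j=1}^{k} \lambda_j^{z_{ij}}$$
Since $z_1, \ldots, z_n$ are independent, we write the joint indicator density in the following form:
$$p(z \mid \lambda) = \prod_{i=1}^{n} \prod_{j=1}^{k} \lambda_j^{z_{ij}} \qquad (8)$$
where $(x, z)$ denotes the complete data. Therefore we can write the joint pdf of the observation and the indicator in the following form:
$$p(x_i, z_i \mid \Theta) = \prod_{j=1}^{k} \left[\lambda_j f(x_i \mid a_j, \beta_j)\right]^{z_{ij}} \qquad (9)$$
and the complete-data likelihood is given by:
$$L_c(\Theta \mid x, z) = \prod_{i=1}^{n} \prod_{j=1}^{k} \left[\lambda_j f(x_i \mid a_j, \beta_j)\right]^{z_{ij}} \qquad (10)$$
$$L_c(\Theta \mid x, z) = \prod_{j=1}^{k} \lambda_j^{n_j} \prod_{i=1}^{n} \left[f(x_i \mid a_j, \beta_j)\right]^{z_{ij}} \qquad (11)$$
where $n_j = \sum_{i=1}^{n} z_{ij}$ and $f(x \mid a_j, \beta_j)$ is the gamma density with shape $a_j$ and scale $\beta_j$.
The log of the complete-data likelihood function is
$$\log L_c(\Theta \mid x, z) = \sum_{i=1}^{n} \sum_{j=1}^{k} z_{ij} \left[\log \lambda_j + \log f(x_i \mid a_j, \beta_j)\right] \qquad (12)$$
The indicator $z_{ij}$ is a latent or missing value because we do not know whether it is 0 or 1; therefore we use the Expectation Maximization (EM) algorithm to estimate the parameters (Sattayatham and Talangtam, 2012).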
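Case 1: E-Step
$$w_{ij} = E(z_{ij} \mid x_i, \Theta) = P(z_{ij} = 1 \mid x_i, \Theta) \qquad (13)$$
$$w_{ij} = \frac{\lambda_j f(x_i \mid a_j, \beta_j)}{\sum_{l=1}^{k} \lambda_l f(x_i \mid a_l, \beta_l)} \qquad (14)$$
where $f(x \mid a_j, \beta_j)$ is the gamma density with shape $a_j$ and scale $\beta_j$, evaluated at the current parameter estimates.
The expected complete log-likelihood is
$$Q(\Theta) = \sum_{i=1}^{n} \sum_{j=1}^{k} w_{ij} \left[\log \lambda_j + \log f(x_i \mid a_j, \beta_j)\right] \qquad (15)$$
$$Q(\Theta) = \sum_{i=1}^{n} \sum_{j=1}^{k} w_{ij} \left[\log \lambda_j + (a_j - 1)\log x_i - \frac{x_i}{\beta_j} - \log \Gamma(a_j) - a_j \log \beta_j\right] \qquad (16)$$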
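Case 2: M-Step
Maximizing the expected complete log-likelihood (16) with respect to each parameter, subject to $\sum_{j=1}^{k} \lambda_j = 1$, gives
$$\hat{\lambda}_j = \frac{1}{n} \sum_{i=1}^{n} w_{ij} \qquad (17)$$
$$\hat{\beta}_j = \frac{\sum_{i=1}^{n} w_{ij} x_i}{a_j \sum_{i=1}^{n} w_{ij}} \qquad (18)$$
$$\sum_{i=1}^{n} w_{ij} \left[\log x_i - \log \beta_j - \psi(a_j)\right] = 0 \qquad (19)$$
where $\psi$ is the digamma function. This equation has no closed-form solution for $a_j$, so we solve it by the Newton-Raphson method:
$$a_j^{(t+1)} = a_j^{(t)} - \frac{g(a_j^{(t)})}{g'(a_j^{(t)})} \qquad (20)$$
where $g(a_j)$ is the left-hand side of (19) and $g'(a_j) = -\psi'(a_j) \sum_{i=1}^{n} w_{ij}$.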
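The E-step and M-step can be put together in a short program. The following is a minimal Python sketch, assuming shape-scale gamma components; it is an illustration and not the code used in the study. In the M-step it uses the profile form of the shape equation, log(a) - ψ(a) = log(weighted mean of x) - (weighted mean of log x), which results from substituting the closed-form scale update into the score equation for the shape, and solves it by Newton-Raphson. The digamma and trigamma functions are implemented with a recurrence and an asymptotic series because the Python standard library does not provide them; all function names and the simulated parameter values are hypothetical.

```python
import math
import random

def digamma(x):
    """psi(x) via upward recurrence and an asymptotic series (x > 0)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def trigamma(x):
    """psi'(x) via upward recurrence and an asymptotic series (x > 0)."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + 1.0 / x + 0.5 * inv2
            + inv2 / x * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42)))

def gamma_pdf(x, shape, scale):
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def solve_shape(s, a, tol=1e-10, max_iter=100):
    """Newton-Raphson root of g(a) = log(a) - digamma(a) - s, for s > 0."""
    for _ in range(max_iter):
        g = math.log(a) - digamma(a) - s
        g_prime = 1.0 / a - trigamma(a)
        a_new = a - g / g_prime
        if a_new <= 0.0:  # step left the parameter space: halve instead
            a_new = a / 2.0
        if abs(a_new - a) < tol:
            return a_new
        a = a_new
    return a

def em_gamma_mixture(data, weights, shapes, scales, n_iter=100):
    weights, shapes, scales = list(weights), list(shapes), list(scales)
    n, k = len(data), len(weights)
    log_data = [math.log(x) for x in data]
    for _ in range(n_iter):
        # E-step: posterior probability that observation i belongs to component j
        resp = []
        for x in data:
            dens = [w * gamma_pdf(x, a, b)
                    for w, a, b in zip(weights, shapes, scales)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: closed forms for weight and scale, Newton-Raphson for shape
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            mean_x = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            mean_log = sum(r[j] * lx for r, lx in zip(resp, log_data)) / n_j
            shapes[j] = solve_shape(math.log(mean_x) - mean_log, shapes[j])
            scales[j] = mean_x / shapes[j]
            weights[j] = n_j / n
    return weights, shapes, scales

rng = random.Random(7)
data = [rng.gammavariate(2.0, 1.0) if rng.random() < 0.5
        else rng.gammavariate(20.0, 1.0) for _ in range(600)]
weights, shapes, scales = em_gamma_mixture(data, [0.5, 0.5], [1.0, 10.0], [1.0, 1.0])
print(weights, shapes, scales)
```

A fixed number of iterations is used here for brevity; in practice one would usually stop when the change in the log-likelihood falls below a tolerance, and try several starting values because EM converges only to a local maximum.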
B- Bayesian Estimation Approach
In many cases it is easy to find a closed form for the posterior distribution, but sometimes finding the posterior requires integrating high-dimensional functions. Methods were therefore developed to make this computation tractable, the most important of which is Markov Chain Monte Carlo (MCMC). MCMC came into wide use among researchers in the early 1990s for solving Bayesian problems, as it relies on drawing random samples from the conditional distributions of the parameters.
The most commonly used MCMC methods are the Gibbs sampling algorithm and the Metropolis-Hastings algorithm, which we use in this paper.
Metropolis - Hastings Algorithm
The Metropolis-Hastings algorithm is one of the main MCMC methods for estimating the parameters of mixture distributions, and it is used in many scientific and engineering applications, especially in statistics and physics.
Let 𝑥1, 𝑥2, 𝑥3, … , 𝑥𝑛 be independent and identically distributed (iid) random variables with probability density function 𝑓(𝑥|𝜃), where the posterior distribution of the parameters of this function is not available in closed form, and suppose that 𝑞(𝜃|𝜃′) is a candidate distribution with the parameter θ′ (al-masri, 2020).
Metropolis-Hastings Algorithm Steps:
1- Choose an initial value 𝜃(0) for the parameter, close to the parameter values of the real data.
2- Choose the sample sizes for the observations of the random variable x.
3- Repeat for i = 1, 2, …, N:
a- Generate a proposed value θ′ from the proposal distribution (we use the prior distribution for each model).
b- Calculate the acceptance probability α(θ i−1, θ′) as a ratio, capped at 1, whose numerator is the conditional distribution evaluated at the proposed value θ′ and whose denominator is the conditional distribution evaluated at the current value θ i−1.
c- Generate a random number ui from the uniform (0,1) distribution.
d- If ui < α(θ i−1, θ′), set 𝜃𝑖 = 𝜃′; if ui ≥ α(θ i−1, θ′), set 𝜃𝑖 = 𝜃𝑖−1.
4- Repeat the previous steps, setting i = i + 1 each time, until N iterations are completed.
1- zi Posterior
When the indicator parameter zi is unknown for all observations xi, i = 1, 2, . . . , n, and the shape parameters aj, the scale parameters βj and the weight parameter λ are known, the conjugate prior p(zi) of zi is multinomial with hyperparameters (1, λ1, λ2, . . . , λk).
By using Bayes' theorem, the posterior distribution is:
(21)
Since each zij takes only two values, 1 or 0, then
(22)
Therefore, the posterior distribution of zi is multinomial (1, wi1, wi2, . . . , wik), where i = 1, 2, . . . , n and j = 1, 2, . . . , k.
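The multinomial posterior of the indicator can be illustrated with a small sampler. The following is a minimal Python sketch, assuming that the posterior probabilities wij are proportional to λj times the component density at xi (the usual form for mixture indicators) and that the components are shape-scale gammas; the function names and parameter values are hypothetical.

```python
import math
import random

def gamma_pdf(x, shape, scale):
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def indicator_posterior(x, weights, shapes, scales):
    """Posterior probabilities (w_i1, ..., w_ik) for one observation x."""
    dens = [w * gamma_pdf(x, a, b) for w, a, b in zip(weights, shapes, scales)]
    total = sum(dens)
    return [d / total for d in dens]

def sample_indicator(x, weights, shapes, scales, rng):
    """One multinomial(1, w_i1, ..., w_ik) draw, returned as a 0/1 vector."""
    probs = indicator_posterior(x, weights, shapes, scales)
    j = rng.choices(range(len(probs)), weights=probs)[0]
    return [1 if m == j else 0 for m in range(len(probs))]

rng = random.Random(3)
weights, shapes, scales = [0.5, 0.5], [2.0, 20.0], [1.0, 1.0]  # illustrative values
print(indicator_posterior(20.0, weights, shapes, scales))
print(sample_indicator(20.0, weights, shapes, scales, rng))
```

An observation far out in the right tail is assigned to the second component with probability close to one, while observations between the two component modes receive intermediate probabilities.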
2- λ Posterior
When the weight parameter λ is unknown and the shape parameters aj and the scale parameters βj are known, ignoring the terms in (11) that contain the known parameters, the complete data likelihood function is given by:
(23)
where nj is the number of observations assigned to component j.
By using (13)
(24)
The conjugate prior p(λ) is a Dirichlet distribution with hyperparameters µ = (µ1, µ2, . . . , µk), given by
$$p(\lambda) \propto \prod_{j=1}^{k} \lambda_j^{\mu_j - 1}$$
Ignoring the terms that do not depend on λ, the posterior distribution is a Dirichlet distribution with updated hyperparameters, given by
(25)
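The conjugate update for the mixing weights can be checked numerically. The following is a minimal Python sketch, assuming the standard Dirichlet-multinomial result that a Dirichlet(µ1, ..., µk) prior combined with component counts n1, ..., nk gives a Dirichlet(µ1 + n1, ..., µk + nk) posterior; the function names and the example counts are hypothetical. A Dirichlet draw is generated by normalizing independent gamma variates, which is a standard construction.

```python
import random

def weight_posterior_params(prior_mu, counts):
    """Dirichlet posterior hyperparameters: prior value plus component count."""
    return [m + n for m, n in zip(prior_mu, counts)]

def sample_dirichlet(params, rng):
    """Draw from Dirichlet(params) by normalizing independent Gamma(param, 1) draws."""
    draws = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(11)
posterior = weight_posterior_params([1.0, 1.0], [30, 70])  # illustrative counts
print(posterior)
print(sample_dirichlet(posterior, rng))
```

In a sampler that alternates between indicators and weights, the counts would be recomputed from the current indicator draws before each weight update.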
3-aj Posterior
When the shape parameter aj is unknown, for some j = 1, 2, . . . , k, and both the weight parameter λ and the scale parameters βj are known, ignoring the terms in (11) that contain the known parameters and a1, . . . , aj−1, aj+1, . . . , ak, the complete data likelihood function is given by:
(26)
Where
The conjugate prior p(aj) belongs to an exponential family and is given by
(27)
The posterior distribution, with updated hyperparameters, is given by
(28)
4- βj Posterior
When the scale parameter βj is unknown, for some j = 1, 2, . . . , k, and the shape parameters aj and the weight parameter λ are known, ignoring the terms in (11) that contain the known parameters, the complete data likelihood function is given by:
The conjugate prior p(βj) is the gamma distribution, given by
(29)
The posterior distribution is the gamma distribution with updated hyperparameters, given by
(30)
Where
5- Joint Posterior of (aj, βj)
When the weight parameter λ is known and the shape parameters aj and the scale parameters βj are unknown, ignoring the terms in (11) that contain λ, the complete data likelihood function is given by:
The conjugate prior p(aj, βj), with its hyperparameters, is given by
(31)
Ignoring the terms that contain λ, the joint posterior distribution is
(32)
with the hyper parameters
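The Metropolis-Hastings steps listed earlier in this section can be sketched in code for a simplified case. The following minimal Python sketch samples the shape parameter of a single gamma component with known scale. It assumes an exponential prior with rate 1 on the shape, chosen only for illustration and not the joint prior (31) of the study, and it uses that prior as the proposal distribution, as the steps describe. With the prior as proposal, the prior terms cancel in the Metropolis-Hastings ratio, so the acceptance probability reduces to the likelihood ratio capped at 1. All names and values are hypothetical.

```python
import math
import random

def log_likelihood_shape(a, n, sum_log_x, sum_x, scale):
    """Gamma log-likelihood in the shape a, using sufficient statistics."""
    return ((a - 1.0) * sum_log_x - sum_x / scale
            - n * math.lgamma(a) - n * a * math.log(scale))

def metropolis_hastings_shape(data, scale, n_iter, rng, prior_rate=1.0, start=1.0):
    n = len(data)
    sum_log_x = sum(math.log(x) for x in data)
    sum_x = sum(data)
    current = start
    current_ll = log_likelihood_shape(current, n, sum_log_x, sum_x, scale)
    chain = []
    for _ in range(n_iter):
        proposal = rng.expovariate(prior_rate)  # propose from the (assumed) prior
        if proposal > 0.0:
            proposal_ll = log_likelihood_shape(proposal, n, sum_log_x, sum_x, scale)
            # Proposal equals prior, so the ratio is the likelihood ratio.
            accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
            if rng.random() < accept_prob:
                current, current_ll = proposal, proposal_ll
        chain.append(current)  # on rejection the current value is repeated
    return chain

rng = random.Random(5)
data = [rng.gammavariate(2.0, 1.0) for _ in range(100)]  # true shape 2, scale 1
chain = metropolis_hastings_shape(data, 1.0, 20000, rng)
kept = chain[2000:]  # discard a burn-in period
print(sum(kept) / len(kept))
```

Proposing from the prior keeps the acceptance rule simple, but the acceptance rate can be low when the posterior is much more concentrated than the prior; a random-walk proposal is a common alternative in that situation.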
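3- Simulation Study
In this section, a simulation study is carried out using Monte Carlo methods, with the Metropolis-Hastings algorithm for Bayesian estimation and the EM algorithm for maximum likelihood estimation. The efficiency of the MLE method is compared with that of the Bayesian method by computing the mean of the sum of the modulus of the bias (MBias) and the root-mean-square error (RMSE).
The general form of the two-component mixture gamma distribution is given by
$$f(x) = \lambda f(x \mid a_1, \beta_1) + (1 - \lambda) f(x \mid a_2, \beta_2)$$
where $f(x \mid a_j, \beta_j)$ is the gamma density with shape $a_j$ and scale $\beta_j$.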
The simulation study was written in the R language and included the following basic stages:
First stage: choosing the initial values as follows:
1- Choose the initial values for the parameters; observations are selected randomly from the first and the second component densities with probabilities λ and (1 − λ).
2- Choose different sample sizes (50, 100, 150) to generate the data sets from the two-component mixture gamma distribution with the chosen parameters.
3- Repeat each experiment 1000 times.
4- Choose values for the random variable.
Second stage: data generation:
A random variable is generated according to the type of distribution.
Third stage: estimating the parameters of the mixture distribution using the estimation methods.
Fourth stage: comparing the efficiency of the MLE method with the Bayesian method of estimation by computing the mean of the sum of the modulus of the bias (MBias) and the root-mean-square error (RMSE), where smaller RMSE and MBias values indicate a better overall quality of the estimates.
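The two comparison criteria can be written down explicitly. The following is a minimal Python sketch (the study itself was written in R), and the definitions are assumptions, since the text does not give formulas: MBias is taken as the average over replications of the sum over parameters of the absolute error, and RMSE as the square root of the average over replications of the sum over parameters of the squared error. The function names and the toy numbers are hypothetical.

```python
import math

def mbias(estimates, truth):
    """Assumed definition: mean over replications of the summed absolute errors."""
    per_rep = [sum(abs(e - t) for e, t in zip(est, truth)) for est in estimates]
    return sum(per_rep) / len(per_rep)

def rmse(estimates, truth):
    """Assumed definition: root of the mean over replications of the summed squared errors."""
    per_rep = [sum((e - t) ** 2 for e, t in zip(est, truth)) for est in estimates]
    return math.sqrt(sum(per_rep) / len(per_rep))

truth = [2.0, 2.0]                     # toy true parameter vector
estimates = [[1.0, 2.0], [3.0, 2.0]]   # toy estimates from two replications
print(mbias(estimates, truth), rmse(estimates, truth))
```

Each inner list holds the parameter estimates from one replication, so with 1000 replications `estimates` would contain 1000 such lists per method and sample size.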
To find the MLE estimators, the Newton-Raphson method was adopted. The parameters are estimated with the Metropolis method (MT) using the joint prior in (31) with hyperparameters (s = 1; m = 1; t = 1), and the simulation was repeated 1000 times. Table 1 presents the estimates (Est.) together with the RMSE and MBias values for the MLE and MT methods. The smaller RMSE and MBias for each sample size are highlighted in bold. From this table we observe that the Metropolis method gives smaller RMSE and MBias than MLE in all cases.
Table 1: MBias and RMSE of the MLE estimates and the MT estimators for two component mixture Gamma distribution
| Sample size | Method | | | | | | RMSE | MBias |
|---|---|---|---|---|---|---|---|---|
| 50 | EM | 5.2594 | 2.3304 | 9.1590 | 3.8302 | 0.4065 | 2.3633 | 2.0702 |
| | MCMC | 2.5164 | 3.3312 | 4.1121 | 5.8485 | 0.5114 | **1.0555** | **0.8401** |
| 100 | EM | 2.8220 | 6.3777 | 5.8096 | 10.8190 | 0.4999 | 2.0152 | 1.3130 |
| | MCMC | 2.6802 | 6.2251 | 5.5243 | 10.6079 | 0.5091 | **1.9129** | **1.3075** |
| 150 | EM | 5.2594 | 2.3304 | 9.1590 | 3.8302 | 0.4065 | 2.3633 | 2.0702 |
| | MCMC | 2.5164 | 3.3312 | 4.1121 | 5.8485 | 0.5114 | **1.0555** | **0.8401** |
4- Discussion
The parameters were estimated with the Metropolis method and with the Expectation Maximization (EM) algorithm. From the simulation results, it is observed that the Bayes estimator performs better than the maximum likelihood estimator in all cases.