Estimating the Parameters of Mixture Gamma Distributions Using Maximum Likelihood and Bayesian Method

Ibrahim Abdulla Najm, Nagham; Salim Al_Rassam, Raya

doi:10.33899/iqjoss.2024.183254


IRAQI JOURNAL OF STATISTICAL SCIENCES

Volume 21, Issue 1, June 2024, Pages 137-149

Document Type: Research Paper


Authors

Nagham Ibrahim Abdulla Najm*¹; Raya Salim Al_Rassam²

¹Department of Statistics and Informatics, College of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq

²Department of Statistics and Informatics, College of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq

Abstract

This paper focuses on the mixture gamma distribution and uses maximum likelihood and Bayesian techniques to estimate its parameters. The Expectation-Maximization (EM) algorithm is used to find the maximum likelihood estimators, and the Metropolis-Hastings algorithm is used to simulate the Bayesian estimates of the parameters of the mixture gamma distribution. The two sets of estimates are then compared using the sum of the absolute biases (MBias) and the root-mean-square error (RMSE). The results show that the Bayesian estimator performs better than the maximum likelihood estimator.

Highlights

For mixture distributions (with components from the same family or from different families), the density formula becomes more complex. To make estimation easier, the EM algorithm is used to find the maximum likelihood estimators and the Metropolis (MT) algorithm is used to find the Bayesian estimators in all cases.
A simulation study with sample sizes 50, 100 and 150, evaluated with the RMSE and MBias criteria, shows that the Bayesian estimators perform best.

1- Acknowledgment

The authors are sincerely grateful to the University of Mosul and the College of Computer Sciences and Mathematics for the facilities provided, which helped greatly to improve the quality of this work.

Conflict of interest

The authors have no conflict of interest.

Keywords

Gamma distribution; Mixture distribution; Bayesian estimation; Likelihood function; Expectation Maximization Algorithm; Metropolis-Hastings

Full Text

A random variable can always be regarded as a sample from some distribution, which may or may not be a well-known one. Some random variables are drawn from a single distribution, such as the normal distribution, but this is not always the case: in real life, random variables may be generated from a mixture of several distributions.

Because the density of a mixture distribution is complicated, algorithms are used to facilitate finding the estimators: the EM algorithm is used to find the maximum likelihood estimators, and the Metropolis-Hastings algorithm to find the Bayesian estimators. If the distribution belongs to an exponential family with density f(x|θ), then a conjugate prior distribution for θ exists, that is, a prior that is conjugate to the exponential-family likelihood, so that the posterior belongs to the same family as the prior (Bernardo, 2009).

Many authors have considered estimating the parameters of mixture distributions. For example, Newcomb (1886) suggested an iterative reweighting scheme, which can be viewed as an application of the EM algorithm of Dempster et al. (1977), to compute the common mean of a mixture, in known proportions, of a finite number of univariate normal distributions with known variances. Jewell (1982) obtained maximum likelihood estimates for mixtures of exponential distributions using the EM algorithm. Li (1983) described several features of mixture models and defined two types: if the component distributions belong to the same family, the mixture is a type-I mixture model, whereas if they belong to different families, it is a type-II mixture model. Upadhyay et al. (2002) proposed Bayesian inference in life testing and reliability using Markov chain Monte Carlo (MCMC). Pang et al. (2004) used MCMC techniques to carry out a Bayesian estimation procedure on Hirose’s simulated data. Ghojogh et al. (2019) presented a study of mixture distributions covering the normal mixture and the Poisson mixture models with two components and with k components, with the parameters of these models estimated by the EM algorithm. Valieris et al. presented a mixture model and applied it to a set of SARS-CoV-2 variants; the model was built from a pre-defined set of data, and the results showed that it fits these data well.

Gamma Distribution

The gamma distribution is a continuous probability distribution used in many fields, such as statistics, economics, physics and computer science. It is determined by two parameters, the shape parameter (α) and the scale parameter (β), and its probability density function (pdf) is:

$$f(x\mid\alpha,\beta)=\frac{x^{\alpha-1}e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} \tag{1}$$

where α > 0 , β > 0 and x > 0.
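Equation (1) can be checked numerically. The following is a minimal Python sketch, not code from the paper: the helper name `gamma_pdf` is hypothetical, and β is treated as a scale parameter exactly as in (1). The density is evaluated on the log scale with `math.lgamma` so that Γ(α) does not overflow for large α.

```python
import math

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density (1): shape alpha > 0, scale beta > 0, support x > 0."""
    if x <= 0 or alpha <= 0 or beta <= 0:
        raise ValueError("x, alpha and beta must all be positive")
    # Evaluate log f first; math.lgamma avoids overflow of Gamma(alpha) for large alpha.
    log_f = ((alpha - 1) * math.log(x) - x / beta
             - alpha * math.log(beta) - math.lgamma(alpha))
    return math.exp(log_f)

# With alpha = 1 the density reduces to the exponential density exp(-x / beta) / beta.
print(gamma_pdf(1.0, 1.0, 1.0), math.exp(-1.0))
```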

1- Mixture Distribution Models

Mixture modelling is the process of analyzing data to determine the mixture model that best describes the observed data. A mixture model combines several probability distributions and can therefore represent the distribution of the data more accurately than a single distribution.

Random variables often come from a single distribution, such as the gamma or the normal distribution, but many real-life variables are generated from a mixture of several distributions. The components may belong to the same family, for example all normal with different parameters, or to different families, for example a gamma and a normal distribution together.

Let X_1, X_2, ..., X_n be independent random variables with observed values x_1, x_2, ..., x_n. The probability density function (pdf) of a mixture distribution with k components can be expressed as:

$$f(x\mid\Theta)=\sum_{j=1}^{k}\lambda_j\,f_j(x\mid\theta_j) \tag{2}$$

where λ_j are the mixture weights, with 0 < λ_j < 1 and Σ_{j=1}^{k} λ_j = 1; f_j(x|θ_j) is the probability density function of the j-th component; and Θ = (θ_1, θ_2, ..., θ_k) is the parameter vector of the mixture distribution. It is worth noting that the parameter θ is treated as a random variable rather than a constant (Tahir et al., 2016). The mixture gamma distribution with k components is written as:

$$f(x\mid\Theta)=\sum_{j=1}^{k}\lambda_j\,\frac{x^{\alpha_j-1}e^{-x/\beta_j}}{\beta_j^{\alpha_j}\,\Gamma(\alpha_j)} \tag{3}$$

where x > 0, α_j > 0, β_j > 0 and Σ_{j=1}^{k} λ_j = 1.
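To make the mixture gamma density concrete, the sketch below evaluates Σ_j λ_j f(x|α_j, β_j) and draws samples from it by composition: first choose component j with probability λ_j, then draw from that gamma component. The helper names and the illustrative parameter values are hypothetical. Python's `random.gammavariate(alpha, beta)` is used because it takes β as a scale, matching (1).

```python
import math
import random

def gamma_pdf(x, alpha, beta):
    """Gamma density (1) with shape alpha and scale beta (x > 0)."""
    return math.exp((alpha - 1) * math.log(x) - x / beta
                    - alpha * math.log(beta) - math.lgamma(alpha))

def mixture_gamma_pdf(x, weights, alphas, betas):
    """Mixture gamma density: sum over j of lambda_j * f(x | alpha_j, beta_j)."""
    return sum(w * gamma_pdf(x, a, b) for w, a, b in zip(weights, alphas, betas))

def sample_mixture_gamma(n, weights, alphas, betas, rng=random):
    """Draw n observations from the mixture by composition."""
    components = range(len(weights))
    draws = []
    for _ in range(n):
        j = rng.choices(components, weights=weights)[0]      # latent component label
        draws.append(rng.gammavariate(alphas[j], betas[j]))  # beta is the scale here
    return draws

rng = random.Random(2024)  # fixed seed for reproducibility
xs = sample_mixture_gamma(20000, [0.4, 0.6], [2.0, 5.0], [1.0, 2.0], rng)
print(sum(xs) / len(xs))   # compare with sum_j lambda_j * alpha_j * beta_j = 6.8
```

The sample mean should be close to Σ_j λ_j α_j β_j = 0.4·2·1 + 0.6·5·2 = 6.8 for these illustrative values.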

2- SOME METHODS FOR ESTIMATING THE PARAMETERS OF MIXTURE DISTRIBUTIONS

Mixture distributions are common statistical distributions used in many fields, such as data analysis and machine learning. They combine several simple distributions to produce a more complex one, and using them requires estimating the set of parameters that determines the distribution of the mixture data.

Suppose a sample of size n, (x_1, x_2, ..., x_n), is drawn randomly from a known distribution whose parameters are unknown, for example a sample from a normal distribution with unknown mean and variance. The main objective is then to estimate the parameters of this distribution. In this study, we discuss two methods for estimating the parameters of a mixture distribution.

A- Maximum Likelihood Estimation (MLE):

This method is one of the most important methods of point estimation and was proposed by the statistician R. A. Fisher around 1920. It assumes that the parameters to be estimated for a particular population are unknown fixed quantities, which are estimated from the sample data.

Assume we have a sample of size n, (x_1, x_2, ..., x_n), and that we know the distribution from which it was randomly drawn but not its parameters. The principle of the method is to find the estimate θ̂ of the parameter θ that maximizes the likelihood function. If x_1, x_2, ..., x_n are independent and identically distributed (iid) observations from a population with probability density function f(x|θ), the maximum likelihood estimator is obtained by differentiating the likelihood (or log-likelihood) function with respect to θ and equating the derivative to zero. The likelihood function is:

$$L(\theta)=\prod_{i=1}^{n}f(x_i\mid\theta) \tag{4}$$

Using (2), the likelihood of the mixture distribution is:

$$L(\Theta)=\prod_{i=1}^{n}\sum_{j=1}^{k}\lambda_j\,f_j(x_i\mid\theta_j) \tag{5}$$

Taking logarithms gives:

$$\log L(\Theta)=\sum_{i=1}^{n}\log\sum_{j=1}^{k}\lambda_j\,f_j(x_i\mid\theta_j) \tag{6}$$

We then take the partial derivatives with respect to θ_j and λ_j to obtain an equation for each parameter. These equations are difficult to solve directly because of the sum inside the logarithm, so numerical methods and iterative algorithms are needed to reach the maximum likelihood estimator (Friedman et al., 2009).
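Evaluating the mixture log-likelihood numerically shows why there is no closed-form maximizer: each observation contributes the logarithm of a sum over components. The sketch below, with hypothetical helper names and illustrative data, computes Σ_i log Σ_j λ_j f(x_i|α_j, β_j) for gamma components using the standard log-sum-exp trick. This is the quantity that EM iterations never decrease.

```python
import math

def gamma_logpdf(x, alpha, beta):
    """log f(x | alpha, beta) for the gamma density (1)."""
    return (alpha - 1) * math.log(x) - x / beta - alpha * math.log(beta) - math.lgamma(alpha)

def mixture_loglik(xs, weights, alphas, betas):
    """Observed-data log-likelihood: sum_i log sum_j lambda_j f_j(x_i | theta_j).

    Assumes every weight is strictly positive. The inner log of a sum is computed
    with the log-sum-exp trick so that tiny component densities do not underflow.
    """
    total = 0.0
    for x in xs:
        terms = [math.log(w) + gamma_logpdf(x, a, b)
                 for w, a, b in zip(weights, alphas, betas)]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

data = [0.8, 1.5, 2.2, 9.7, 12.4, 15.1]   # illustrative values only
print(mixture_loglik(data, [0.5, 0.5], [2.0, 10.0], [1.0, 1.3]))
```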

Expectation Maximization Algorithm (EM):

The Expectation-Maximization (EM) algorithm was proposed by Dempster, Laird and Rubin (1977) and remains one of the most important methods for finding maximum likelihood estimators in the presence of latent variables or missing values. It is used in statistics and machine learning for problems related to the statistical analysis of data, such as classification, clustering and factor analysis (Filho, 2008).

For example, suppose data are collected on a particular disease in which the severity of the disease was not recorded, only its presence or absence: absence is coded as zero and presence as x > 0. We do not know the actual value of x (is it 100 or 5?), so the ordinary maximum likelihood method cannot be applied directly because of the missing values.

The expectation maximization algorithm consists of two steps (Chris & Raftery, 2017):

a- Step One: E-Step

This step takes the expectation of the complete-data log-likelihood function, given the observed data and the current parameter estimates, in order to obtain a tractable function for estimating the parameters.

Here the missing values are replaced by their conditional expectations, which are then treated as constants rather than variables (Ghojogh et al., 2019).

b- Step Two: M-Step

This step determines the parameter values that maximize the expected function obtained in the first step.

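To estimate the parameters of the mixture gamma distribution, we start from its pdf in (3); the likelihood of the sample is

$$L(\Theta)=\prod_{i=1}^{n}\sum_{j=1}^{k}\lambda_j\,\frac{x_i^{\alpha_j-1}e^{-x_i/\beta_j}}{\beta_j^{\alpha_j}\,\Gamma(\alpha_j)}$$

Taking the logarithm of this equation, we get

$$\log L(\Theta)=\sum_{i=1}^{n}\log\sum_{j=1}^{k}\lambda_j\,\frac{x_i^{\alpha_j-1}e^{-x_i/\beta_j}}{\beta_j^{\alpha_j}\,\Gamma(\alpha_j)} \tag{7}$$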

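Optimizing this log-likelihood directly is difficult because of the summation inside the logarithm. However, we can introduce an indicator variable for each observation (Corduneanu and Bishop, 2001):

$$z_{ij}=\begin{cases}1, & \text{if } x_i \text{ belongs to the } j\text{th component}\\ 0, & \text{otherwise}\end{cases}$$

with probability

$$P(z_{ij}=1)=\lambda_j,\qquad j=1,\dots,k.$$

For fixed i, the probability mass function of z_i = (z_i1, ..., z_ik) has the following form (Saeed, 2005):

$$p(z_i\mid\lambda)=\prod_{j=1}^{k}\lambda_j^{z_{ij}}$$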

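Since z_1, ..., z_n are independent, the joint density of the indicators is:

$$p(Z\mid\lambda)=\prod_{i=1}^{n}\prod_{j=1}^{k}\lambda_j^{z_{ij}} \tag{8}$$

where Z = (z_1, ..., z_n) denotes the unobserved part of the complete data (x, Z). Therefore, the joint pdf of an observation and its indicator is:

$$f(x_i,z_i\mid\Theta)=\prod_{j=1}^{k}\left[\lambda_j\,f(x_i\mid\alpha_j,\beta_j)\right]^{z_{ij}} \tag{9}$$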

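and the complete-data likelihood is given by:

$$L_c(\Theta)=\prod_{i=1}^{n}\prod_{j=1}^{k}\left[\lambda_j\,f(x_i\mid\alpha_j,\beta_j)\right]^{z_{ij}} \tag{10}$$

$$L_c(\Theta)=\prod_{j=1}^{k}\lambda_j^{n_j}\prod_{i=1}^{n}\left[\frac{x_i^{\alpha_j-1}e^{-x_i/\beta_j}}{\beta_j^{\alpha_j}\,\Gamma(\alpha_j)}\right]^{z_{ij}} \tag{11}$$

where n_j = Σ_{i=1}^{n} z_ij is the number of observations from the j-th component.

The log of the complete-data likelihood function is:

$$\log L_c(\Theta)=\sum_{i=1}^{n}\sum_{j=1}^{k}z_{ij}\left[\log\lambda_j+(\alpha_j-1)\log x_i-\frac{x_i}{\beta_j}-\alpha_j\log\beta_j-\log\Gamma(\alpha_j)\right] \tag{12}$$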

The indicator z_ij is a latent (missing) value, because we do not know whether it equals 1 or 0; therefore, we use the Expectation-Maximization (EM) algorithm to estimate the parameters (Sattayatham and Talangtam, 2012).
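If the component labels were observed, the complete-data log-likelihood would be easy to maximize, because it contains no logarithm of a sum. The hypothetical sketch below evaluates Σ_i Σ_j z_ij [log λ_j + log f(x_i|α_j, β_j)] and the counts n_j for a given, assumed labelling. In practice the labels are unknown, which is exactly why the E-step replaces z_ij by its conditional expectation.

```python
import math

def gamma_logpdf(x, alpha, beta):
    """log f(x | alpha, beta) for the gamma density (1)."""
    return (alpha - 1) * math.log(x) - x / beta - alpha * math.log(beta) - math.lgamma(alpha)

def complete_loglik(xs, labels, weights, alphas, betas):
    """Complete-data log-likelihood when every label is known.

    labels[i] = j encodes z_ij = 1 and z_il = 0 for l != j, so the double sum
    reduces to a single term per observation and no logarithm of a sum appears.
    """
    return sum(math.log(weights[j]) + gamma_logpdf(x, alphas[j], betas[j])
               for x, j in zip(xs, labels))

def component_counts(labels, k):
    """n_j = sum_i z_ij: the number of observations assigned to component j."""
    counts = [0] * k
    for j in labels:
        counts[j] += 1
    return counts

xs = [0.9, 1.7, 11.2, 14.8, 2.3]   # illustrative data
labels = [0, 0, 1, 1, 0]           # hypothetical known labels
print(component_counts(labels, 2))
print(complete_loglik(xs, labels, [0.6, 0.4], [2.0, 12.0], [1.0, 1.1]))
```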

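Case 1: E-Step

$$E\left[z_{ij}\mid x_i,\Theta^{(t)}\right]=1\cdot P\left(z_{ij}=1\mid x_i,\Theta^{(t)}\right)+0\cdot P\left(z_{ij}=0\mid x_i,\Theta^{(t)}\right)=P\left(z_{ij}=1\mid x_i,\Theta^{(t)}\right) \tag{13}$$

$$w_{ij}^{(t)}=P\left(z_{ij}=1\mid x_i,\Theta^{(t)}\right)=\frac{\lambda_j^{(t)}\,f(x_i\mid\alpha_j^{(t)},\beta_j^{(t)})}{\sum_{l=1}^{k}\lambda_l^{(t)}\,f(x_i\mid\alpha_l^{(t)},\beta_l^{(t)})} \tag{14}$$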

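The expected complete-data log-likelihood is:

$$Q(\Theta\mid\Theta^{(t)})=E\left[\log L_c(\Theta)\mid x,\Theta^{(t)}\right]=\sum_{i=1}^{n}\sum_{j=1}^{k}w_{ij}^{(t)}\left[\log\lambda_j+\log f(x_i\mid\alpha_j,\beta_j)\right] \tag{15}$$

$$Q(\Theta\mid\Theta^{(t)})=\sum_{i=1}^{n}\sum_{j=1}^{k}w_{ij}^{(t)}\left[\log\lambda_j+(\alpha_j-1)\log x_i-\frac{x_i}{\beta_j}-\alpha_j\log\beta_j-\log\Gamma(\alpha_j)\right] \tag{16}$$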
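The E-step can be sketched as follows: the responsibilities are w_ij = λ_j f(x_i|α_j, β_j) / Σ_l λ_l f(x_i|α_l, β_l), and the expected complete-data log-likelihood is Σ_i Σ_j w_ij [log λ_j + log f(x_i|α_j, β_j)]. The helper names and the illustrative parameter values are assumptions, and normalizing on the log scale is a standard numerical precaution rather than something stated in the text.

```python
import math

def gamma_logpdf(x, alpha, beta):
    """log f(x | alpha, beta) for the gamma density (1)."""
    return (alpha - 1) * math.log(x) - x / beta - alpha * math.log(beta) - math.lgamma(alpha)

def e_step(xs, weights, alphas, betas):
    """Responsibilities w_ij = E[z_ij | x_i, current parameters]."""
    w = []
    for x in xs:
        logs = [math.log(lam) + gamma_logpdf(x, a, b)
                for lam, a, b in zip(weights, alphas, betas)]
        m = max(logs)                         # subtract the max before exponentiating
        unnorm = [math.exp(v - m) for v in logs]
        total = sum(unnorm)
        w.append([u / total for u in unnorm])
    return w

def q_function(xs, w, weights, alphas, betas):
    """Expected complete-data log-likelihood, with the responsibilities w held fixed."""
    return sum(w[i][j] * (math.log(weights[j]) + gamma_logpdf(x, alphas[j], betas[j]))
               for i, x in enumerate(xs) for j in range(len(weights)))

xs = [0.5, 3.0, 18.0]
w = e_step(xs, [0.3, 0.7], [1.0, 20.0], [1.0, 1.0])
print([[round(v, 4) for v in row] for row in w])
print(q_function(xs, w, [0.3, 0.7], [1.0, 20.0], [1.0, 1.0]))
```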

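Case 2: M-Step

Maximizing (16) with the responsibilities held fixed (written w_ij below), subject to Σ_j λ_j = 1, gives:

$$\lambda_j=\frac{1}{n}\sum_{i=1}^{n}w_{ij} \tag{17}$$

$$\beta_j=\frac{\sum_{i=1}^{n}w_{ij}\,x_i}{\alpha_j\sum_{i=1}^{n}w_{ij}} \tag{18}$$

where α_j is the solution of (19). Setting ∂Q/∂α_j = 0 and substituting (18) gives an equation in α_j alone, where ψ(·) is the digamma function:

$$\log\alpha_j-\psi(\alpha_j)=s_j,\qquad s_j=\log\frac{\sum_{i=1}^{n}w_{ij}\,x_i}{\sum_{i=1}^{n}w_{ij}}-\frac{\sum_{i=1}^{n}w_{ij}\log x_i}{\sum_{i=1}^{n}w_{ij}} \tag{19}$$

This equation has no closed-form solution, so we solve it by the Newton-Raphson method, where ψ′(·) is the trigamma function and r indexes the Newton iterations:

$$\alpha_j^{(r+1)}=\alpha_j^{(r)}-\frac{\log\alpha_j^{(r)}-\psi(\alpha_j^{(r)})-s_j}{1/\alpha_j^{(r)}-\psi'(\alpha_j^{(r)})} \tag{20}$$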
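The E- and M-steps can be combined into a complete EM iteration. The sketch below illustrates the idea under stated assumptions; it is not the authors' R code. The M-step updates λ_j = Σ_i w_ij / n, then solves log α_j − ψ(α_j) = s_j (the α_j equation obtained by substituting the β_j update into ∂Q/∂α_j = 0) by Newton-Raphson, then sets β_j = Σ_i w_ij x_i / (α_j Σ_i w_ij). Newton starts at 1/(2s_j), which lies below the root because log α − ψ(α) > 1/(2α) for α > 0. Here ψ and ψ′ are implemented with a standard recurrence plus asymptotic series so that only the standard library is needed. The function names, the fixed iteration count and the illustrative data (λ = 0.3, components Gamma(2, 1) and Gamma(20, 1)) are assumptions. No safeguard against an empty component is included.

```python
import math
import random

def digamma(x):
    """psi(x), x > 0: shift up with psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f * (1/252 - f * (1/240 - f / 132))))

def trigamma(x):
    """psi'(x), x > 0: psi'(x) = psi'(x + 1) + 1/x^2, then the asymptotic series."""
    r = 0.0
    while x < 6.0:
        r += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    return r + 1.0 / x + 0.5 * f + (f / x) * (1/6 - f * (1/30 - f * (1/42 - f / 30)))

def solve_alpha(s, tol=1e-10, max_iter=100):
    """Newton-Raphson for log(alpha) - psi(alpha) = s, with s > 0."""
    alpha = 0.5 / s                      # starts below the root, so iterates increase to it
    for _ in range(max_iter):
        g = math.log(alpha) - digamma(alpha) - s
        new = alpha - g / (1.0 / alpha - trigamma(alpha))
        if new <= 0.0:                   # defensive guard; not expected from this start
            new = alpha / 2.0
        if abs(new - alpha) < tol * alpha:
            return new
        alpha = new
    return alpha

def em_gamma_mixture(xs, weights, alphas, betas, n_iter=100):
    """EM for a k-component gamma mixture with a fixed number of iterations."""
    weights, alphas, betas = list(weights), list(alphas), list(betas)
    k, n = len(weights), len(xs)
    logx = [math.log(x) for x in xs]
    for _ in range(n_iter):
        # E-step: responsibilities w[i][j], computed on the log scale.
        w = []
        for x, lx in zip(xs, logx):
            logs = [math.log(weights[j]) + (alphas[j] - 1) * lx - x / betas[j]
                    - alphas[j] * math.log(betas[j]) - math.lgamma(alphas[j])
                    for j in range(k)]
            m = max(logs)
            u = [math.exp(v - m) for v in logs]
            total = sum(u)
            w.append([v / total for v in u])
        # M-step: closed forms for lambda_j and beta_j, Newton-Raphson for alpha_j.
        for j in range(k):
            nj = sum(row[j] for row in w)
            sx = sum(row[j] * x for row, x in zip(w, xs))
            slx = sum(row[j] * lx for row, lx in zip(w, logx))
            weights[j] = nj / n
            s_j = max(math.log(sx / nj) - slx / nj, 1e-12)   # right-hand side s_j
            alphas[j] = solve_alpha(s_j)
            betas[j] = sx / (alphas[j] * nj)
    return weights, alphas, betas

rng = random.Random(7)
xs = [rng.gammavariate(2.0, 1.0) if rng.random() < 0.3 else rng.gammavariate(20.0, 1.0)
      for _ in range(2000)]
fit = em_gamma_mixture(xs, [0.5, 0.5], [1.0, 10.0], [1.0, 1.0])
print(fit)
```

Starting from user-supplied initial values mirrors the simulation design described later; a convergence check on the log-likelihood could replace the fixed iteration count.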

B- Bayesian Estimation Approach

In many cases it is easy to find a closed form for the posterior distribution, but sometimes finding the posterior requires the integration of high-dimensional functions. Methods were therefore developed to simplify this task, and the most important of them is Markov chain Monte Carlo (MCMC). Researchers began using MCMC in the early 1990s, and it has since been widely applied to Bayesian problems; it relies on drawing random samples from the conditional distributions of the parameters.

The most commonly used MCMC methods are the Gibbs sampling algorithm and the Metropolis-Hastings algorithm, the latter of which is used in this paper.

Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm is one of the main MCMC methods for estimating the parameters of mixture distributions, and it is used in many scientific and engineering applications, especially in statistics and physics.
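The accept/reject mechanism described in the steps that follow can be sketched in a few lines. This is a minimal, hypothetical sketch rather than the paper's sampler. It uses a Gaussian random-walk proposal, which is symmetric so the proposal densities cancel in the acceptance ratio, whereas the steps below draw proposals from the prior. The function name, the tuning value `step`, the illustrative data and the Exponential(1) prior in the usage example are all assumptions.

```python
import math
import random

def metropolis_hastings(log_post, theta0, n_iter, step, rng=random):
    """Random-walk Metropolis-Hastings for one scalar parameter.

    log_post(theta) is the log posterior up to an additive constant; it may return
    -math.inf outside the support. The Gaussian random-walk proposal is symmetric,
    so q cancels and the acceptance probability is min(1, pi(theta') / pi(theta)).
    theta0 must have a finite log posterior.
    """
    theta, lp = theta0, log_post(theta0)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)          # propose theta'
        lp_prop = log_post(proposal)
        u = 1.0 - rng.random()                           # u ~ U(0, 1], so log(u) is defined
        if math.log(u) < lp_prop - lp:                   # accept with prob. min(1, ratio)
            theta, lp = proposal, lp_prop
            accepted += 1
        chain.append(theta)                              # on rejection theta^(i) = theta^(i-1)
    return chain, accepted / n_iter

# Illustrative target: shape alpha of a gamma sample with known scale beta = 1 and a
# hypothetical Exponential(1) prior on alpha.
data = [1.2, 3.4, 2.2, 0.7, 2.9, 1.8]

def log_post_alpha(a):
    if a <= 0:
        return -math.inf
    return sum((a - 1) * math.log(x) - x - math.lgamma(a) for x in data) - a

chain, rate = metropolis_hastings(log_post_alpha, 1.0, 5000, 0.5, random.Random(3))
print(sum(chain[1000:]) / len(chain[1000:]), rate)
```

Working with log densities avoids underflow when the likelihood is a product over many observations; the first 1000 draws are discarded as burn-in in the usage example.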

Let x_1, x_2, ..., x_n be independent and identically distributed (iid) random variables with probability density function f(x|θ), where the posterior distribution of θ is not known in closed form, and let q(·|θ) be a candidate (proposal) distribution from which a proposed value θ′ is drawn given the current value θ. The steps of the algorithm are as follows (Al-Masri, 2020):

Metropolis-Hastings Algorithm Steps:

1- Choose an initial value θ^(0) for the parameter, close to the parameter values of the real data.

2- Choose the sample size for the observations of the random variable x.

3- For i = 1, 2, ..., N:

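a- Generate a proposed value θ′ from the proposal distribution (the prior distribution of each model is used as the proposal).

b- Calculate the acceptance probability:

$$\alpha(\theta^{(i-1)},\theta')=\min\left\{1,\ \frac{\pi(\theta'\mid x)\,q(\theta^{(i-1)}\mid\theta')}{\pi(\theta^{(i-1)}\mid x)\,q(\theta'\mid\theta^{(i-1)})}\right\}$$

where π(·|x) denotes the conditional (posterior) distribution of the parameter: the numerator evaluates it at the proposed value θ′ and the denominator at the current value θ^(i−1). The proposal densities q cancel when the proposal is symmetric.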

c- Generate a random number u_i from the uniform distribution U(0, 1). If u_i < α(θ^(i−1), θ′), set θ^(i) = θ′; if u_i ≥ α(θ^(i−1), θ′), set θ^(i) = θ^(i−1).

4- Set i = i + 1 and repeat steps a to c until i = N.

1- z_i Posterior

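When the indicator z_i is unknown for every observation x_i, i = 1, 2, ..., n, while the shape parameters α, the scale parameters β and the weight parameter λ are known, the conjugate prior p(z_i) of z_i is multinomial with parameters (1, λ_1, λ_2, ..., λ_k).

By Bayes’ theorem, the posterior distribution of z_i is:

$$p(z_i\mid x_i,\lambda,\alpha,\beta)\propto p(z_i\mid\lambda)\,f(x_i\mid z_i,\alpha,\beta)=\prod_{j=1}^{k}\left[\lambda_j\,f(x_i\mid\alpha_j,\beta_j)\right]^{z_{ij}} \tag{21}$$

Since each z_ij takes only the values 1 or 0,

$$P(z_{ij}=1\mid x_i,\lambda,\alpha,\beta)=w_{ij}=\frac{\lambda_j\,f(x_i\mid\alpha_j,\beta_j)}{\sum_{l=1}^{k}\lambda_l\,f(x_i\mid\alpha_l,\beta_l)} \tag{22}$$

Therefore, the posterior distribution of z_i is multinomial (1, w_i1, w_i2, ..., w_ik), where i = 1, 2, ..., n and j = 1, 2, ..., k.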
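In a data-augmentation (Gibbs-type) sampler, this result is used by drawing each z_i from its multinomial posterior. The sketch below is an illustration with hypothetical names: the probabilities w_ij would be computed from the current values of λ, α and β, and the counts n_j it returns are what the λ posterior needs.

```python
import random

def sample_indicators(w, rng=random):
    """Draw z_i ~ Multinomial(1, w_i1, ..., w_ik) for every observation.

    w[i] holds the posterior probabilities w_ij of observation i. Each draw is
    returned as the index j of the single 1 in z_i, together with the counts n_j.
    """
    k = len(w[0])
    labels = [rng.choices(range(k), weights=wi)[0] for wi in w]
    counts = [labels.count(j) for j in range(k)]
    return labels, counts

rng = random.Random(3)
labels, counts = sample_indicators([[0.2, 0.8]] * 10000, rng)
print(counts)
```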

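2- λ Posterior

When the weight parameter λ is unknown and the shape parameters α and scale parameters β are known, ignoring the terms in (11) that do not contain λ gives the complete-data likelihood function:

$$L(\lambda\mid x,z)\propto\prod_{j=1}^{k}\lambda_j^{n_j} \tag{23}$$

where n_j = Σ_i z_ij is the number of observations in the j-th component.

By using (13):

(24)

The conjugate prior p(λ) is a Dirichlet distribution with hyperparameters µ = (µ_1, µ_2, ..., µ_k), given by

$$p(\lambda)\propto\prod_{j=1}^{k}\lambda_j^{\mu_j-1}$$

Ignoring terms that do not contain λ, the posterior distribution is Dirichlet with hyperparameters (µ_1 + n_1, ..., µ_k + n_k), given by

$$p(\lambda\mid x,z)\propto\prod_{j=1}^{k}\lambda_j^{\mu_j+n_j-1} \tag{25}$$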
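Sampling λ from a Dirichlet(µ_1 + n_1, ..., µ_k + n_k) posterior needs only gamma variates: independent G_j ~ Gamma(µ_j + n_j, 1), divided by their sum, follow that Dirichlet distribution. The sketch assumes illustrative values µ = (1, 1) and hypothetical counts n = (30, 70); the function name is also hypothetical.

```python
import random

def sample_weights(mu, counts, rng=random):
    """One draw of lambda from Dirichlet(mu_1 + n_1, ..., mu_k + n_k).

    Standard construction: independent G_j ~ Gamma(mu_j + n_j, 1), normalised by
    their sum, follow the required Dirichlet distribution.
    """
    g = [rng.gammavariate(m + n, 1.0) for m, n in zip(mu, counts)]
    total = sum(g)
    return [v / total for v in g]

rng = random.Random(5)
draws = [sample_weights([1.0, 1.0], [30, 70], rng) for _ in range(5000)]
print(sum(d[0] for d in draws) / len(draws))   # posterior mean is (1 + 30) / (2 + 100)
```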

3- α_j Posterior

When the shape parameter α_j is unknown for some j = 1, 2, ..., k, and both the weight parameter λ and the scale parameters β are known, dropping the terms in (11) that do not involve α_j (including those in α_1, ..., α_{j−1}, α_{j+1}, ..., α_k) gives the complete-data likelihood function:

(26)

Where

The conjugate prior p(α_j) is a member of the exponential family and, with its hyperparameters, is given by

(27)

The posterior distribution, with updated hyperparameters, is given by

(28)

4- β_j Posterior

When the scale parameter β_j is unknown for some j = 1, 2, ..., k, and the shape parameters α and the weight parameter λ are known, the complete-data likelihood function of β_j is obtained by dropping the terms in (11) that do not involve β_j.

The conjugate prior p(β_j) is a gamma distribution with its own hyperparameters:

(29)

The posterior distribution is again a gamma distribution, with updated hyperparameters:

(30)

Where

5- Joint Posterior of (α_j, β_j)

When the weight parameter λ is known and the shape parameters α and scale parameters β are unknown, the complete-data likelihood function is obtained by ignoring the terms in (11) that contain λ.

The conjugate joint prior p(α_j, β_j), with hyperparameters (s, m, t), is given by

(31)

Ignoring the terms that contain λ, the joint posterior distribution is

(32)

with the hyper parameters

Simulation study

In this section, a Monte Carlo simulation study compares the efficiency of the Bayesian method of estimation (Metropolis-Hastings) with maximum likelihood estimation (EM algorithm), using the mean of the sum of the absolute biases (MBias) and the root-mean-square error (RMSE).

The general form of the two-component mixture gamma distribution is

$$f(x\mid\Theta)=\lambda\,\frac{x^{\alpha_1-1}e^{-x/\beta_1}}{\beta_1^{\alpha_1}\,\Gamma(\alpha_1)}+(1-\lambda)\,\frac{x^{\alpha_2-1}e^{-x/\beta_2}}{\beta_2^{\alpha_2}\,\Gamma(\alpha_2)},\qquad x>0.$$

The simulation study was written in the R language and included the following basic stages:

First stage: choosing the initial values, as follows:

1- The initial values of the parameters (α_j, β_j), j = 1, 2, and of the weights λ and (1 − λ) are selected randomly from the first and second component densities.

2- Different sample sizes (50, 100, 150) are chosen to generate data sets from the two-component mixture gamma distribution with these parameters.

3- Each experiment is repeated 1000 times.

4- Values are chosen for the random variable.

Second stage: data generation.

Random variates are generated according to the type of distribution.

Third stage: estimating the parameters of the mixture distribution with both estimation methods.

Fourth stage: comparing the efficiency of the MLE and Bayesian methods by computing the mean of the sum of the absolute biases (MBias) and the root-mean-square error (RMSE); smaller RMSE and MBias values indicate a better overall quality of the estimates.
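The two comparison criteria can be computed from the replicate estimates as in the sketch below. Because the text does not spell out how the absolute biases are aggregated, the definition used here (the bias of the mean estimate for each parameter, summed in absolute value over parameters) is an assumption, as are the function name and the illustrative replicate values.

```python
import math

def rmse_and_mbias(estimates, true_values):
    """Per-parameter RMSE and |bias| over R replicates, plus their sum (MBias).

    estimates[r][p] is the estimate of parameter p in replicate r.
    ASSUMPTION: bias_p = mean over r of (estimate - true value), and MBias is the sum
    of |bias_p| over the parameters; the paper's exact aggregation may differ.
    """
    R = len(estimates)
    rmse, abs_bias = [], []
    for p, true in enumerate(true_values):
        errors = [est[p] - true for est in estimates]
        abs_bias.append(abs(sum(errors) / R))
        rmse.append(math.sqrt(sum(e * e for e in errors) / R))
    return rmse, abs_bias, sum(abs_bias)

# Three hypothetical replicates of (lambda, alpha_1) estimates against true values (0.3, 2.0).
print(rmse_and_mbias([[0.28, 2.3], [0.33, 1.8], [0.31, 2.4]], [0.3, 2.0]))
```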

To find the MLE, the Newton-Raphson method was adopted. The parameters (α_j, β_j) are estimated with the Metropolis method (MT) using the joint prior in (31) with hyperparameters (s = 1, m = 1, t = 1), and the simulation study was carried out 1000 times. Table 1 presents the estimates (Est.) and the RMSE and MBias values of the MLE and MT methods; the smaller RMSE and MBias for each sample size is highlighted in bold. From these results, we observe that the Metropolis method is uniformly better than MLE in all cases.

Table 1: MBias and RMSE of the MLE and MT (MCMC) estimators for the two-component mixture gamma distribution

Sample size | Method | RMSE and MBias values
50  | MLE  | 5.2594 | 2.3304 | 9.1590 | 3.8302  | 0.4065 | 2.3633 | 2.0702
50  | MCMC | 2.5164 | 3.3312 | 4.1121 | 5.8485  | 0.5114 | 1.0555 | 0.8401
100 | MLE  | 2.8220 | 6.3777 | 5.8096 | 10.8190 | 0.4999 | 2.0152 | 1.3130
100 | MCMC | 2.6802 | 6.2251 | 5.5243 | 10.6079 | 0.5091 | 1.9129 | 1.3075
150 | MLE  | 5.2594 | 2.3304 | 9.1590 | 3.8302  | 0.4065 | 2.3633 | 2.0702
150 | MCMC | 2.5164 | 3.3312 | 4.1121 | 5.8485  | 0.5114 | 1.0555 | 0.8401

4- Discussion

The parameters were estimated with the Metropolis method and the Expectation-Maximization (EM) algorithm. From the simulation results, the Bayes estimator is observed to perform better than the maximum likelihood estimator in all cases.

References

Saieed, H. A. (2005). "Estimation of parameters of mixture distributions and its application on neonatal birth weight data in Nineveh Governorate". Unpublished doctoral thesis, College of Computer Science and Mathematics, University of .
Al-Masri, H. S. (2020). "Bayesian inference on the generalized gamma distribution". Journal of Natural Studies, Islamic University of Gaza, 28, 01-18.
Bernardo, J. M., & Smith, A. F. M. (2009). Bayesian Theory. Wiley Series in Probability and Statistics, Vol. 405. John Wiley & Sons.
Jewell, N. P. (1982). "Mixtures of exponential distributions". Annals of Statistics, 10(2), 479-484.
Chris, F., & Raftery, A. (2017). "Model-based clustering, discriminant analysis, and density estimation". Journal of the American Statistical Association, 5(1).
Corduneanu, A., & Bishop, C. M. (2001). "Variational Bayesian model selection for mixture distributions". In T. Jaakkola & T. Richardson (Eds.), Artificial Intelligence and Statistics (pp. 27-34). Morgan Kaufmann.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). "Maximum likelihood from incomplete data via the EM algorithm". Journal of the Royal Statistical Society, Series B, 39, 1-38.
Friedman, J., Hastie, T., & Tibshirani, R. (2009). The Elements of Statistical Learning (Vol. 2). Springer Series in Statistics. New York: Springer.
Filho, I. (2008). "Mixture models for the analysis of gene expression: integration of multiple experiments and cluster validation". Department of Mathematics and Computer Science, Free University of Berlin, Berlin, Germany.
Patwray, A. N., Sriwastav, G. L., & Hazarika, J. (2016). "Inference of R = P(X < Y < Z) for n-standby system: a Monte-Carlo simulation approach". Math, 12, 18-22.
Ghojogh, B., Ghojogh, A., Crowley, M., & Karray, F. (2019). "Fitting a mixture distribution to data". Waterloo, Canada.
Li, L. A. (1983). "Decomposition theorems, conditional probability, and finite mixtures distributions". PhD Thesis, State University, Albany, New York.
Newcomb, S. (1886). "A generalized theory of the combination of observations so as to obtain the best result". American Journal of Mathematics, 8, 343-366.
Upadhyay, S. K., Vasishta, N., & Smith, A. F. M. (2000). "Bayes inference in life testing and reliability via Markov chain Monte Carlo simulation". Sankhyā: The Indian Journal of Statistics, Series A, 203-222.
Sattayatham, P., & Talangtam, T. (2012). "Fitting of finite mixture distributions to motor insurance claims". Journal of Mathematics and Statistics, 8(1), 49-56.
Valieris, R., Drummond, R. D., Defelicibus, A., Dias-Neto, E., Rosales, R. A., & da Silva, I. T. "A mixture model for determining SARS-CoV-2 variant composition in pooled samples". Universidade de São Paulo, Ribeirão Preto, São Paulo 14040-901, Brazil.
