Use Maximum Likelihood Method to Estimate the Non-normal Complete Randomized Design

Jasim, Omar Ramzi; Abdulkhaleq Salih, Sarmad

doi:10.33899/iqjoss.2024.185242


IRAQI JOURNAL OF STATISTICAL SCIENCES

Volume 21, Issue 2, November 2024, Pages 70-79


Authors

Omar Ramzi Jasim*¹; Sarmad Abdulkhaleq Salih²

¹Doctor of Statistics, Department of Accounting, College of Administration and Economics, University of Al-Hamdaniya, Mosul, Iraq.

²Department of Mathematics, College of Education for Pure Sciences, University of Hamdaniya, Mosul, Iraq

Abstract

In this paper, a completely randomized design (CRD) is studied for the case in which the number of replicates is equal and only one observation is recorded, under the assumption that the experimental error term follows a non-normal distribution. Heavy-tailed distributions are important in this setting because they generalize many non-normal distributions. The error term was assumed to follow the extension hyperbola distribution (ehd) and the Laplace distribution (Ld), and the design parameters were estimated by the traditional maximum likelihood method, once for the fixed mathematical model and once for the random mathematical model. We concluded that the estimates of the model parameters when the experimental error follows the Ld are similar to those obtained when the error is normal. Given the difficulty of obtaining an agricultural experiment whose errors follow the ehd or the Ld, a simulation experiment was carried out in MATLAB, and the fixed and random mathematical models of the completely randomized design were compared by the mean square error (MSE) criterion under different values of the additional and skewness parameters. The experimental results showed that, for the ehd, the MSE of both the fixed and the random mathematical model decreased as the values of the additional parameters decreased.

Highlights

The most important theoretical and experimental conclusions of the research can be summarized as follows:

The estimate of the general arithmetic mean parameter under the fixed mathematical model is equal to its estimate under the random mathematical model.
The estimates of the model parameters when the experimental error follows the Laplace distribution (Ld) are similar to those obtained when the experimental error is normal.
The mean square error values of the estimated parameters of the completely randomized design decrease as the values of the additional parameters increase, for both the fixed and random mathematical models.
The mean square error values of the completely randomized design model decrease as the values of the additional parameters decrease, for both the fixed and random mathematical models.
According to the comparison criterion, the random mathematical model is superior to the fixed mathematical model for all values of the additional parameters under the ehd.

Recommendations:

Based on these conclusions, the researchers recommend conducting an agricultural experiment to analyze the non-normal completely randomized design, and applying other traditional methods and artificial intelligence algorithms to the fixed and random mathematical models in order to compare them.

Keywords

complete randomized design; extension hyperbola distribution; Laplace distribution; maximum likelihood method; fixed mathematical model; random mathematical model

Full Text

Introduction

The design and analysis of scientific experiments is a branch of modern statistics concerned with conducting experiments in industrial, agricultural, medical, and many other fields. The design varies from one experiment to another according to the requirements and parameters of the experiment. The completely randomized design (CRD) is considered one of the simplest designs and is the best-known design for analyzing scientific experiments, especially agricultural ones; it is used when the experimental units are completely homogeneous. The experimental error term in the various designs is usually assumed to follow a normal distribution, but in some cases this assumption is inaccurate. Therefore, this research uses distributions that represent such experiments more efficiently than the normal distribution, namely the (ehd) and the (Ld). These are continuous mixed distributions obtained by mixing the mixed normal mean-variance distribution (mnmvd) with the extension inverse normal distribution (eind) and the exponential distribution (ed), respectively, and they are more general than the normal distribution.

Among previous studies on this topic, (Youssef, 2015) studied a response variable that follows the exponential distribution, with observations obtained from a factorial experiment conducted according to the completely randomized design (CRD), and concluded that the analysis of variance method was the best way to analyze non-normally distributed observations of the response variable. (Vilca et al., 2014) derived the (ehd) by mixing the (eind) with the skewed normal distribution. The (eind) has many interesting applications because it includes the well-known inverse normal, gamma, and exponential distributions as special cases, and it has been used as a mixing density to build heavy-tailed distributions, including the Student-t distribution and the Laplace distribution.

The research is divided into several sections. The first section gives a general introduction and reviews some previous studies, while the second section gives a general description of the completely randomized design. The third section estimates the parameters of the assumed design by the maximum likelihood method in two cases, once for the fixed mathematical model and once for the random mathematical model. The fourth section presents the experimental side, a random experiment simulated with the ready-made program (MATLAB v.2023). Finally, the fifth and sixth sections present the most important conclusions and the future recommendations reached through the research.

Complete Randomized Design (CRD):

The mathematical model for a completely randomized design can be defined by the following formula: (1,5,8)

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, t, \quad j = 1, \dots, r \qquad (1)$$

where:

$y_{ij}$: represents the observation that received treatment (i) at replicate (j).

t: represents the number of treatments in the experiment.

r: represents the number of replicates.

t*r: represents the number of experimental units in the experiment.

$\mu$: represents the general arithmetic mean of the experiment.

$\tau_i$: represents the effect of treatment (i).

$\varepsilon_{ij}$: represents the experimental error. The experimental error is assumed not to be normally distributed but to follow the (ehd) and the (Ld), whose probability density function can be found using the concept of mixed distributions from the (mnmvd) and the (eind), which can be represented by the following equation:

The probability density function takes the following formula:

Since:

: represents the skewness parameter of the distribution.

The probability density function of the mixing random variable (X) can be written as follows: (6)

Since: (10)

: represents the Hankel function of second-order (n), and represents the measurement parameters of the assumed distribution. (6,7)

Using the concept of mixed distributions, we can find the probability distribution of the experimental error, unconditional on the random variable (X), as follows:

Equation (4) above can represent the probability density function for the (ehd), which is described as follows: (4,10,11)

If , the (ehd) is transformed into the (Ld) with parameter (0,1). (10,11)

Since:

If (n) is a natural number, then: (6)

Since the observation in equation (1) is a linear combination of the experimental error, which is distributed as (ehd) and (Ld), the probability distribution of the observation can be found in the same way and takes the following form:

In the same procedure as before for then:

Equation (6) above can be described as follows:

Note:

Maximum likelihood estimators for parameters of a completely randomized design:

The maximum likelihood method is considered one of the most important traditional parametric methods for estimating the parameters of probability distributions. Given (t) treatments and (r) replicates, the likelihood function of the observations, unconditional on the mixing variable and obtained through the concept of mixed distributions, is written as follows: (3)

Given the difficulty of finding the maximum likelihood estimator from Equation (8), we will rely on the concept of mixed distributions to find it as follows:

Taking the natural logarithm of both sides of the above equation, we get the following equation:

After finding the likelihood function, the maximum likelihood estimators of the completely randomized design were derived, once for the fixed mathematical model and once for the random mathematical model, as follows:

First: The Fixed Mathematical Model:

The fixed mathematical model rests on several basic assumptions, the most important of which is that the treatment effects sum to zero, meaning that the effects of the treatments are constant: the same treatments are used from one experiment to another. (2)

Estimating the general arithmetic mean parameter when the parameters ( ) are unknown and the skewness and additional parameters are known.

Following the maximum likelihood method, we differentiate equation (10) with respect to the general arithmetic mean of the experiment and equate the derivative to zero, which gives the maximum likelihood estimate conditional on the mixing variable. Using the concept of mixed distributions, we obtain the unconditional maximum likelihood estimator in the following form:

Since:

If , then:

To verify that the estimator defined in equation (11) is the maximum likelihood estimator for the parameter , we take the second partial derivative of equation (8) and obtain the following formula:

Since the:

Whereas:

Therefore, the estimator defined in equation (11) is the maximum likelihood estimator for the arithmetic mean parameter of the experiment.

Estimate the parameter when the parameters are unknown and the skewness and additional parameters are known.

By differentiating equation (10) with respect to ( ) and setting the derivative equal to zero, we obtain the maximum likelihood estimator conditional on the mixing variable, and through the concept of mixed distributions we obtain the unconditional maximum likelihood estimator:

Since:

If then:

Estimate the parameter when the parameters are unknown and the skewness and additional parameters are known.

whereas:

If ,then:

Second: The Random Mathematical Model:

This model assumes that the treatment effects follow a specific probability distribution; in this research, they were assumed to follow a generalized hyperbolic distribution, with the same concept as Section 2. (2)

We take the natural logarithm of both sides of equation (19):

By performing the same steps that were conducted in the third section when estimating parameters in the case of a fixed mathematical model, we find:

Estimate the general arithmetic mean parameter μ when the parameters are unknown and the skewness and additional parameters are known.

If , then:

We note from equations (21) and (22) that the estimator under the random mathematical model is the same as the estimator under the fixed mathematical model in equations (11) and (12).

Estimate the parameter when the parameters are unknown and the skewness and additional parameters are known.

where (Q, QQ) were previously defined in equation (17).

If , then:

Estimate the parameter when the parameters are unknown and the skewness and additional parameters are known.

By solving the two equations simultaneously, we get:

If , then:

By solving the two equations simultaneously, we get:

Based on the estimators of the general arithmetic mean and the variance under the fixed mathematical model, the estimator of the variance of the treatment effects for the random mathematical model is computed in the simulation experiment.

Simulation Experiment:

In this study, a random simulation experiment following the (ehd) was generated on the basis of the concept of mixed distributions, using ready-made software (Matlab R2023a) (9). After generating the data, the nonparametric Kolmogorov-Smirnov test was used to check whether they follow the assumed (ehd): if the p-value is greater than the significance level at the different values of the additional and skewness parameters, the hypothesis that the simulated data follow the (ehd) is not rejected. By running the program, the estimators of the CRD mathematical model are calculated under different initial values and compared using the mean square error (MSE) criterion.
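
The generation-and-check step described above can be sketched for the Laplace special case only. The sketch below is in Python rather than MATLAB and is not the authors' program: it assumes the standard representation of a Laplace(0, 1) error as a normal variable whose variance is 2W, with W exponential with mean 1; the values of the general mean and the treatment effects are hypothetical; and the Kolmogorov-Smirnov statistic is computed by hand against the Laplace CDF. Sampling the (ehd) itself would require a sampler for the (eind) mixing variable, which is not attempted here.

```python
import math
import random


def laplace_error(rng):
    """Draw one Laplace(0, 1) error as a normal variance mixture (assumed form)."""
    w = rng.expovariate(1.0)        # mixing variable W ~ Exp(mean 1)
    z = rng.gauss(0.0, 1.0)         # standard normal draw
    return math.sqrt(2.0 * w) * z   # given W = w the error is N(0, 2w)


def simulate_crd(mu, effects, r, rng):
    """Return {treatment: r observations} from y_ij = mu + tau_i + e_ij."""
    return {
        name: [mu + tau + laplace_error(rng) for _ in range(r)]
        for name, tau in effects.items()
    }


def laplace_cdf(x):
    """CDF of the Laplace(0, 1) distribution."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


rng = random.Random(2024)                             # hypothetical seed
mu = 5.0                                              # hypothetical general mean
effects = {"A": 0.5, "B": 0.5, "C": -0.5, "D": -0.5}  # hypothetical, sum to zero
data = simulate_crd(mu, effects, r=5, rng=rng)

# Errors recovered with the true parameters: e_ij = y_ij - mu - tau_i
errors = [y - mu - effects[name] for name, ys in data.items() for y in ys]
d_n = ks_statistic(errors, laplace_cdf)
critical = 1.36 / math.sqrt(len(errors))   # large-sample 5% approximation
print(f"D_n = {d_n:.4f}, approximate 5% critical value = {critical:.4f}")
```

The 1.36/sqrt(n) threshold is the usual large-sample approximation at the 5% level; with only 20 observations an exact table value would differ slightly.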

The following two tables show the assumed values used to generate the experiment and calculate the estimators, and the distribution of the treatments among the experimental units.

Table 1: Values of the additional, skewness and design parameters used to generate the experiment.

A,B,C,D

Rand

Table 2: Distribution of treatments among the experimental units

1 D	2 A	3 A	4 C	5 D
6 B	7 D	8 C	9 A	10 B
11 A	12 D	13 B	14 C	15 C
16 B	17 B	18 A	19 D	20 C
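
A layout such as Table 2 can be produced by shuffling the treatment labels over the experimental units. The following is a minimal sketch in Python, assuming four treatments with five replicates each as in the table; the seed and the function name are illustrative, so the layout it prints will not match Table 2.

```python
import random
from collections import Counter


def randomize_crd(treatments, r, rng):
    """Assign each treatment to r units at random; returns {unit: treatment}."""
    labels = [trt for trt in treatments for _ in range(r)]
    rng.shuffle(labels)                      # complete randomization
    return {unit: trt for unit, trt in enumerate(labels, start=1)}


rng = random.Random(7)                       # hypothetical seed
layout = randomize_crd(["A", "B", "C", "D"], r=5, rng=rng)

# Print the 20 units in rows of five, as "unit treatment" pairs
for start in range(1, 21, 5):
    print("\t".join(f"{u} {layout[u]}" for u in range(start, start + 5)))

counts = Counter(layout.values())            # each treatment appears r times
```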

Table (3) shows the measurements for each treatment, generated in MATLAB from the initial assumed values in Table (1):

Table 3: Measurements for each treatment.


treatment D	treatment C	treatment B	treatment A	Repetition	treatment D	treatment C	treatment B	treatment A	Repetition
1.01	4.25	9.12	8.11	1	5.63	9.24	3.45	4.61	1
5.14	5.36	7.16	6.12	2	3.65	5.32	2.14	2.13	2
3.54	8.88	9.15	7.16	3	4.87	7.14	3.99	9.23	3
1.25	4.36	3.56	7.77	4	5.67	1.38	6.32	4.33	4
3.35	4.11	3.96	2.39	5	8.91	1.66	8.20	9.14	5
14.29	26.96	32.95	31.55	SUM	28.73	24.74	24.1	29.44	SUM
2.86	5.39	6.59	6.31	MEAN	5.89	4.82	4.95	5.75	MEAN
105.75				TOTAL	107.01				TOTAL
5.2875				GRAND MEAN	5.3525				GRAND MEAN


treatment D	treatment C	treatment B	treatment A	Repetition	treatment D	treatment C	treatment B	treatment A	Repetition
1.01	4.25	9.12	8.11	1	5.61	9.24	3.45	4.61	1
5.14	5.36	7.16	6.12	2	3.65	5.32	2.14	2.13	2
3.54	8.88	9.15	7.16	3	4.87	7.14	3.99	9.23	3
1.25	4.36	3.56	7.77	4	5.67	1.38	6.32	4.33	4
3.35	4.11	3.96	2.39	5	8.91	1.66	8.20	9.14	5
14.29	26.96	32.95	31.55	SUM	28.71	24.74	24.1	29.44	SUM
2.86	5.39	6.59	6.31	MEAN	5.74	4.95	4.82	5.89	MEAN
105.95				TOTAL	106.99				TOTAL
5.2875				GRAND MEAN	5.5300				GRAND MEAN

The table above gives the measurements for each treatment under different values of the additional and skewness parameters, together with the sum and mean of each treatment. Tables (4) and (5) show the mean square errors of the parameter estimates of the completely randomized design for the fixed and random mathematical models, as follows:

Table 4: Mean square error for estimators of a completely randomized design in the case of a fixed mathematical model.


0.6698	0.9962
1.2254	1.2542
2.9871	3.4520
2.1245	2.3214
1.1002	1.1232
3.6541	4.2350

0.6012	0.7562
0.9032	0.9952
2.3540	2.5460
1.9862	2.0398
0.8923	0.9771
2.9952	3.9521

Table 5: Mean square error of the estimators of a completely randomized design in the case of a random mathematical model.


0.5333	0.5421
1.0025	1.1152

0.5125	0.5400
0.9991	1.1008

We notice from Tables (4) and (5) that, as the values of the additional parameters increase, the mean square error values of all design parameter estimates decrease for both the fixed and random models. In addition, the parameter estimates of the completely randomized design under the random mathematical model are superior to those under the fixed mathematical model. Table (6) shows the mean square error of the fixed and random mathematical models for the completely randomized design.

Table 6: Mean square error for a completely randomized design model.


Random	Fixed	Random	Fixed	Mathematical model
7.9923	8.2147	7.0029	7.5500

Random	Fixed	Random	Fixed	Mathematical model
8.9541	9.0450	8.4522	8.8201

From Table (6), we notice that as the values of the additional parameters decrease, the value of the mean square error criterion decreases for both models.
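
The mean square error criterion used in these comparisons can be approximated by Monte Carlo simulation. The sketch below is a hypothetical Python illustration rather than the authors' MATLAB code: it uses Laplace(0, 1) errors instead of the (ehd), assumed values for the general mean and the treatment effects, and the grand mean as the estimator of the general arithmetic mean. Since a Laplace(0, 1) error has variance 2, the grand mean of 20 units has variance 2/20 = 0.1, so the simulated MSE should be close to that value.

```python
import math
import random


def laplace_error(rng):
    """Laplace(0, 1) error as a normal variance mixture (assumed form)."""
    w = rng.expovariate(1.0)
    return math.sqrt(2.0 * w) * rng.gauss(0.0, 1.0)


def grand_mean_mse(mu, effects, r, n_rep, rng):
    """Monte Carlo MSE of the grand mean as an estimator of mu."""
    n = len(effects) * r
    total_sq_err = 0.0
    for _ in range(n_rep):
        # One simulated CRD experiment: y_ij = mu + tau_i + e_ij
        total = sum(
            mu + tau + laplace_error(rng)
            for tau in effects
            for _ in range(r)
        )
        total_sq_err += (total / n - mu) ** 2
    return total_sq_err / n_rep


rng = random.Random(2024)             # hypothetical seed
effects = [0.5, 0.5, -0.5, -0.5]      # hypothetical effects, sum to zero
mse = grand_mean_mse(mu=5.0, effects=effects, r=5, n_rep=20000, rng=rng)
print(f"Monte Carlo MSE of the grand mean: {mse:.4f}")
```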

Table 7: Measurements for each generated treatment based on the (Ld), and mean square errors of the estimators and of the completely randomized design model.

random mathematical model	Fixed mathematical model	treatment D	treatment C	treatment B	treatment A	Repetition
1.0025	1.1125	7.95	6.14	5.87	4.52	1
3.0145	2.0354	5.85	2.22	6.33	3.65	2
	4.1036	3.54	3.45	3.47	4.36	3
	4.9584	4.25	4.26	4.25	1.23	4
	1.3541	9.91	6.25	6.35	8.32	5
	5.3245	31.5	22.32	26.27	22.08	SUM
Mse Random model	Mse Fixed model	6.3	4.464	5.254	4.416	MEAN
10.3254	12.3570	102.17
		5.1085

From Table (7), we notice the superiority of the random mathematical model over the fixed mathematical model according to the MSE criterion.

From Tables (6) and (7), we notice that both the fixed and the random mathematical models perform better when the experimental error follows the ehd than when it follows the Ld.
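
The sums, means and grand mean reported in Table 7 can be recomputed directly from the listed measurements. The Python sketch below does this and also shows treatment effects computed as treatment mean minus grand mean; that formula is the usual normal-theory estimate under a sum-to-zero constraint and is an assumption here, not the paper's maximum likelihood expressions.

```python
# Measurements from Table 7 (Ld case), five replicates per treatment
table7 = {
    "A": [4.52, 3.65, 4.36, 1.23, 8.32],
    "B": [5.87, 6.33, 3.47, 4.25, 6.35],
    "C": [6.14, 2.22, 3.45, 4.26, 6.25],
    "D": [7.95, 5.85, 3.54, 4.25, 9.91],
}

sums = {trt: sum(ys) for trt, ys in table7.items()}
means = {trt: sums[trt] / len(ys) for trt, ys in table7.items()}

n_units = sum(len(ys) for ys in table7.values())
grand_total = sum(sums.values())
grand_mean = grand_total / n_units

# Assumed normal-theory estimate of each treatment effect
effects = {trt: means[trt] - grand_mean for trt in table7}

for trt in table7:
    print(f"{trt}: sum = {sums[trt]:.2f}, mean = {means[trt]:.3f}, "
          f"effect = {effects[trt]:+.4f}")
# Table 7 reports a total of 102.17 and a grand mean of 5.1085
print(f"total = {grand_total:.2f}, grand mean = {grand_mean:.4f}")
```

Because the effects are deviations from the grand mean in a balanced design, they sum to zero by construction.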

References

1. Al-Alawi, Hassan Hadi, Al-Hadithi, Falih Hassan and Al-Salmani, Hamid Khalaf (2005) "The effect of irrigation water source and nitrogen on some soil chemical properties", Iraqi Journal of Agricultural Sciences, 36(4).
2. Al-Rawi, Khashia Mahmoud and Khalaf Allah, Abdulaziz Khalid Muhammad (1980) "Design and Analysis of Agricultural Experiments", second edition, Dar Al-Kutub Directorate for Printing and Publishing, University of Mosul, Iraq.
3. Al-Safawi, Safaa Younis and Al-Jamal, Zakaria Yahya (2006) "Using the Maximum Likelihood Method and the Kaplan-Meier Method to Estimate the Reliability Function with Application to the Babylon Tire Factory", Tanmiya Al-Rafidain Journal, 82 (28), pp. 9-20.
4. Fajardo, J. and Farias, A. (2004) "Generalized hyperbolic distribution and Brazilian data", Brazilian Review of Econometrics, vol. 24, no. 2, pp. 249-271.
5. Hinkelmann, K. and Kempthorne, O. (2008) "Design and Analysis of Experiments, Volume 1: Introduction to Experimental Design", second edition, John Wiley and Sons, New York.
6. Koudou, A. E. and Ley, C. (2014) "Characterizations of GIG laws: a survey complemented with two new results", Probability Surveys, vol. 11, pp. 161-176.
7. Markel, E. G. (2015) "Bessel functions and equations of mathematical physics", Supervisor: Judith Rivas Ulloa, Leioa.
8. Maxwell, S. E., Delaney, H. D. and Kelley, K. (2017) "Designing experiments and analyzing data: A model comparison perspective", Routledge.
9. Salih, S. A. and Aboudi, E. H. (2021) "Bayesian Inference of a Non-normal Multivariate Partial Linear Regression Model", Iraqi Journal of Statistical Science (34), pp. 91-115.
10. Thabane, L. and Haq, M. S. (2004) "On the matrix-variate generalized hyperbolic distribution and its Bayesian applications", Journal of Theoretical and Applied Statistical Science, vol. 38(6), pp. 511-526.
11. Vilca, F., Balakrishnan, N. and Zeller, C. B. (2014) "Multivariate skew-normal generalized hyperbolic distribution and its properties", Journal of Multivariate Analysis, vol. 128, pp. 73-85.
12. Youssef, Rauda Raad (2015) "Analysis of factorial experiments for an exponential distribution of the response variable with application", Master's thesis in statistics, College of Administration and Economics, University of Baghdad, Iraq.
