Introduction
Extreme temperature events, such as heat waves, have significant impacts on various domains, such as agriculture, energy, and public health. Modeling these extreme events accurately is crucial for better understanding their behaviour and for effective planning and mitigation of the risks associated with them. Extreme value models have shown promising performance in modeling such types of data. In a spatial context, a max-stable spatial process is considered, and the models of this process will be in multivariate case [1]. This situation itself poses a challenge because these events follow multivariate extreme value (MGEV) distributions, and no existing models can capture the dependence structure of these events. Additionally, ignoring the dependencies among the locations and treating them as independent locations using the Generalized Extreme Value (GEV) distribution for each location will provide an unreal representation of the events. Despite the models of the max-stable in the bivariate case existing, the limited number of these models, and the relatively high number of parameters in most of the models also state as restrictions in the modeling.
For what is mentioned above, there is a need for a statistical tool that can combine the multivariate extreme-value theory with models more simple than classical ones, (Smith, Brown-Resnick, and Schlather), so that can be considered appropriate models for the dependence structure among the locations of the extreme event. Extreme value copula has gained a lot of attention in recent years for Modeling the dependence structure between extreme random variables. It is based on the extreme value theory, so these extreme copulas provide a functional link between multivariate distribution functions and their univariate margins [2, 3]. In spatial extremes, extreme value copulas play a crucial role. They enable the characterization of the dependence structure of the extreme event occurring at different locations. By considering the tail behaviour of these events, extreme value copulas can accurately capture the underlying dependence patterns. This, in turn, leads to improved modeling and analysis of spatial extremes. For examples, see [4], [5], [6], [7], and [8].
The report of the International Organization for Migration (IOM) in the UN concerning climate change published on 11 August 2022 puts Iraq as the fifth-most vulnerable country to climate breakdown, affected by soaring temperatures, and this requires preparing for assessing the risks associated with this climate change. Choosing a 2m air temperature to investigate its behaviour in this study was motivated by the outputs of this report. To address this breakdown, one should first understand the behaviour of the extreme 2m air temperature. This will be done by Modeling this event via the extreme-value copula. The 2m air temperature was collected from the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric, land and oceanic climate global dataset ERA5 [9]. This study is devoted to investigating the behaviour of the two-meter air temperature in Iraq through in-depth analysis by extreme-value copula with Pickands dependence functions. The modeling of this event has been done by following the statistical inference on extreme-value copulas introduced in [10] and adaptation of extreme-value copula to spatial context by considering the parameters are functions of distance among the locations. The Composite Maximum Pseudo-Likelihood estimation method introduced in [11] has been used in the modeling.
The paper is organized as follows: the theoretical concepts of extreme-value copula models, and corresponding Pickands dependence functions. Furthermore, adapting these concepts to the spatial context has been presented first. Then, the Composite pseudo-likelihood method is used in the parameters estimation of the copula models presented in Extreme-Value copula section. Preparing the 2m air temperature in Iraq dataset by pre-processing it (examining the stationary, isotropy, tail dependencies, and symmetry properties), modeling, and choosing the best-fitted model have been done in modeling the 2m air temperature in Iraq dataset. Finally, the discussions and conclusions of the main results obtained were presented.
Extreme-Value Copula
In this section, extreme-value copula models and their extension to spatial context used in modeling the 2m air temperature have been presented. The extreme-value copula will be defined via Pickands dependence function. Pickands function is a major and important key in extreme-value copula, so choosing different functions of Pickands leads to different copula models. Since the dataset that will be discussed in the pre-processing section is symmetric, a symmetric Pickands function, i.e., symmetric extreme-value, will be the focus of this section. The main fundamental concepts concerning Copula can be found in [12].
Let be a random vector with multivariate probability distribution function , and marginals . A function is said to be a multivariate copula , with dimension if and only if
where . The copula is unique if is continuous [13]. In extreme value context, let be a random vector with i.i.d replications and multivariate copula , and let non-overlapping block maxima, such that , and has max-stable copula
The extreme-value copula exists if and only if
such that
where is a max-stable tail dependence measure [10, 14].
Bivariate extreme-value copula via Pickands dependence function
Since the multivariate extreme events have tail dependence, a Pickands dependence function is the most reasonable measure able to quantify the dependence strength among the variables [15]. It has the capability on analyzing rare events, such as extreme weather events. More specifically, Pickands dependence function is considered an essential tool in bivariate extreme-value copula, because it can reduce the mapping to one dimension, and Copula is fully characterized in this dimension [13]. Without loss of the generality, we will define when . In extreme context, and under max-stability tail dependence assumption, we can define Pickands dependence function, so that for all ,
Therefore, in the bivariate case, the extreme-value copula in Equation (4) can be expressed by Pickands dependence function as follows
By Theorem 2.22 in [10] formula of in (6) will be
or equivalently
Referring that is the inverse of marginals corresponding to the . Since this study concerns to block maxima case, then will follow GEV distribution with location , scale , and shape parameters. is a convex function with inequality ,respectively correspond to complete dependence and independence. Deferent models of are determined by models of . We will present the most common parametric and symmetric copulas, that will be used in the modeling in this paper.
- Hüsler-Reiss copula: the spirit of this model is the standard Gaussian model with correlation function , which can introduce via Pickands dependence function, so that
-
- and is the univariate standard Gaussian distribution. Let as , then as . In this model, will be the parameter of such model, under the assumption that the correlation between the pairwise increase as the size of sample increase also [16].
- Gumbel copula: this model is one of the Archimedean copulas able to evaluate the dependence straight in asymptotic limits of maxima (upper tail) of the pairwise . The Pickands dependence function is
The attraction of this model is the domain of the Logistic distribution function [17].
- Galambos copula: this model is the negative of Gumbel copula (derived from negative logistic distribution), e.g., if is Gumbel model, and is the corresponding survival copula, then we can consider is a distribution function of the pairwise , so that the corresponding Pickands dependence function is
- t-EV copula: Its so-called t-Extreme Value. This model is derived from Student’s distribution of the pairwise of , with two parameters correlation coefficient , and degree of freedom . The Pickands dependence function of this model can write as
where
and is a univariate student’s distribution function with a degree of freedom . If , then t-EV weakly converges to Hüsler -Reiss copula with parameter [18].
Adaptation of extreme-value copula to spatial context
In the previous section, the extreme-value copula and corresponding Pickands dependence functions are presented in general concepts. In this part adaptation of these concepts to spatial extremes will be done. Let , , , and be an i.i.d replications of spatial process. Let and , be two continuous sequences, if there exist
with non-degenerate marginals, then is a max-stable process, such that , where refer to GEV distribution with location , scale , and shape parameters [19]. The max-stable spatial process is said to be strictly stationary, if , , , and . And is isotropic if the covariance for each depends only on the distance, such that , . In what follow in this paper, the max-stable spatial process will be under assumptions of stationarity and isotropy properties.
When the focus is on extreme values, it is necessary to use more suitable tools for analyzing the spatial dependence of extremes. Since our aim is modeling using the Extreme-Value Copula concept via Pickands dependence function, we present the concepts in a spatial context. Let be a pairwise of spatial process with unit uniform distribution separated by the distance , such that for all , . The bivariate extreme-value copula corresponding to the pairwise is
The Pickands dependence function is a function that evaluates the dependence strength between separated by distance . Concerning Husler-Reiss Pickands function, the spatial aspect will be included, so that, the parameter in Equation (10), will be , where is the spatial isotropic correlation function. Many models of correlation function exist, such as exponential, power exponential, and many others. As well as for in t-EV copula model [20]. Concerning Gumbel and Galambos copula models, respectively with parameters and , the same consideration will be made. The fact that, the dependence strength of each pairwise in are varying, and since is isotropic, which means this varying will be according to the distance. And most of the time this dependence strength decreases as increases. Therefore, using this fact, we will consider the parameters and to be the trend across distance. Such that, , as well as for , where is a coefficient of trending.
Composite Maximum Pseudo-Likelihood
A parametric estimation such as the Maximum Pseudo-Likelihood MPL method showed as a useful tool for estimating copula parameters, especially when the marginals are unknown [21]. Since just the bivariate extreme copula models exist, the composite likelihood is a reasonable method for estimating spatial extreme models [22, 23]. The combination of the two likelihood methods composites and pseudo was defined in [11], named Composite Pseudo-Likelihood CPL. This method is very suitable when using the copula concept in modeling spatial extremes. For that, this method was used in the Modeling of the extreme 2m air temperature event.
Given a max-stable dataset with i.i.d replicates, and let is pseudo max-stable spatial process. The Composite pseudo-likelihood function is given by
where denoted to the likelihood contribution function of the pairwise at the replication . In this study used as bivariate density of the corresponding extreme-value copula defined in Equation (16). Let the compact set of the parameters of is denoted by . The estimation of can be achieved by maximizing , so that
Since the i.i.d achieved on copula when the marginals are known, such as in this case, the pairwise pseudo-likelihood estimator has asymptotic normality as , with mean and covariance matrix of sandwich form , where
respectively are the variance of the score function, and the expected information matrix are computed from Equation (17). For more details about asymptotic behaviour, see [11], [24], and [25]. The estimation of can be readily obtained from the Hessian provided by the optimization algorithm, and of by the empirical variance of the score contribution of each observation [26]. The selected model will be according to the corresponding minimum of , where is the number of locations in the dataset, and
is the Composite Likelihood Information Criterion. is very closely to Akaike Information Criterion AIC, so for simplicity in computations, we shall use AIC instead of . See [23] and [26].
Modeling the 2m air temperature in Iraq dataset
Data description and pre-processing
The goal of this section is to model the extreme 2m air temperature in Iraq. The hourly 2m air temperature was collected from the fifth generation of atmospheric land and oceanic climate global dataset ERA5, produced by the European Centre for Medium-Range Weather Forecasts ECMWF. This dataset was collected for the region with a longitude range of 37.5 to 49 degrees, a latitude range of 27.5 to 38 degrees, and a grid spacing of 11 km during the summer season (June, July, and August) for the years from 1981 to 2022, at times from 11:00H to 17:00H. This collection of data resulted grids and hourly observations for each grid. Mathematically, let , , , and be a spatial process represent the 2m air temperature. To ensure the block maxima be i.i.d, a monthly block maxima was proposed. So that for a non-overlapping replication
where is a spatial extreme process, for each marginal of follows GEV distribution, and the amount represents respectively the number of hours per day times the number of days per month.
To examine the stationarity of the dataset, the GEV’s parameters , and are estimated for each location of using the maximum likelihood estimation method. The grids in the three panels in Figure 1 represent the estimated location , scale , shape parameters for each . Noting that, all the computational process in this study has been done by R program with main package ‘copula’ version 1.1-2, and others.
Figure 1: The panels respectively represent the estimated parameters , scale , shape for each grid of the dataset
It is clear that each of the three estimated parameters for each grid is approximately equal, especially in the red region for , green in ; and red for excluding the northeast of Iraq, due to the mountains. That means and from the definition of strictly spatial stationarity, we can consider the dataset has spatial stationary property. In the following step the spatial extreme dataset will transform into , such that for each
where is the GEV distribution. In what follows, we shall deal with instead of . It is known the dependence structure pattern of the events is essential in modeling extreme events. This structure distinguishes between the models corresponding to asymptotic dependence/ independence structures. So, in the next step, examine the dataset for which the dependence structure belongs to asymptotic dependence or independence. This examination will by empirical upper and lower tails dependence measures [27],
And
where is the empirical Copula, so that , and . For more details, see [10]. The threshold was chosen to be to ensure there are data for computation. The pairwise evaluation of and of the dataset represented respectively by the first and second panels in Figure 2. From these panels, the dataset seems to have an upper tail dependence structure, due to , as well as for . So, we can consider the block maxima has an asymptotic dependence structure, and this leads to consider extreme copula models. To verify if the dependence structure of the dataset is present in extreme, furthermore, the exchangeability (have symmetric distribution) between the pairwise locations , a test hypothesis corresponding to these two assumptions has been made, (see Figure 3). Once again, the northeastern region of Iraq does not appear a tail dependence structure, and this is clear on the top and right sides of the two panels. From this result, this region will be excluded definitely from the modeling.
Figure 2: Empirical pairwise upper and lower tail dependence measures, respectively represented by the left and right panels. Each grid in the panels represents the corresponding tail dependence strength between the pairwise , .
Regarding the pairwise extreme-value dependency, locations sampled form for this purpose. The -value test with will be done between the empirical pairwise copula and extreme-value copula with the non-parametric estimate of Pickands dependence function under the hypothesis
where , is a Pickands dependence function of [28]. The -value statistics is illustrated in the lift penal in Figure 3. The blue fill represents that the p-value test cannot reject . In other words, the pairwise , has extreme-value copula. As illustrated in the left panel, the test in most of the pairwise failed to reject , so we can consider extreme-value models in modeling the dataset. To ensure this property exists in modeling, the pairwise locations rejected will be excluded from the selection of locations for modeling.
To test the exchangeability between the pairwise (symmetry radial of the underlying multivariate copula), -value statistics will be used based on empirical copula [29]. The assumption of the symmetry copula will be under the hypothesis
where is the survival of . The right panel in Figure 3 appears that the majority cannot reject . Then we can consider the extreme-value and symmetric models.
Figure 3: The panels represent the -value test with for extreme value dependence and the exchangeability (symmetry) of the copula dataset. The left panel, represent the pairwise , has extreme-value dependence after collecting randomly 100 locations among location; while the right panel represents the which pairwise have symmetric copula model.
Copula models proposed for modeling 2m air temperature dataset
By adopting the results obtained in the pre-processing section, the locations implemented in the modeling will be randomly selected from locations to ensure consistency with the outcomes of the previous section: The northeast region of Iraq will be excluded from the Modeling, due to the spatial non-stationary with the remaining region. In other words, the GEV marginals of this region have different behaviour from others; the process at location which does not have extreme-value dependences; and also, does not have symmetric copula will be excluded from the modeling also. The 2m air temperature spatial process in locations have been randomly selected, 40 for modeling and 10 for validation. The latter locations will not be used in modeling. Most of these locations are in Iraq and small numbers are sited around the border of the west and south of Iraq, which have the same behavior as the majority of locations. The coordinates of the selected locations are pointed out in Figure 4.
Figure 4: The coordinates of the selected locations of modeling and validation. The red dots represent the locations that will use in the modeling, while the green one will be for validation
According to the results of the preprocessing, a symmetric extreme-value Copula model is proposed with different Pickands dependence functions. One parameter extreme copula family (Hüsler-Reiss, Gumbel, and Galambos), and two parameters family, such as t-EV defined respectively in Equations (10), (12), (13) and (14) are chosen for modeling. Traditionally, in one-parameter models, the tail dependence strength between two extreme random variables is controlled by this parameter. To extend this concept to spatial context, should this parameter be varying across the distance between the pairwise locations , so that these parameters measure the tail dependence strength between or equivalently separated by the distance , under isotropy assumption. Due to the dependence strength varying with each pairwise separated by a distance , the parameters will consider as a function of distance. So, models A will consider the parameter varying across the distance with coefficient ; model B, the parameter varying across according to exponential dependence strength; models C will be according to power exponential; and model D with Cauchy. So that the proposed models will be
- Hüsler-Reiss A with parameter, .
- Hüsler-Reiss B with parameter, , where , and, .
- Hüsler-Reiss C with parameter, , where , and .
- Hüsler-Reiss D with parameter, , where, , and .
- Gumbel with parameter, , .
- Galambos with parameter, , .
- t-EV A with , and, .
- t-EV B with , and .
- t-EV C with , and .
Due to no extreme-value models with exist in spatial context, all models considered with , and therefore the estimation of the parameter’s models will be by Composite Pseudo Likelihood CPL method previously introduced. Usually, Composite likelihood method is used for estimating parameters for the spatial extreme process. Table 1 shows the estimated parameters, log-likelihood amount, and model selecting criterion (Akaike information criterion AIC) corresponding to each model proposed.
Table 1: Show the estimated parameters, log-likelihood, and Akaike information criterion AIC for the models proposed
|
Copula model
|
Estimated parameters
|
log-likelihood
|
AIC
|
|
|
|
|
|
|
|
|
|
Hüsler-Reiss A
|
1.406670
|
|
|
|
15.89644
|
-3.53219
|
|
|
Hüsler-Reiss B
|
|
0.839202
|
|
|
15.79983
|
-3.519998
|
|
|
Hüsler-Reiss C
|
|
71.5492
|
0.120000
|
|
16.00407
|
-1.545686
|
|
|
Hüsler-Reiss D
|
|
0.01508
|
0.105000
|
|
15.96269
|
-1.540508
|
|
|
Gumbel
|
1.888350
|
|
|
|
15.78417
|
-3.518015
|
|
|
Galambos
|
2.350493
|
|
|
|
15.74095
|
-3.512531
|
|
|
t-EV A
|
|
0.38663
|
|
2.838140
|
15.68348
|
-1.505216
|
|
|
t-EV B
|
|
0.15070
|
0.336934
|
3.219790
|
15.83652
|
0.4753627
|
|
|
t-EV C
|
|
0.10200
|
1.800157
|
1.083037
|
15.47631
|
0.5213791
|
|
The AIC information criterion in Table 1 indicates that the best four fitted models are Hüsler-Reiss A, B, Gumbel, and Galambos. The slight difference in the criterion values of AIC among the four candidate models made us consider the model which will represent the 2m air temperature has minimum divergence between the non-parametric and parametric of the estimated bivariate extreme-value copulas. This divergence test will apply to the validation dataset using the Kuiback-Lebler KL method. For more information see, [10]. Each pairwise of the validation dataset, the KL divergence between the four estimated models and the model with non-parametric Pickands function have been done. Figure 5 shows that the Hüsler-Reiss A model has a divergence density of the pairwise of the validation dataset with more skew to lift (to zero) than the three candidate models. In other words, the divergence between the non-parametric and estimated Hüsler-Reiss A copula is the minimum. For that, now we can consider the Hüsler-Reiss A copula as the best extreme-value copula model to represent the 2m air temperature event in Iraq, and they can adapt it in further studies.
Figure 5: The densities of the Kuiback-Lebler divergence of the four candidate models (Hüsler-Reiss A, Hüsler-Reiss B, Gumbel, and Galambos) with the non-parametric extreme-value copula, which implemented on the validation dataset