Estimating Loss Given Default Based on Beta Regression

Computers, Materials & Continua, 2021, Issue 3

Jamil J. Jaber, Noriszura Ismail, Siti Norafidah Mohd Ramli, Baker Albadareen and Nawaf N. Hamadneh

1 Department of Risk Management and Insurance, The University of Jordan, Aqaba, 77111, Jordan

2 Universiti Kebangsaan Malaysia, Department of Mathematical Sciences, Malaysia

3 Department of Basic Sciences, College of Science and Theoretical Studies, Saudi Electronic University, Riyadh, 11673, Saudi Arabia

Abstract: Loss given default (LGD) is a key parameter in credit risk management, used to calculate the required regulatory minimum capital. The internal ratings-based (IRB) approach under Basel II allows institutions to determine LGD on their own. In this study, we estimate LGD for credit portfolio data using beta regression with a precision parameter (ϕ) and a mean parameter (μ). The credit portfolio data were obtained from a banking institution in Jordan for the period January 2010 to December 2014. In the first stage, we use the "outstanding amount" and the "amount of borrowing" to find the LGD of each defaulted borrower (494 out of 4393 borrowers). In the second stage, we fit univariate parametric distributions to the LGD data and select the beta distribution. We then model ϕ using microeconomic variables (SPP, OE and LR) and μ using macroeconomic variables (GDP and inflation rate). Finally, we compare six link functions (logit, loglog, probit, cloglog, cauchit, and log) used with ϕ and μ. The results show that beta regression with the probit link function has the highest R-squared, with acceptable values of logL, AIC and BIC.

Keywords: Credit risk; loss given default; parametric distribution; regression model

1 Introduction

The Basel Committee gives three approaches for estimating the required regulatory capital in banking institutions: standardized, internal ratings-based (IRB), and advanced IRB. The IRB approach is generally favored over the standardized approach because it produces more accurate estimates and lower capital charges. The IRB approach under Basel II allows institutions to determine probabilities of default (PD) and loss given default (LGD) on their own, as opposed to the standardized approach under Basel I, which uses estimates of PD and LGD from the Central Bank to calculate the required capital as a percentage of risk-weighted assets. Thus, Basel II leads to a better differentiation of risks and takes into account the diversification of a bank's portfolio [1,2].

In Jordan, the banking sector obtains the estimate of LGD from the Central Bank of Jordan under Basel I. Therefore, the LGD estimate is fixed and does not vary across banks. The contributions of this study are to estimate the LGD under Basel II for a corporate credit portfolio in the Jordanian banking sector, to fit the LGD data to parametric distributions, and to incorporate internal and external financial variables into the data so that the LGD data can be fitted to regression models. The main advantage of fitting the LGD data with financial variables as covariates is that we can determine which financial variables significantly affect the LGD. Since the LGD is influenced by some key transaction characteristics, several categories of variables, such as macroeconomic and industry-specific variables, can be used to build a predictive regression model.

The objective of this study is to use several parametric models (beta, Cauchy, gamma, Gompertz, logistic, log-normal, normal, and Weibull) to estimate LGD. The models are fitted to sample data obtained from the corporate credit portfolio of a bank in Jordan for the period January 2010 to December 2014. The LGD data lie in the interval [0,1] because LGD is the proportion of the borrowed amount that is still outstanding. We also consider five financial variables for the regression model in order to determine which of them significantly affect the LGD. The financial variables are gross domestic product (GDP), inflation rate (Inf), service pricing policy (SPP), operating efficiency (OE), and liquidity ratio (LR).

The rest of this paper proceeds as follows. Section 2 reviews the important literature on LGD. Section 3 describes the parametric distributions for fitting the LGD data and the regression models for determining the financial variables that significantly affect the LGD. We present the sample data and the empirical results in Section 4, and the final section concludes.

2 Literature Review

Most empirical studies on credit risk have depended heavily on corporate bond markets to gauge losses in the case of default. The reason is that bank loans are private instruments, and thus little information on loan losses is freely accessible. Several studies on credit risk and LGD in bond markets have been carried out in the last several decades. An early study can be found in [3], which utilized actuarial analysis to investigate mortality rates of U.S. corporate bonds. This was followed by various empirical studies on credit risk in bond markets (see, for example, [4,5]). The mortality approach was also used by [6] to measure the percentage of bad and doubtful loans of corporate bonds recovered several months after the default date. Recent studies on LGD include [7], which proposed a new model for the LGD of bank loans by leveraging time to recovery, and [8], which forecasted the LGD of bank loans using a multi-stage model. In another study, [9] constructed a survival model to predict the risks of cardholders and applied the model to a case study in Capital Card Services.

LGD data from the banking industry tend to be skewed and heavy-tailed, and thus can be fitted by parametric models such as beta, lognormal, gamma, and Pareto. Besides parametric models, non-parametric models have also been proposed and used, such as regression trees, neural networks, multivariate adaptive regression splines, and least squares support vector machines [10-17].

For the case of LGD with covariates, regression models can be utilized, and several examples can be found in past studies. Examples of regression models for LGD data are ordinary least squares regression (OLS), ridge regression (RiR), and fractional response regression. In 2005, Moody's introduced the renowned LossCalc model, which uses a multivariate linear regression consisting of industry and macroeconomic factors, and reference [18] applied logistic regression with time consideration, using transformed LGD as the dependent variable and macroeconomic variables as independent variables. Recently, reference [19] used quantile regression to estimate downturn and unexpected credit losses, known as downturn LGD. Finally, reference [9] constructed a model to predict the risk of a cardholder over the lifetime of the account and applied survival analysis methodologies to a case study in Capital Card Services.

Under Basel II, banking institutions are advised to consider macroeconomic downturn conditions when estimating recovery rates [20-22]. In particular, reference [22] assumes that banking institutions should use the gross domestic product (GDP) growth rate and the unemployment rate as determinants in recovery rate prediction. It should be noted that the recovery rate is equal to one minus the LGD rate. Studies from [23,24] showed that the GDP growth rate was significantly relevant to the recovery rate of U.S. bonds. On the other hand, references [12,25] found that the GDP growth rate was not significantly relevant.

Other macroeconomic covariates have also been suggested in the literature to predict the recovery rate, such as the inflation rate [24], the interest rate [14,24], the growth rate of investment [12,26] and the rate of return on the stock market [24,25]. Studies from [6,27] showed that the recovery rate decreases when loan size increases. In another study, reference [8] used a Japanese credit portfolio to analyze the factors that affect LGD and to improve a multi-stage model for predicting LGD. The variables considered are creditworthiness score, collateral quota (commercial bills), collateral quota (real estate), collateral quota (marketable securities), collateral quota (deposits), credit guarantee quota, and exposure (in hundred million yen). Their results showed that collateral, guarantees, and loan size significantly affect the LGD.

In this study, we consider macroeconomic and microeconomic factors as explanatory variables, and the factors are obtained from available reports. We use macroeconomic factors (GDP and inflation rate) for the mean parameter and microeconomic factors (service pricing policy, operating efficiency, and liquidity ratio) for the precision (dispersion) parameter in the beta regression model.

3 Methodology for Estimating LGD

This section gives the background to the main concepts used in our study. Our model consists of three stages. In the first stage, we use the outstanding amount and the amount of borrowing to find the LGD of each defaulted borrower. In the second stage, we use the LGD data to find a suitable parametric model. In the third stage, we use beta regression with different link functions to fit the LGD data with covariates for both parameters of the beta regression.

3.1 Loss Given Default (LGD)

A variety of models in which LGD is subject to systematic risk can be found in the literature. Reference [28] proposed a model in which the LGD is normally distributed and influenced by the same systematic factor that drives the probability of default (PD). Reference [10] employs a lognormal distribution for the LGD. Other extensions include [29], which uses a probit transformation, and [30,31], which employ a logit transformation.

Reference [6], in contrast, used a mortality approach to measure the percentage of bad and doubtful loans of corporate bonds that is recovered n months after the default date. The actuarial-based mortality approach is appropriate because the population sample changes over time. The dataset of that study was obtained from micro-data on defaulted bank loans of a private bank in Portugal, Banco Comercial Português (BCP). It consists of 10000 short-term loans granted to small and medium-sized companies from June 1995 to December 2000 (66 months). They identified the LGD as follows:

$$\mathrm{LGD} = \mathrm{SPULB}_t, \qquad \mathrm{SPULB}_t = 1 - \mathrm{SMRR}_t, \qquad \mathrm{SMRR}_t = \frac{\sum_{i=1}^{m} \text{cash flow received}_{i,t}}{\sum_{i=1}^{m} \text{loan outstanding}_{i,t}}$$

where $\mathrm{SPULB}_t$ is the sample (weighted) percentage of unpaid loan balance at period $t$, $\mathrm{SMRR}_t$ is the sample (weighted) marginal recovery rate at time $t$, $i$ refers to each of the $m$ loan balances outstanding in the sample, and $t$ is the number of periods after default. They found that the cumulative average recovery is almost complete after 48 months. Moreover, the distribution of cumulative recovery rates is bimodal.

Reference [32] used unsecured consumer loans or credit cards from one UK lender to compare linear regression and survival analysis models for predicting LGD. The dataset covered 27000 personal loans from 1989 to 2004. There are two reasons to use survival analysis. Firstly, debts that are still being repaid cannot be included in the standard linear regression approach, whereas survival analysis models can treat such repayments as censored and easily include them in the model. Secondly, the recovery rate is not normally distributed, and therefore modeling it with a linear regression violates the assumptions of linear regression models. The recovery rate (RR) is the recovered fraction of the defaulted debt, so that LGD = 1 - RR. The study compared linear regression with survival analysis models (proportional hazard models and accelerated failure time models for the Weibull, log-logistic, gamma, and Cox models). Among single-distribution models, linear regression performed better than the survival models, with a higher R-squared, a higher Spearman rank correlation, and a lower MSE.

Reference [6] used the LGD ratio given above to estimate the percentage of LGD n months after a corporate bond default, whereas reference [32] used the recovery-rate definition for personal loans. Our study constructs the LGD ratio from these definitions, as shown in Eq. (1):

$$\mathrm{LGD}_i = \frac{\text{outstanding amount}_i}{\text{amount of borrowing}_i} \qquad (1)$$

where $i$ indexes the defaulted borrowers.
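
The first-stage calculation can be illustrated with a minimal Python sketch. It assumes that LGD is the outstanding amount divided by the amount of borrowing, as described in the abstract; the function name and the three loans are hypothetical and are not taken from the bank's data.

```python
def loss_given_default(outstanding: float, borrowed: float) -> float:
    """Return LGD as the share of the borrowed amount still outstanding."""
    if borrowed <= 0:
        raise ValueError("borrowed amount must be positive")
    if not 0 <= outstanding <= borrowed:
        raise ValueError("outstanding amount must lie in [0, borrowed]")
    return outstanding / borrowed


# Hypothetical defaulted loans: (outstanding amount, amount of borrowing).
loans = [(9_500.0, 10_000.0), (2_700.0, 5_000.0), (640.0, 20_000.0)]
lgds = [loss_given_default(out, bor) for out, bor in loans]

for (out, bor), lgd in zip(loans, lgds):
    print(f"outstanding={out:>9.2f}  borrowed={bor:>9.2f}  LGD={lgd:.3f}")
```

The range check enforces the property stated in the text that every LGD value lies in [0,1].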

The Basel II risk parameters are PD, LGD and exposure at default (EAD). The rate of expected credit loss (ECL), which is also known as the risk-weighted asset of a credit portfolio, can be expressed as the product of PD and LGD. Therefore, LGD is one of the two determining factors of credit losses [33]. The ECL of a credit portfolio can generally be represented as:

$$\mathrm{ECL} = \sum_{i=1}^{m} \mathrm{PD}_i \times \mathrm{LGD}_i \times \mathrm{EAD}_i$$

where $\mathrm{PD}_i$ is the probability of default of the $i$th borrower, $i = 1, 2, \ldots, m$, $\mathrm{LGD}_i$ is the loss given default, $\mathrm{EAD}_i$ is the exposure at default, and $m$ is the total number of borrowers in the portfolio.
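
As a small illustration of the portfolio ECL, the following Python sketch sums PD × LGD × EAD over borrowers. The three borrowers and all of their values are made up for the example.

```python
def expected_credit_loss(borrowers):
    """Sum PD_i * LGD_i * EAD_i over an iterable of (PD, LGD, EAD) tuples."""
    return sum(p_default * lgd * ead for p_default, lgd, ead in borrowers)


# Hypothetical portfolio: (probability of default, LGD, exposure at default).
portfolio = [
    (0.02, 0.60, 100_000.0),
    (0.10, 0.95, 20_000.0),
    (0.05, 0.40, 50_000.0),
]

ecl = expected_credit_loss(portfolio)
print(f"Expected credit loss: {ecl:,.2f}")
```

For these inputs the three borrower-level terms are 1200, 1900 and 1000.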

3.2 Parametric Distributions

Common parametric distributions for modeling the LGD data are considered. Tab. 1 provides the density function and survival function for the distributions considered in this study, which are beta, Cauchy, gamma, Gompertz, logistic, log-normal, normal, and Weibull. These distributions are suitable for modeling skewed and heavy-tailed data, which are features commonly displayed by LGD data. The empirical pdf for the LGD data in our study can be seen in Fig. 1 (PDF panel). The curve of the empirical pdf indicates that the LGD data are skewed and heavy-tailed.

Table 1: Parametric models for LGD

Figure 1: Q-Q plot, P-P plot, PDF and CDF for the fitted beta, gamma, normal, logistic, Cauchy and exponential distributions

Three accuracy criteria are used to choose the best model: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the log-likelihood (LogL). The BIC depends on the maximum likelihood estimates of the model parameters [34], and its penalty grows with both the sample size and the number of parameters. It is defined as $\mathrm{BIC} = -2\ell + kp$, where $\ell$ is the log-likelihood of the estimated model, $p$ is the number of parameters, and $k = \log n$, with $n$ the sample size. The AIC also depends on the maximum likelihood estimates of the model parameters [35], but its penalty depends only on the number of parameters. It is defined as $\mathrm{AIC} = -2\ell + k^{*}p$, where $k^{*} = 2$.
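
The two criteria are simple functions of the maximized log-likelihood, as the following Python sketch shows. The candidate names and log-likelihood values are hypothetical; only the sample size of 494 comes from the study.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """AIC = -2*logL + 2*p."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """BIC = -2*logL + log(n)*p."""
    return -2.0 * log_lik + math.log(n_obs) * n_params


# Hypothetical fits: name -> (maximized log-likelihood, number of parameters).
n_obs = 494
candidates = {"model_a": (310.0, 2), "model_b": (295.0, 2), "model_c": (312.0, 4)}

for name, (log_lik, p) in candidates.items():
    print(f"{name}: AIC={aic(log_lik, p):.2f}  BIC={bic(log_lik, p, n_obs):.2f}")

# Lower values are preferred under either criterion.
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))
print("preferred by BIC:", best_by_bic)
```

Because log(494) is larger than 2, the BIC penalizes the four-parameter candidate more heavily than the AIC does.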

3.3 Beta Regression

In our study, we use beta regression for fitting the LGD data with covariates. The reason is that the beta distribution is well known to be adequate for modeling quantities bounded in the interval [0,1]. Depending on the choice of parameters, its probability density function can be unimodal, J-shaped, U-shaped or uniform. Section 4 shows that the beta distribution is the best model, compared with the other parametric distributions, for fitting the sample data without covariates in our case study. Therefore, we consider several beta regressions, with different link functions, for fitting the LGD data with covariates.

Let the random variable $Y$ follow a beta distribution, $B(\alpha, \beta)$, with parameters $\alpha, \beta > 0$. The mean and variance of $Y$ are $E(Y) = \alpha/(\alpha+\beta)$ and $\mathrm{Var}(Y) = \alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$, respectively. Reference [36] defined a regression structure for the beta distribution as follows. Let $\mu = \alpha/(\alpha+\beta)$ and $\phi = \alpha+\beta$, so that $\alpha = \mu\phi$ and $\beta = (1-\mu)\phi$. Under this parameterization, $E(Y) = \mu$ and $\mathrm{Var}(Y) = V(\mu)/(1+\phi)$, where $V(\mu) = \mu(1-\mu)$, for $Y \sim B(\mu, \phi)$ with $0 < \mu < 1$ and $\phi > 0$, since $\alpha, \beta > 0$. The parameter $\phi$ is known as the precision parameter. Since a larger $\phi$ implies a smaller variance for a fixed $\mu$, $1/\phi$ can also be regarded as the dispersion parameter.
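
The equivalence of the two parameterizations can be checked numerically. The Python sketch below uses hypothetical helper names, converts (μ, ϕ) to (α, β), and confirms that the classical mean and variance formulas agree with μ and μ(1-μ)/(1+ϕ).

```python
def to_shape(mu: float, phi: float) -> tuple[float, float]:
    """Map mean mu and precision phi to the shape parameters (alpha, beta)."""
    if not 0.0 < mu < 1.0 or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    return mu * phi, (1.0 - mu) * phi


def beta_mean_var(alpha: float, beta: float) -> tuple[float, float]:
    """Mean and variance of B(alpha, beta) in the classical parameterization."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total ** 2 * (total + 1.0))


mu, phi = 0.8, 10.0  # illustrative values only
alpha, beta = to_shape(mu, phi)
mean, var = beta_mean_var(alpha, beta)

print(f"alpha={alpha:.3f}, beta={beta:.3f}")
print(f"mean={mean:.4f} (mu={mu})")
print(f"variance={var:.6f} (mu*(1-mu)/(1+phi)={mu * (1 - mu) / (1 + phi):.6f})")
```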

In our study, the precision parameter is modeled in a similar way to the mean parameter. Instead of a fixed dispersion (fixed variance), we have a varying dispersion (varying variance). A varying variance indicates varying risk, and it is beneficial if the significant risk factors (significant covariates) can be determined. It should be noted that variance is commonly used as a risk measure in finance. The risk of loss can be measured using a confidence interval; for instance, an approximate 95% upper bound for the loss can be measured as max(μ ± 2σ), where σ is the standard deviation.
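
A short worked example, with values chosen purely for illustration, shows how the precision parameter drives this risk measure. Take $\mu = 0.5$ and compare $\phi = 24$ with $\phi = 99$, using $\mathrm{Var}(Y) = \mu(1-\mu)/(1+\phi)$:

$$\phi = 24:\quad \sigma^2 = \frac{0.5 \times 0.5}{25} = 0.01,\quad \sigma = 0.1,\quad \mu + 2\sigma = 0.7$$

$$\phi = 99:\quad \sigma^2 = \frac{0.5 \times 0.5}{100} = 0.0025,\quad \sigma = 0.05,\quad \mu + 2\sigma = 0.6$$

With the same mean LGD, the borrower group with the higher precision has the lower upper bound on the loss, which is why covariates that raise $\phi$ are read as reducing risk.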

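Let $y = (y_1, \ldots, y_n)^T$ be a random sample, where $y_i \sim B(\mu_i, \phi_i)$, $i = 1, \ldots, n$. The parameters $\mu_i$ and $\phi_i$ are assumed to satisfy the following functional relations:

$$g_1(\mu_i) = \eta_{1i} = x_i^T\beta, \qquad g_2(\phi_i) = \eta_{2i} = z_i^T\theta$$

where $g_1(\cdot)$ and $g_2(\cdot)$ are link functions, $\beta = (\beta_1, \ldots, \beta_k)^T$ and $\theta = (\theta_1, \ldots, \theta_h)^T$ are vectors of unknown regression parameters that are assumed to be functionally independent, $\beta \in \mathbb{R}^k$ and $\theta \in \mathbb{R}^h$, $k + h < n$, $\eta_{1i}$ and $\eta_{2i}$ are predictors, and $x_i$ and $z_i$ are the regressor vectors formed from $x_{i1}, \ldots, x_{iq_1}$ and $z_{i1}, \ldots, z_{iq_2}$, the observations on $q_1$ and $q_2$ known covariates, which need not be exclusive.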

Several link functions can be used for $g(\cdot)$: the logit function $g(\mu) = \log(\mu/(1-\mu))$; the probit function $g(\mu) = \Phi^{-1}(\mu)$, where $\Phi(\cdot)$ is the standard normal distribution function; the log function $g(\mu) = \log(\mu)$; the log-log function $g(\mu) = -\log(-\log(\mu))$; the complementary log-log function $g(\mu) = \log(-\log(1-\mu))$; and the Cauchy (cauchit) function $g(\mu) = \tan(\pi(\mu - 0.5))$. A rich discussion of link functions can be found in [37,38] and in Chapter 7 of [39].
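
To make the six links compared in this study concrete, the following Python sketch writes each one with its inverse, using only the standard library. The log-log and complementary log-log links are given in their standard forms, and the dictionary name and the value μ = 0.8 are illustrative assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

# Each entry maps a link name to (g, g_inverse) for a mean mu in (0, 1).
LINKS = {
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "log": (math.log, math.exp),
    "loglog": (lambda mu: -math.log(-math.log(mu)),
               lambda eta: math.exp(-math.exp(-eta))),
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
    "cauchit": (lambda mu: math.tan(math.pi * (mu - 0.5)),
                lambda eta: 0.5 + math.atan(eta) / math.pi),
}

mu = 0.8  # illustrative mean LGD
for name, (g, g_inv) in LINKS.items():
    eta = g(mu)
    print(f"{name:8s} eta={eta: .4f}  back-transformed mu={g_inv(eta):.4f}")
```

Every link except the log maps (0,1) onto the whole real line, so its inverse always returns a valid mean. The log link maps (0,1) onto the negative half-line only, so a positive linear predictor would give a mean above 1.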

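The log-likelihood function of the beta regression model is defined as:

$$\ell(\beta, \theta) = \sum_{i=1}^{n} \ell_i(\mu_i, \phi_i)$$

$$\ell_i(\mu_i, \phi_i) = \log\Gamma(\phi_i) - \log\Gamma(\mu_i\phi_i) - \log\Gamma((1-\mu_i)\phi_i) + (\mu_i\phi_i - 1)\log y_i + ((1-\mu_i)\phi_i - 1)\log(1-y_i)$$

where $\Gamma(\cdot)$ is the gamma function.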
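
The log-likelihood can be evaluated directly from the beta density in the (μ, ϕ) parameterization, that is, with shape parameters μϕ and (1-μ)ϕ. The Python sketch below is a minimal illustration; the function names and the sample values are hypothetical, and no estimation is performed.

```python
import math


def beta_loglik_obs(y: float, mu: float, phi: float) -> float:
    """Log-density of B(mu, phi) at y, with shapes mu*phi and (1-mu)*phi."""
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def beta_loglik(ys, mus, phis) -> float:
    """Total log-likelihood: the sum of the per-observation contributions."""
    return sum(beta_loglik_obs(y, m, p) for y, m, p in zip(ys, mus, phis))


# Hypothetical LGD observations with observation-specific mu_i and phi_i.
ys = [0.95, 0.54, 0.80]
mus = [0.90, 0.60, 0.75]
phis = [20.0, 8.0, 12.0]
print(f"log-likelihood = {beta_loglik(ys, mus, phis):.4f}")
```

In a full regression, each μ_i and ϕ_i would be obtained by applying an inverse link function to its linear predictor, and the coefficients would be chosen to maximize this sum.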

In this study, we consider beta regression with different link functions for fitting the LGD data with covariates. We use R-squared to select the best regression model.

4 Empirical Results for LGD

Sample data based on the credit portfolio of a banking institution in Jordan are used in our study. The credit portfolio covers the period January 2010 to December 2014. The portfolio comprises 4393 borrowers, and the overall number of defaults during the 5-year period is 494. The sample size is the same as the number of defaults, which is 494, and the LGD data lie in the interval [0,1]. For the case study, a borrower is declared to be in default if he is unable to pay a cash installment for a period of 3 months.

The number of defaults per annum and the summary statistics for the LGD data are presented in Tab. 2. The largest numbers of defaults are recorded in 2010 and 2011. The highest mean LGD is 97.7%, with a standard deviation of 0.025, in 2010. The high frequency of defaults in 2010 and 2011 reflects the late response of the Jordanian economy to the financial crisis of 2008. The Amman Stock Exchange (ASE) declined steadily from 2009 until 2012, and references [41,42] indicated that the global financial crisis had a negative effect on the performance of the Jordanian banking sector, with share prices decreasing and non-performing assets increasing. For this reason, the number of defaults increased, as the financial crisis affected the ability of borrowers to repay their loans. The lowest mean LGD is 54.1%, with a standard deviation of 0.251, in 2014. Furthermore, the minimum LGD is 3.2% in 2013 and the maximum LGD is 99.6% in 2010. An R package [43] is used for fitting the sample data to the parametric models.

Tab. 3 provides the results of the fitted parametric distributions. The beta distribution is the preferable model because it has the highest LogL and the lowest AIC and BIC. Further comparison can be made using the Q-Q plot, P-P plot, empirical and theoretical PDF, and empirical and theoretical CDF shown in Fig. 1. The beta distribution shows a better fit than the other parametric distributions. Therefore, the mean of the beta distribution can be used as the estimate of LGD for the credit portfolio [44].

The results from Tab. 3 show that the beta distribution is the best model for fitting the sample data. Therefore, we consider beta regression with different link functions in order to explore the internal and external financial variables that significantly affect the LGD.

The descriptive statistics of the explanatory variables are shown in Tab. 4.

Table 3: Parametric models

Tab. 5 provides the parameter estimates and standard errors for the beta regressions, which are fitted using different link functions. All regression parameters are significant, at least at the 0.10 level, except for the beta regression with the Cauchy link function, where the estimate for LR is insignificant.

The estimates for GDP and Inf are highly significant for the mean (μ). GDP refers to a monetary measure of the market value of all final goods and services produced within a country's borders in a specific period of time (typically 1 year). An increasing GDP means that the economy is expanding, and firms are producing and selling more products or services. When GDP declines, the economy is described as being in a downturn. During a downturn, fewer goods and services are sold, business profits fall, unemployment rises and government tax collections fall.

An increasing inflation rate (Inf) indicates a sustained increase in the prices of goods and services over a specific period of time. Inflation refers to a reduction in the purchasing power per unit of money. In other words, when the inflation rate rises, each unit of currency buys fewer goods and services.

Table 4: Summary statistics for explanatory variables

Our results show that a decreasing GDP growth rate and an increasing inflation rate are both associated with an increasing LGD. These results are expected under adverse economic conditions, in which GDP diminishes and the inflation rate increases while default frequencies increase and asset prices decrease [44], so that recovery rates decrease. When economic prosperity resumes, the situation reverses, which indicates that the capital requirements would swing widely between these conditions. In general, a lower LGD is favored when calculating the expected loss (EL) of a banking institution. The results show that a lower LGD is obtained when GDP is higher and the inflation rate is lower.

The service pricing policy (SPP) ratio is measured as operating expenses divided by total liability. This ratio measures the funding of operating expenses by total liability. A higher SPP results when the relative decrease in total liability is larger than the decrease in operating expenses. In our case, a higher SPP decreases the variance of LGD. This implies that larger decreases in total liability result in less variation of LGD among the defaulted borrowers.

The operating efficiency ratio (OE) is measured as total operating expenses divided by total operating revenues. An increase in OE is caused by a larger decrease in total operating revenue relative to the decrease in total operating expenses. Our case shows that a higher OE decreases the variance of LGD. This indicates that a larger reduction in total operating revenue leads to less variation of LGD among the defaulted borrowers.

The cash ratio (cash and cash equivalents divided by current liability) is generally a more conservative liquidity ratio. It measures a company's ability to repay its short-term obligations using only its most liquid assets, such as cash on hand, cash equivalents (sometimes referred to as marketable securities) and demand deposits. This measure tells creditors whether the company is able to pay all current liabilities immediately without having to sell or liquidate other assets. A higher LR results when current liability decreases by more than cash and cash equivalents do. Our results show that a higher liquidity ratio (LR) indicates a higher variance of LGD. This implies that a larger decrease in current liability results in more variation of LGD among the default cases. In general, a lower variance of LGD is favored in terms of risk measures. The results show that a lower variance is obtained with a higher SPP, a higher OE, and a lower LR.

Table 5: Beta regression with different link functions

Table 6: Beta regression with R-squared, AIC and BIC

Further results can be seen in Tab. 6, where the R-squared, log-likelihood, AIC and BIC for each model are provided. Since the beta regression with the probit link function has the highest R-squared, while having acceptable values of logL, AIC and BIC, this model is chosen as the best regression model for explaining the relationship between LGD and the financial variables.

5 Conclusion

In the context of a credit portfolio, LGD is the percentage of exposure that will be lost if a default occurs. Uncertainty about the actual LGD is an important source of risk in a credit portfolio, in addition to default risk. In this study, several parametric distributions were used to estimate LGD based on a sample from a credit portfolio collected from a bank in Jordan for the period 2010-2014. The results show that the beta distribution is the best parametric model for estimating LGD based on the following criteria: logL, AIC and BIC. Several financial variables were then incorporated into the sample data to find the macroeconomic and microeconomic determinants of LGD. The results show that beta regression with the probit link function has the highest R-squared, with acceptable values of logL, AIC and BIC. The results from the beta regression models show that the macroeconomic variables (GDP and inflation rate) are significant for the mean parameter (μ), while the microeconomic variables (SPP, OE and LR) are significant for the precision parameter (ϕ). In particular, a decreasing GDP growth rate and an increasing inflation rate both resulted in an increasing LGD. In terms of LGD risk, the variance of LGD is lower with a higher SPP and a higher OE, but higher with a higher LR. We have identified the significant microeconomic variables that affect the precision parameter (ϕ) and compared six link functions (logit, loglog, probit, cloglog, cauchit, and log) in beta regression.

Funding Statement: This research is supported by the Fundamental Research Grant Scheme/Ministry of Education Malaysia [Research No.: FRGS/1/2019/STG06/UKM/01/5] and the Research University Grant/Universiti Kebangsaan Malaysia [Research No.: GUP-2019-031]. Initials of authors who received the grant: N. Ismail.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.