Multicollinearity and Stress Testing
Most readers will remember being somewhat perplexed back in their undergraduate days by a topic called multicollinearity. This phenomenon, in which the regressors of a model are correlated with each other, apparently causes a lot of confusion among practitioners and users of stress testing models. This article seeks to dispel this confusion and show how fear of multicollinearity is misplaced and, in some cases, harmful to a model’s accuracy.
Is a fear of multicollinearity justified?
Multicollinearity is common to all nonexperimental statistical disciplines. If we are conducting a fully controlled experiment, we can design our research to ensure the independence of all of the control variables. In bank stress testing, the Fed and the general public, not to mention shareholders, would likely not approve of banks running randomized experiments to discern bank losses under a range of controlled conditions. Instead, banks must do their best to piece together the effects of a range of performance drivers with the limited actual data they have.
Many people take a dim view of multicollinearity, but we don’t belong in this camp. We feel that multicollinearity, rather than being a problem, is actually what keeps risk modelers gainfully employed and enjoying life. Not only would bank stress testing, and life generally, be banal if the phenomenon did not exist, but interrelations between variables would not be possible. Under these circumstances, there would be no need for expert statisticians – even bankers could conduct stress testing! (Personally, we wouldn’t want to live in such a cruel dystopia.)
Multicollinearity makes estimating individual model coefficients imprecise. Say we have two highly correlated regressors. For some purposes it often suffices to include only one in our final model of the dependent variable, even if the unknown “true model” actually contains both. We are seeking to explain variations in the dependent variable using signals gleaned from variations in the independent variables of the regression.
If these signals are, for all intents and purposes, identical, we don’t need both regressors to adequately capture the signal. Including both will lead to a “competition” between the variables, and they will crowd each other out. Though the estimates will be unbiased in the more liberally (and, indeed, correctly) specified model, the individual coefficient estimates will have high standard errors, and thus the probability of obtaining a coefficient that isn’t statistically different from zero or else has the wrong sign would be high. If data are plentiful, on the other hand, we can more easily distinguish the subtle differences between the signals provided by the two variables and include both. Multicollinearity is, always and everywhere, a problem that occurs due to small sample size.
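A standard textbook expression makes the sample-size point concrete. It is a sketch under classical linear regression assumptions, and the symbols ($\sigma^2$, $R_j^2$, $x_{ij}$) are notation introduced here for illustration rather than taken from the article:

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\left(1 - R_j^2\right) \sum_{i=1}^{n} \left(x_{ij} - \bar{x}_j\right)^2}
```

Here $R_j^2$ is the R-squared from regressing regressor $j$ on the other regressors. Collinearity inflates the variance through the factor $1/(1 - R_j^2)$ (about 50 when two regressors have a correlation of 0.99), while the sum of squares in the denominator grows with the number of observations $n$, so a large enough sample can offset the inflation.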
Note that we have talked only of the contributions of individual variables. If the aim of the exercise is forecasting – for which the loss function is specified solely in terms of forecast errors – multicollinearity can be rendered a second-order problem. If we have two highly correlated variables (say, r = 0.99), and we compare the model estimated using both with a model estimated using just one or the other variable, we will find that baseline projections from the models will usually be very similar. Although the individual contributions are estimated imprecisely, the joint contribution is not. If the sole aim of the model user is forecasting (of which stress testing is a recent but important subdiscipline), the choice between a one- and a two-variable model is largely immaterial. Unnecessarily including the second regressor leads to a small efficiency loss (i.e., one degree of freedom), but in the grand scheme of things this is hardly worthy of consideration.
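The contrast between imprecise individual coefficients and a precise joint contribution can be checked with a small simulation. This is a minimal sketch on synthetic data: the sample size, the true coefficients of 1 on each regressor, the noise level, and the function names `fit_ols` and `collinearity_demo` are all assumptions made for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients and conventional standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, np.sqrt(sigma2 * np.diag(xtx_inv))


def collinearity_demo(n=200, rho=0.99, seed=0):
    rng = np.random.default_rng(seed)
    # Two synthetic regressors with population correlation rho.
    x1 = rng.standard_normal(n)
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # Hypothetical "true model" containing both regressors.
    y = 1.0 + x1 + x2 + rng.standard_normal(n)

    ones = np.ones(n)
    X_both = np.column_stack([ones, x1, x2])
    X_one = np.column_stack([ones, x1])
    b_both, se_both = fit_ols(X_both, y)
    b_one, se_one = fit_ols(X_one, y)

    return {
        # Standard error on x1 with and without its near-duplicate.
        "se_x1_both": float(se_both[1]),
        "se_x1_one": float(se_one[1]),
        # Joint contribution of the correlated pair (true value is 2).
        "sum_of_slopes": float(b_both[1] + b_both[2]),
        # Agreement between the fitted values of the two models.
        "fit_corr": float(np.corrcoef(X_both @ b_both, X_one @ b_one)[0, 1]),
    }
```

Calling `collinearity_demo()` returns the standard error on the first regressor in each model, the sum of the two slopes, and the correlation between the two sets of fitted values. In this setup the standard error should be several times larger in the two-variable model, while the summed slopes and the fitted values should barely differ between the models.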
Multicollinearity is more of a problem if the aim of the model is to conduct some form of structural analysis. If we are testing an assertion about the relationship between one of our correlated factors and the dependent variable of interest, too much multicollinearity will tend to drain away the power of the statistical test used for this purpose. Tightly specifying a model and leaving out variables that should be there will typically distort the test’s size. The upside of this tradeoff is that practitioners have more power in conducting their tests.
Stress testers may well be interested in conducting this type of structural analysis. For example, a bank may be interested in finding out whether the main driver of a portfolio’s behavior is unemployment or household income. This function should, however, be considered separately from the broader problem of projecting future behavior under assumed stress. There are, to our knowledge, no regulatory diktats against stress testers using a “horses for courses” approach to model selection and keeping a stable of models designed for different purposes (so long as these are well documented and well understood).
Validators and examiners should carefully consider the aims of the model when determining whether fear of multicollinearity is justified for model builders.
Model risk and multicollinearity
Now let’s consider cases where worrying about multicollinearity can increase the prevalence of model risk. We use “risk” here in the traditional statistical sense – the expected value of statistical loss across repeated samples. The risk function we use here, assuming squared error loss, is a variation of that discussed in Hughes (2012):
$$R = \sum_{s \in \{B, A, S\}} w_s \, E\left[ (y_s - \hat{y}_s)^2 \mid s \right]$$
where $s$ indexes the baseline ($B$), adverse ($A$), and severely adverse ($S$) scenarios, $y_s$ is the outcome, $\hat{y}_s$ is the forecast, and the $w_s$ are a series of weights that indicate the relative importance of correctly projecting credit losses (or PDs, LGDs, volumes, etc.) in the various scenarios of the Fed’s Comprehensive Capital Analysis and Review (CCAR). Expectations are conditional on the relevant Fed scenario actually playing out, and the forecasts (conditional on the relevant scenario) produced are based on the information available at the time.
We view as reasonable the assumption that $w_B = w_A = w_S$, though, admittedly, the majority of banks tend to give the adverse scenario less weight than the severely adverse scenario under most circumstances. Fed examiners are well known to also give the baseline scenario considerable weight in their deliberations. (In reality, the risk function must also accommodate idiosyncratic scenarios designed specifically for each bank, but we are leaving that out of our analysis for clarity’s sake.)
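A risk function of the kind described here, a weighted sum of expected squared forecast errors across scenarios, can be approximated from simulated forecast errors. The sketch below is illustrative only: the function name `scenario_risk`, the scenario labels, and the equal weights in the usage example are assumptions, not the article's own implementation.

```python
import numpy as np


def scenario_risk(errors_by_scenario, weights):
    """Weighted sum of mean squared forecast errors across scenarios.

    errors_by_scenario: dict mapping a scenario label to forecast errors
        (forecast minus outcome) collected across repeated samples.
    weights: dict mapping the same labels to their relative importance.
    """
    total = 0.0
    for name, weight in weights.items():
        errs = np.asarray(errors_by_scenario[name], dtype=float)
        # The sample mean of squared errors stands in for the expectation.
        total += weight * np.mean(errs**2)
    return total


# Hypothetical usage with equal weights and made-up forecast errors.
example_errors = {
    "baseline": [1.0, -1.0],
    "adverse": [2.0, -2.0],
    "severely_adverse": [3.0, -3.0],
}
equal_weights = {name: 1.0 / 3.0 for name in example_errors}
example_risk = scenario_risk(example_errors, equal_weights)
```

Changing the weights dictionary is enough to represent a bank that cares more about one scenario than another, which is why the weights are kept separate from the error data.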
To further set the stage, assume that the true data generating process (DGP) is a function only of an unknown subset of the variables published annually by the Fed. In reality, of course, this process is likely to be infinitely more complex than implied by this simple assumption. Suppose, unbeknown to the modeler, that the correct specification includes only the unemployment rate, the rate of GDP growth, and the interest rate on ten-year Treasury notes.
The following statements about this situation are all true:
(1) A model that contains only the three variables in the DGP will minimize overall model risk.
(2) Any model selection procedure established within this framework will have a nonzero probability of selecting an incorrect model.
(3) If we select a model that includes not just the three variables, but also additional extraneous variables, our model will still produce unbiased forecasts in all three scenarios, but the forecasts will be somewhat less precise because of the efficiency loss discussed above.
(4) If the selected model excludes one or more of the three variables, projections in all three scenarios will be biased and inconsistent. This situation could yield efficiency gains in parameter estimation, but these are likely to be modest, given that the efficiency of a biased parameter estimate is unlikely to be optimal.
In weighing up the relative costs of the errors made in (3) and (4), the risk of (4) is likely to exceed the risk of (3). From a forecasting perspective, this must also be considered alongside Hughes’ (2012) observation that input forecast errors aren’t possible when computing stress tests that are conditional on a stated macroeconomic scenario. The implication of these observations is that when high levels of multicollinearity are present, the practitioner should still tend to err, at the margin, in favor of the more liberally specified model. We will explore this question, using Monte Carlo simulations, later in this article.
The standard fix for multicollinearity is to drop some of the correlated regressors, but doing so is risky because it increases the probability of making errors like that described in (4). If we estimate a model and find that one variable, intuitively viewed as important, has an estimated coefficient with a p-value of 0.07, should it necessarily be dropped? In our view, removing the variable is riskier than keeping it. Does the universal application of a 5% significance level really minimize overall model risk when the ultimate goal of the model is to provide stress projections?
Rather than considering multicollinearity to be a phenomenon that always increases model risk, validators should instead try to discern the optimal level of multicollinearity in models. Models that are specified extremely tightly are next to useless when seeking to understand the effects of a range of idiosyncratic stresses on the portfolio. Likewise, models of the “kitchen sink” variety are unlikely to be very useful since many of the drivers will be found to be insignificant. The best model will be a liberally specified one, but where the liberty is not abused.
Shifts in historical correlations
A more pressing issue has to do with scenarios involving shifts in historical correlations between variables. What we mean here are situations in which, for example, two variables have historically been positively correlated but where the Fed, in its infinite wisdom, gives us a scenario in which the two variables move in opposition to each other.
It is crucial that we know how to deal with these situations, as no one knows the nature of the next stress event. Stress test models should be able to cope, at least reasonably well, with unusual happenstances. Models that can only cope with a repeat of the Great Recession and nothing else are next to useless.
We do not need to look far to find a situation in which historical correlations shifted in this way. During the 2000s and 2010s, the U.S. Phillips Curve has been modestly negatively sloped. Between January 2000 and November 2014, the correlation coefficient between the unemployment rate and the year-over-year rate of consumer price inflation was -0.51. In the Fed’s baseline scenario published in October 2014, the correlation between the two variables is -0.72 across the nine-quarter forecast window, and in the severely adverse scenario, the figure is -0.41. In these scenarios, the Fed is saying that Phillips Curve dynamics basically mimic those of recent history. The adverse scenario is completely different; in this case, the correlation is +0.97 across the nine-quarter scenario window. To put this into context, during the 1970s – considered the stagflationary nadir by most right-thinking economists – the correlation between the two variables was a mere +0.14.
Now suppose that the true DGP for the probability of default (PD) for a particular portfolio is a function only of inflation and the unemployment rate. We set the parameters of the model to be -2 for inflation and 2 for unemployment, and then simulate data for PD assuming a simple linear functional form and normal errors. Normally, in a model of the default likelihood of fixed repayment loans, we would expect the unemployment rate to be positively signed in our regression and the inflation rate to be negatively signed. Inflation, after all, reduces the burden of nominal principal and interest payments as nominal income rises at a fast clip. Inflation should therefore act to mitigate the effect of stress, and projected real credit losses should be lower than would be expected from the increase in the unemployment rate alone.
Such a simple data generating process can throw off unrealistic results – like negative default rates – but we want to keep this exercise as straightforward as possible. We first fit the model containing both variables and excluded any that we found to be insignificant at the 5% level using a standard t-test; we labeled the model selected using this procedure “Chosen.” We then compared the forecasting and stress testing performance of the chosen model with those based on a full model. Table 1 shows results for this simple experiment, assuming 5,000 replications.
Table 1. Forecasting and stress testing performance: comparing the chosen and full models
Source: Moody's Analytics
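The structure of this experiment can be sketched in a few lines of NumPy. Everything numerical below is a hypothetical stand-in: the simulated macro history, the two nine-quarter scenario paths, the sample size, the noise level, the zero intercept, and the 1.96 cutoff (a normal approximation to the 5% t-test) are assumptions made for illustration, so the output will not reproduce Table 1, which is based on the Fed's published scenarios.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients and conventional standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(sigma2 * np.diag(xtx_inv))


def run_experiment(n_reps=2000, n_obs=60, noise_sd=6.0, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical DGP: intercept 0, +2 on unemployment, -2 on inflation.
    beta_true = np.array([0.0, 2.0, -2.0])
    # Hypothetical history: unemployment is more volatile than inflation
    # and the two are negatively correlated (correlation -0.5).
    cov = np.array([[4.0, -1.0], [-1.0, 1.0]])
    # Hypothetical nine-quarter paths, in deviations from historical means.
    u_path = np.linspace(0.0, 4.0, 9)
    ones = np.ones(9)
    scenarios = {
        "historical_pattern": np.column_stack([ones, u_path, -0.25 * u_path]),
        "shifted_correlation": np.column_stack([ones, u_path, 0.5 * u_path]),
    }
    errors = {m: {s: [] for s in scenarios} for m in ("full", "chosen")}
    full_selected = 0

    for _ in range(n_reps):
        Z = rng.multivariate_normal([0.0, 0.0], cov, size=n_obs)
        X = np.column_stack([np.ones(n_obs), Z])
        y = X @ beta_true + noise_sd * rng.standard_normal(n_obs)

        b_full, se = ols(X, y)
        # Keep regressors with |t| > 1.96; the intercept is always kept.
        keep = np.abs(b_full / se) > 1.96
        keep[0] = True
        full_selected += int(keep.all())

        # Refit on the surviving regressors; dropped ones get a zero coefficient.
        b_chosen = np.zeros(3)
        b_chosen[keep] = ols(X[:, keep], y)[0]

        for name, Xs in scenarios.items():
            expected = Xs @ beta_true
            errors["full"][name].append(Xs @ b_full - expected)
            errors["chosen"][name].append(Xs @ b_chosen - expected)

    results = {"share_full_selected": full_selected / n_reps}
    for model, by_scenario in errors.items():
        results[model] = {}
        for name, errs in by_scenario.items():
            e = np.concatenate(errs)
            results[model][name] = {
                "bias": float(e.mean()),
                "rmse": float(np.sqrt(np.mean(e**2))),
            }
    return results
```

Calling `run_experiment()` returns the share of replications in which the full model survives the t-tests, plus bias and RMSE by model and scenario. In this stylized setup, the pre-tested model should look worst under the `shifted_correlation` path, where unemployment and inflation rise together.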
We found that the correct full model is chosen 59% of the time. Overall, the inflation coefficient is statistically significant in around 67% of cases, whereas the unemployment rate coefficient is significant 91% of the time. As might be expected, always choosing the full model yields forecasts that suffer no appreciable bias in any of the three scenarios. Zero bias here means that the conditional forecasts produced by the model are, on average across the nine-quarter forecast window, neither too high nor too low when compared to the expected outcomes of the target variable.
The situation changes quite noticeably when we look at the performance of the chosen model. Predictions from this model are too low under baseline conditions and too high in both stressed scenarios. In the severely adverse case, the bias is only slight, but in the adverse case the levels of overprediction are extreme. When we consider root mean squared prediction error (RMSE), whereby the improved efficiency of the smaller models may compensate for the effect of bias, we find that, in all cases, using the full model yields substantially smaller forecast errors than the selected model.
Because the historical correlation between the two variables is preserved in both the baseline and the severely adverse scenarios, we have a pretty good shot at getting decent projections using an incorrectly specified model that excludes one of the variables.
In the adverse scenario, however, the situation changes markedly. In the Fed’s adverse scenario, increases in the unemployment rate, which would normally be accompanied by declines in inflation, are now accompanied by rising inflation. Removing the inflation variable from the model means that the historical effects of inflation are conflated with correlated unemployment effects, and the coefficient on the unemployment variable is far higher than it should be as a result. We are powerless to capture the mitigating effect of inflation, and our projections suffer alarmingly as a result.
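The mechanism can be written out using the standard omitted-variable result. This is a sketch that assumes the two-variable linear DGP described earlier, with variables measured as deviations from their means; the symbols $\beta_u$, $\beta_\pi$, $\delta$, $u_s$ and $\pi_s$ are notation introduced here for illustration.

```latex
% True DGP (variables in deviations from their means)
PD_t = \beta_u u_t + \beta_\pi \pi_t + \varepsilon_t,
  \qquad \beta_u > 0,\ \beta_\pi < 0

% Large-sample coefficient on unemployment when inflation is omitted
\tilde{\beta}_u \;\to\; \beta_u + \beta_\pi \delta,
  \qquad \delta = \frac{\operatorname{Cov}(u_t, \pi_t)}{\operatorname{Var}(u_t)} < 0
  \text{ historically}

% Error of the short model's projection along a scenario path (u_s, \pi_s)
(\beta_u + \beta_\pi \delta)\, u_s - (\beta_u u_s + \beta_\pi \pi_s)
  = \beta_\pi \left( \delta u_s - \pi_s \right)
```

When a scenario preserves the historical relationship ($\pi_s = \delta u_s$), the error vanishes. When unemployment and inflation rise together, $\delta u_s - \pi_s$ is negative, and because $\beta_\pi$ is also negative the projection is too high.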
One could argue that the misspecified model here is more conservative, but we think that misses the point. The idea of modeling should be to derive an accurate, unbiased view of reality. Users of models can always apply conservative assumptions to arrive at appropriately austere stress test results.
A fuller exposition of the problem
In the preceding discussion, two features might have immediately jumped out at the reader. The first is that the framework is so simple that it bears no relation to the difficult task of CCAR-style stress testing. The second point is that the experimental setup explicitly favors the larger model, as it is the only correctly specified model in the choice set.
We now address those points by extending our experiment to consider a true DGP that contains three factors and has five potential regressors in our variable selection choice set. The true model, as before, contains unemployment and inflation, to which we add GDP growth with a parameter of 2. The choice set contains these three variables as well as the Baa spread and the ten-year Treasury interest rate.
As before, we select a model by excluding any variable that is found to be statistically insignificant at the 5% level and compare this with the strategy whereby the full model (containing all five variables) is used every time. Again, we are interested in the observed bias and RMSE of the calculated projections in the three Fed scenarios. The results are contained in Table 2.
Table 2. Forecasting and stress testing performance: fuller comparison of the chosen and full models
Source: Moody's Analytics
In this case, the full model is potentially at a disadvantage because it always contains two extraneous variables. This has no effect on forecast bias, however, since the estimated model encompasses the true specification. In all scenarios, the full model suffers effectively zero bias.
In this case, our simple model selection procedure yields the correct model (that which contains the three factors) only 15% of the time. More often, one or more of the true factors is missing from the selected model. In 50% of the simulations, one factor is missing; in 29%, two of the important factors are erroneously excluded from the model. This demonstrates a key result of model selection – that, as the choice set expands, the probability of correct selection declines rapidly to zero. In only 3% of the simulations does the model include too many factors. The full model, containing all five variables, is selected by this simple t-statistic-based procedure a mere 0.5% of the time.
That the model selection procedure is so easily tricked into excluding important factors is a likely outcome in the presence of multicollinearity.
In this experiment, we find that the selection procedure yields models that produce projections that are consistently too high. Bear in mind that this is a function of our experimental design; we could have just as easily designed an experiment with bias of the opposite sign. Looking at RMSE, we find that the model selected on the basis of t-tests yields twice the forecast error of the “full model always” modeling strategy. Improved estimation efficiency does little to mitigate the proximate threat of omitted variable bias caused by excluding key factors on the basis of an insignificant t-statistic.
Conclusion
In an important sense, the results of this analysis will be unsurprising. That the issue of multicollinearity has little currency when the aim of the modeler is forecasting has been well known for many decades. What could be an important issue for structural analysis using regression-type models is, at the margin, irrelevant to forecasters.
This is not to say that practitioners should go wild and throw as many drivers into models as they have degrees of freedom available to model them. If our advice is taken to the extreme, efficiency losses will become large enough to outweigh any gain from a reduction in the threat of omitted variable bias. At the margin, however, looking at a t-statistic of 1.7, or even 1.2, should hold few fears for model validators, so long as inclusion of the variable is logical and intuitive.
If our aim were only to conduct baseline forecasting, multicollinearity would be, at best, a second-order concern. Here, though, we are interested in stress scenarios, in which regulators and senior managers will regularly throw curveballs involving shifts in historical relationships. In this case, a fear of multicollinearity can be positively harmful. Our small Monte Carlo study has demonstrated in the clearest way possible that extreme forecast bias is most likely when historical relationships shift and key variables are removed from regressions merely because they are insignificant. To capture nuanced scenarios like the adverse and severely adverse CCAR events, or bank-specific idiosyncratic happenstances, models need to be specified quite liberally.
Ignoring this advice will not decrease model risk. Rather, it will raise that risk to potentially extreme levels.