Multicollinearity and Stress Testing
Most readers will remember being somewhat perplexed back in their undergraduate days by a topic called multicollinearity. This phenomenon, in which the regressors of a model are correlated with each other, apparently causes a lot of confusion among practitioners and users of stress testing models. This article seeks to dispel this confusion and show how fear of multicollinearity is misplaced and, in some cases, harmful to a model’s accuracy.
Is a fear of multicollinearity justified?
Multicollinearity is common to all nonexperimental statistical disciplines. If we are conducting a fully controlled experiment, we can design our research to ensure the independence of all of the control variables. In bank stress testing, the Fed and the general public, not to mention shareholders, would likely not approve of banks running randomized experiments to discern bank losses under a range of controlled conditions. Instead, banks must do their best to piece together the effects of a range of performance drivers with the limited actual data they have.
Many people take a dim view of multicollinearity, but we don’t belong in this camp. We feel that multicollinearity, rather than being a problem, is actually what keeps risk modelers gainfully employed and enjoying life. Not only would bank stress testing, and life generally, be banal if the phenomenon did not exist, but interrelations between variables would not be possible. Under these circumstances, there would be no need for expert statisticians – even bankers could conduct stress testing! (Personally, we wouldn’t want to live in such a cruel dystopia.)
Multicollinearity makes estimating individual model coefficients imprecise. Say we have two highly correlated regressors. For some purposes it often suffices to include only one in our final model of the dependent variable, even if the unknown “true model” actually contains both. We are seeking to explain variations in the dependent variable using signals gleaned from variations in the independent variables of the regression.
If these signals are, for all intents and purposes, identical, we don’t need both regressors to adequately capture the signal. Including both will lead to a “competition” between the variables, and they will crowd each other out. Though the estimates will be unbiased in the more liberally (and, indeed, correctly) specified model, the individual coefficient estimates will have high standard errors, and thus the probability of obtaining a coefficient that isn’t statistically different from zero or else has the wrong sign would be high. If data are plentiful, on the other hand, we can more easily distinguish the subtle differences between the signals provided by the two variables and include both. Multicollinearity is, always and everywhere, a problem that occurs due to small sample size.
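The small-sample point can be made concrete with the textbook expression for the sampling variance of a single OLS coefficient. This is a sketch under the usual homoskedastic linear-model assumptions, and the notation is assumed here for illustration:

```latex
\operatorname{Var}(\hat{\beta}_j \mid X)
  = \frac{\sigma^2}{\left(1 - R_j^2\right) \sum_{i=1}^{n} \left(x_{ij} - \bar{x}_j\right)^2}
```

Here R_j^2 is the R-squared from regressing regressor j on the other regressors. High collinearity pushes R_j^2 toward one and inflates the variance, while a larger sample increases the sum in the denominator and shrinks it again.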
Note that we have talked only of the contributions of individual variables. If the aim of the exercise is forecasting, for which the loss function is specified solely in terms of forecast errors, multicollinearity can be rendered a second-order problem. If we have two highly correlated variables (say, r = 0.99), and we compare the model estimated using both with a model estimated using just one or the other variable, we will find that baseline projections from the models will usually be very similar. Although the individual contributions are estimated imprecisely, the joint contribution is not. If the sole aim of the model user is forecasting (of which stress testing is a recent but important subdiscipline), the choice between a one-variable and a two-variable model is largely immaterial. Unnecessarily including the second regressor leads to a small efficiency loss (i.e., one degree of freedom), but in the grand scheme of things this is hardly worthy of consideration.
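A minimal sketch of this comparison on synthetic data follows. Everything in it is an assumption made for illustration (the sample size of 60, the coefficients, the noise level, and the names `fit_ols`, `x1`, `x2`); it is not the data or code behind the article's results.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients and standard errors; X must include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(0)
n = 60  # assumed sample size

# Two regressors built to have a population correlation of about 0.99
x1 = rng.normal(size=n)
x2 = 0.99 * x1 + np.sqrt(1 - 0.99**2) * rng.normal(size=n)
y = 1.0 + x1 + x2 + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
beta_both, se_both = fit_ols(np.column_stack([ones, x1, x2]), y)
beta_one, se_one = fit_ols(np.column_stack([ones, x1]), y)

# Project along a path on which the historical relationship between x1 and x2 holds
x1_new = np.linspace(-2, 2, 9)
x2_new = 0.99 * x1_new
pred_both = beta_both[0] + beta_both[1] * x1_new + beta_both[2] * x2_new
pred_one = beta_one[0] + beta_one[1] * x1_new
max_gap = np.max(np.abs(pred_both - pred_one))

print("std. error on x1, two-variable model:", se_both[1])
print("std. error on x1, one-variable model:", se_one[1])
print("largest gap between the two projections:", max_gap)
```

The standard error on `x1` is several times larger when `x2` is also included, yet the two sets of projections stay close together, because the path being forecast preserves the correlation seen in the estimation sample.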
Multicollinearity is more of a problem if the aim of the model is to conduct some form of structural analysis. If we are testing an assertion about the relationship between one of our correlated factors and the dependent variable of interest, too much multicollinearity will tend to drain away the power of the statistical test used for this purpose. Tightly specifying a model and leaving out variables that should be there will typically distort the test’s size. The upside of this tradeoff is that practitioners have more power in conducting their tests.
Stress testers may well be interested in conducting this type of structural analysis. For example, a bank may be interested in finding out whether the main driver of a portfolio’s behavior is unemployment or household income. This function should, however, be considered separately from the broader problem of projecting future behavior under assumed stress. There are, to our knowledge, no regulatory diktats against stress testers using a “horses for courses” approach to model selection and keeping a stable of models designed for different purposes (so long as these are well documented and well understood).
Validators and examiners should carefully consider the aims of the model when determining whether fear of multicollinearity is justified for model builders.
Model risk and multicollinearity
Now let’s consider cases where worrying about multicollinearity can increase the prevalence of model risk. We use “risk” here in the traditional statistical sense – the expected value of statistical loss across repeated samples. The risk function we use here, assuming squared error loss, is a variation of that discussed in Hughes (2012):
where the weights are a series of values that indicate the relative importance of correctly projecting credit losses (or PDs, LGDs, volumes, etc.) in the various scenarios of the Fed’s Comprehensive Capital Analysis and Review (CCAR). Expectations are conditional on the relevant Fed scenario actually playing out, and the forecasts (conditional on the relevant scenario) are based on the information available at the time.
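A risk function consistent with this description can be sketched as follows. The notation is assumed here for illustration and is not necessarily the exact expression used in Hughes (2012): s indexes the baseline (B), adverse (A), and severely adverse (SA) scenarios, w_s is the weight on scenario s, y is the outcome being projected, and \hat{y}_s is the forecast conditional on scenario s and on the information set I_t.

```latex
R = \sum_{s \in \{B,\, A,\, SA\}} w_s \,
    \mathrm{E}\!\left[ \left( y - \hat{y}_s \right)^2 \,\middle|\, s,\ I_t \right],
\qquad w_s \ge 0
```

Each term is the expected squared forecast error in one scenario, so a model that forecasts well at baseline but poorly under stress is penalized in proportion to the weight placed on the stressed scenarios.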
We view as reasonable the assumption that the three scenarios carry equal weight, though, admittedly, the majority of banks tend to give the adverse scenario less weight than the severely adverse scenario under most circumstances. Fed examiners are well known to also give the baseline scenario considerable weight in their deliberations. (In reality, the risk function must also accommodate idiosyncratic scenarios designed specifically for each bank, but we are leaving that out of our analysis for clarity’s sake.)
To further set the stage, assume that the true data generating process (DGP) is a function only of an unknown subset of the variables published annually by the Fed. In reality, of course, this process is likely to be infinitely more complex than implied by this simple assumption. Suppose, unbeknown to the modeler, that the correct specification includes only the unemployment rate, the rate of GDP growth, and the interest rate on ten-year Treasury bills.
The following statements about this situation are all true:
 (1) A model that contains only the three variables in the DGP will minimize overall model risk.
 (2) Any model selection procedure established within this framework will have a nonzero probability of selecting an incorrect model.
 (3) If we select a model that includes not just the three variables, but also additional extraneous variables, our model will still produce unbiased forecasts in all three scenarios, but the forecasts will be less precise, as discussed above.
 (4) If the selected model excludes one or more of the three variables, projections in all three scenarios will be biased and inconsistent. This situation could yield efficiency gains in parameter estimation, but these are likely to be modest, given that the efficiency of a biased parameter estimate is unlikely to be optimal.
In weighing up the relative costs of the errors made in (3) and (4), the risk of (4) is likely to exceed the risk of (3). From a forecasting perspective, this must also be considered alongside Hughes’ (2012) observation that input forecast errors aren’t possible when computing stress tests that are conditional on a stated macroeconomic scenario. The implication of these observations is that when high levels of multicollinearity are present, the practitioner should still tend to err, at the margin, in favor of the more liberally specified model. We will explore this question, using Monte Carlo simulations, later in this article.
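The bias behind error (4) can be written down for the textbook two-regressor case. This is a sketch in assumed notation, where \tilde{\beta}_1 is the slope obtained by regressing y on x_1 alone when the true model also contains x_2:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon,
\qquad
\operatorname{plim} \tilde{\beta}_1
  = \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)}
```

The second term vanishes only if the omitted variable is irrelevant or uncorrelated with the retained one. Under multicollinearity the covariance is large by definition, so the retained coefficient absorbs part of the omitted variable’s effect, and more data does nothing to remove the distortion.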
The standard fix for multicollinearity is to drop some of the correlated regressors, but doing so is risky because it increases the probability of making errors like that described in (4). If we estimate a model and find that one variable, intuitively viewed as important, has an estimated coefficient with a p-value of 0.07, should it necessarily be dropped? In our view, removing the variable is riskier than keeping it. Does the universal application of a 5% significance level really minimize overall model risk when the ultimate goal of the model is to provide stress projections?
Rather than considering multicollinearity to be a phenomenon that always increases model risk, validators should instead try to discern the optimal level of multicollinearity in models. Models that are specified extremely tightly are next to useless when seeking to understand the effects of a range of idiosyncratic stresses on the portfolio. Likewise, models of the “kitchen sink” variety are unlikely to be very useful since many of the drivers will be found to be insignificant. The best model will be a liberally specified one, but where the liberty is not abused.
Shifts in historical correlations
A more pressing issue has to do with scenarios involving shifts in historical correlations between variables. What we mean here are situations in which, for example, two variables have historically been positively correlated but where the Fed, in its infinite wisdom, gives us a scenario in which the two variables move in opposition to each other.
It is crucial that we know how to deal with these situations, as no one knows the nature of the next stress event. Stress test models should be able to cope, at least reasonably well, with unusual happenstances. Models that can only cope with a repeat of the Great Recession and nothing else are next to useless.
We do not need to look far to find a situation in which historical correlations shifted in this way. In recent years, during the 2000s and 2010s, the U.S. Phillips Curve has been modestly negatively sloped. Between January 2000 and November 2014, the correlation coefficient between the unemployment rate and the year-over-year rate of consumer price inflation was -0.51. In the Fed’s baseline scenario published in October 2014, the correlation between the two variables is -0.72 across the nine-quarter forecast window, and in the severely adverse scenario, the figure is -0.41. In these scenarios, the Fed is saying that Phillips Curve dynamics basically mimic those of recent history. The adverse scenario is completely different; in this case, the correlation is +0.97 across the nine-quarter scenario window. To put this into context, during the 1970s (considered the stagflationary nadir by most right-thinking economists) the correlation between the two variables was a mere +0.14.
Now suppose that the true DGP for the probability of default (PD) for a particular portfolio is a function only of inflation and the unemployment rate. We set the parameters of the model to be -2 for inflation and 2 for unemployment, and then simulate data for PD assuming a simple linear functional form and normal errors. Normally, in a model of the default likelihood of fixed repayment loans, we would expect the unemployment rate to be positively signed in our regression and the inflation rate to be negatively signed. Inflation, after all, reduces the burden of nominal principal and interest payments as nominal income rises at a fast clip. Inflation should therefore mitigate the effect of stress, so projected real credit losses should be lower than the increase in the unemployment rate alone would imply.
Such a simple data generating process can throw off unrealistic results, like negative default rates, but we want to keep this exercise as straightforward as possible. We first fit the model containing both variables and exclude any that we find to be insignificant at the 5% level using a standard t-test; we labeled the model selected using this procedure “Chosen.” We then compared the forecasting and stress testing performance of the chosen model with those based on a full model. Table 1 shows results for this simple experiment, assuming 5,000 replications.
Table 1. Forecasting and stress testing performance: comparing the chosen and full models
Source: Moody's Analytics
We found that the correct full model is chosen 59% of the time. Overall, the inflation coefficient is statistically significant in around 67% of cases, whereas the unemployment rate coefficient is significant 91% of the time. As might be expected, always choosing the full model yields forecasts that suffer no appreciable bias in any of the three scenarios. Zero bias here means that the conditional forecasts produced by the model are, on average across the nine-quarter forecast window, neither too high nor too low when compared to the expected outcomes of the target variable.
The situation changes quite noticeably when we look at the performance of the chosen model. Predictions from this model are too low under baseline conditions and too high in both stressed scenarios. In the severely adverse case, the bias is only slight, but in the adverse case the levels of overprediction are extreme. When we consider root mean squared prediction error (RMSE), whereby the improved efficiency of the smaller models may compensate for the effect of bias, we find that, in all cases, using the full model yields substantially smaller forecast errors than the selected model.
Because the historical correlation between the two variables is preserved in both the baseline and the severely adverse scenarios, we have a pretty good shot at getting decent projections using an incorrectly specified model that excludes one of the variables.
In the adverse scenario, however, the situation changes markedly. In the Fed’s adverse scenario, increases in the unemployment rate, which would normally be accompanied by declines in inflation, are now accompanied by rising inflation. Removing the inflation variable from the model means that the historical effects of inflation are conflated with correlated unemployment effects, and the coefficient on the unemployment variable is far higher than it should be as a result. We are powerless to capture the mitigating effect of inflation, and our projections suffer alarmingly as a result.
One could argue that the misspecified model here is more conservative, but we think that misses the point. The idea of modeling should be to derive an accurate, unbiased view of reality. Users of models can always apply conservative assumptions to arrive at appropriately austere stress test results.
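The logic of the two-variable experiment can be reproduced with a small simulation. The sketch below is not the code or data behind Table 1. It uses synthetic history and two hypothetical nine-quarter paths (“preserved” and “flipped”) rather than the Fed’s scenarios, and the sample size, noise level, historical correlation of -0.5, and coefficients of +2 on unemployment and -2 on inflation are all assumptions. Bias and RMSE are computed on the forecast error averaged over the nine quarters.

```python
import numpy as np

T_CRIT = 2.0  # approximate 5% two-sided critical value for about 57 degrees of freedom

def ols(X, y):
    """OLS coefficients and t-statistics; X must already contain an intercept column."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, beta / np.sqrt(sigma2 * np.diag(xtx_inv))

def simulate(n_reps=2000, n_obs=60, noise_sd=5.5, seed=0):
    rng = np.random.default_rng(seed)
    beta_true = np.array([0.0, 2.0, -2.0])  # intercept, unemployment, inflation (assumed)

    # Unemployment rises on both paths. Inflation either follows its historical
    # negative link to unemployment ("preserved") or rises with it ("flipped").
    u_path = np.linspace(6.0, 10.0, 9)
    scenarios = {
        "preserved": np.column_stack([np.ones(9), u_path, 2.0 - 0.25 * (u_path - 6.0)]),
        "flipped": np.column_stack([np.ones(9), u_path, 2.0 + 0.5 * (u_path - 6.0)]),
    }
    errors = {(m, s): [] for m in ("full", "chosen") for s in scenarios}
    n_full_chosen = 0

    for _ in range(n_reps):
        # Synthetic history: correlation between unemployment and inflation is -0.5
        z1, z2 = rng.normal(size=(2, n_obs))
        u = 6.0 + 2.0 * z1
        infl = 2.0 - 0.5 * z1 + np.sqrt(0.75) * z2
        X = np.column_stack([np.ones(n_obs), u, infl])
        y = X @ beta_true + noise_sd * rng.normal(size=n_obs)

        b_full, t_full = ols(X, y)
        # "Chosen" model: drop any regressor that is insignificant, then refit
        keep = [0] + [j for j in (1, 2) if abs(t_full[j]) >= T_CRIT]
        n_full_chosen += len(keep) == 3
        b_chosen = np.zeros(3)
        b_chosen[keep] = ols(X[:, keep], y)[0]

        for name, Xs in scenarios.items():
            truth = Xs @ beta_true  # expected outcome along the path
            errors[("full", name)].append(np.mean(Xs @ b_full - truth))
            errors[("chosen", name)].append(np.mean(Xs @ b_chosen - truth))

    summary = {k: (np.mean(v), np.sqrt(np.mean(np.square(v)))) for k, v in errors.items()}
    return summary, n_full_chosen / n_reps

summary, share_full = simulate()
print(f"share of replications keeping both variables: {share_full:.2f}")
for (model, scenario), (bias, rmse) in sorted(summary.items()):
    print(f"{model:>6} | {scenario:>9} | bias {bias:6.2f} | RMSE {rmse:5.2f}")
```

Under these assumptions the full model shows no appreciable bias on either path, while the chosen model over-predicts on the flipped path, because dropping inflation loads its historical effect onto the unemployment coefficient. The magnitudes depend entirely on the assumed settings and should not be read as a replication of the published figures.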
A fuller exposition of the problem
In the preceding discussion, two features might have immediately jumped out at the reader. The first is that the framework is so simple that it bears no relation to the difficult task of CCAR-style stress testing. The second point is that the experimental setup explicitly favors the larger model, as it is the only correctly specified model in the choice set.
We now address those points by extending our experiment to consider a true DGP that contains three factors and has five potential regressors in our variable selection choice set. The true model, as before, contains unemployment and inflation, to which we add GDP growth with a parameter of 2. The choice set contains these three variables as well as the Baa spread and the ten-year Treasury interest rate.
As before, we select a model by excluding any variable that is found to be statistically insignificant at the 5% level and compare this with the strategy whereby the full model (containing all five variables) is used every time. Again, we are interested in the observed bias and RMSE of the calculated projections in the three Fed scenarios. The results are contained in Table 2.
Table 2. Forecasting and stress testing performance: fuller comparison of the chosen and full models
Source: Moody's Analytics
In this case, the full model is potentially at a disadvantage because it always contains two extraneous variables. This has no effect on forecast bias, however, since the estimated model encompasses the true specification. In all scenarios, the full model suffers effectively zero bias.
In this case, our simple model selection procedure yields the correct model (that which contains the three factors) only 15% of the time. More often, one or more of the true factors is missing from the selected model. In 50% of the simulations, one factor is missing; in 29%, two of the important factors are erroneously excluded from the model. This demonstrates a key result of model selection: as the choice set expands, the probability of correct selection declines rapidly to zero. In only 3% of the simulations does the model include too many factors. The full model, containing all five variables, is selected by this simple t-statistic-based procedure a mere 0.5% of the time.
That the model selection procedure is so easily tricked into excluding important factors is a likely outcome in the presence of multicollinearity.
In this experiment, we find that the selection procedure yields models that produce projections that are consistently too high. Bear in mind that this is a function of our experimental design; we could have just as easily designed an experiment with bias of the opposite sign. Looking at RMSE, we find that the model selected on the basis of t-tests yields twice the forecast error of the “full model always” modeling strategy. Improved estimation efficiency does little to mitigate the proximate threat of omitted variable bias caused by excluding key factors on the basis of an insignificant t-statistic.
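How often a t-test filter recovers the right specification from a larger choice set can also be checked by simulation. The sketch below is illustrative only. The five regressors are synthetic and equally correlated (an assumed pairwise correlation of 0.7), the first three carry assumed coefficients of 2, -2, and 2, the last two are extraneous, and the sample size and noise level are likewise assumptions rather than the settings behind Table 2.

```python
import numpy as np

def t_stats(X, y):
    """t-statistics of the OLS coefficients; X must include an intercept column."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta / np.sqrt(sigma2 * np.diag(xtx_inv))

def selection_frequencies(n_reps=2000, n_obs=60, rho=0.7, noise_sd=4.7, seed=1):
    rng = np.random.default_rng(seed)
    beta_true = np.array([2.0, -2.0, 2.0, 0.0, 0.0])  # last two regressors are extraneous
    true_set = {0, 1, 2}
    counts = {"correct": 0, "missing_true": 0, "full": 0}

    for _ in range(n_reps):
        # A shared factor gives every pair of regressors correlation rho
        common = rng.normal(size=(n_obs, 1))
        own = rng.normal(size=(n_obs, 5))
        X = np.sqrt(rho) * common + np.sqrt(1 - rho) * own
        y = X @ beta_true + noise_sd * rng.normal(size=n_obs)

        # Keep only regressors significant at roughly the 5% level (|t| >= 2)
        t = t_stats(np.column_stack([np.ones(n_obs), X]), y)[1:]
        kept = {j for j in range(5) if abs(t[j]) >= 2.0}

        counts["correct"] += kept == true_set          # exactly the three true factors
        counts["missing_true"] += not true_set <= kept  # at least one true factor dropped
        counts["full"] += len(kept) == 5                # all five candidates retained

    return {k: v / n_reps for k, v in counts.items()}

freq = selection_frequencies()
for outcome, share in freq.items():
    print(f"{outcome:>12}: {share:.3f}")
```

With these settings each true factor is significant only about half the time, so the filter usually discards at least one of them and almost never retains all five candidates. The exact shares move with the assumed noise level and correlation.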
Conclusion
In an important sense, the results of this analysis will be unsurprising. That the issue of multicollinearity has little currency when the aim of the modeler is forecasting has been well known for many decades. What could be an important issue for structural analysis using regression-type models is, at the margin, irrelevant to forecasters.
This is not to say that practitioners should go wild and throw as many drivers into models as they have degrees of freedom available to model them. If our advice is taken to the extreme, efficiency losses will become large enough to outweigh any gain from a reduction in the threat of omitted variable bias. At the margin, however, looking at a t-statistic of 1.7, or even 1.2, should hold few fears for model validators, so long as inclusion of the variable is logical and intuitive.
If our aim were only to conduct baseline forecasting, multicollinearity would be, at best, a second-order concern. Here, though, we are interested in stress scenarios, in which regulators and senior managers will regularly throw curveballs involving shifts in historical relationships. In this case, a fear of multicollinearity can be positively harmful. Our small Monte Carlo study has demonstrated in the clearest way possible that extreme forecast bias is most likely when historical relationships shift and key variables are removed from regressions merely because they are insignificant. To capture nuanced scenarios like the adverse and severely adverse CCAR events, or bank-specific idiosyncratic happenstances, models need to be specified quite liberally.
Ignoring this advice will not decrease model risk. Rather, it will raise that risk to potentially extreme levels.
SUBJECT MATTER EXPERTS
Dr. Tony Hughes
Managing Director, Economic Research
Tony oversees the Moody’s Analytics credit analysis consulting projects for global lending institutions. An expert applied econometrician, he has helped develop approaches to stress testing and loss forecasting in retail, C&I, and CRE portfolios and recently introduced a methodology for stress testing a bank’s deposit book.
As Published In:
Focuses on helping financial institutions improve their data management practices and capabilities for enhanced risk management, business value, and regulatory compliance.