
Most readers will remember being somewhat perplexed back in their undergraduate days by a topic called multicollinearity. This phenomenon, in which the regressors of a model are correlated with each other, apparently causes a lot of confusion among practitioners and users of stress testing models. This article seeks to dispel this confusion and show how fear of multicollinearity is misplaced and, in some cases, harmful to a model’s accuracy.

Is a fear of multicollinearity justified?

Multicollinearity is common to all non-experimental statistical disciplines. If we are conducting a fully controlled experiment, we can design our research to ensure the independence of all of the control variables. In bank stress testing, the Fed and the general public, not to mention shareholders, would likely not approve of banks running randomized experiments to discern bank losses under a range of controlled conditions. Instead, banks must do their best to piece together the effects of a range of performance drivers with the limited actual data they have.

Many people take a dim view of multicollinearity, but we don’t belong in this camp. We feel that multicollinearity, rather than being a problem, is actually what keeps risk modelers gainfully employed and enjoying life. Not only would bank stress testing, and life generally, be banal if the phenomenon did not exist, but interrelations between variables would not be possible. Under these circumstances, there would be no need for expert statisticians – even bankers could conduct stress testing! (Personally, we wouldn’t want to live in such a cruel dystopia.)

Multicollinearity makes estimating individual model coefficients imprecise. Say we have two highly correlated regressors. For some purposes it often suffices to include only one in our final model of the dependent variable, even if the unknown “true model” actually contains both. We are seeking to explain variations in the dependent variable using signals gleaned from variations in the independent variables of the regression.

If these signals are, for all intents and purposes, identical, we don’t need both regressors to adequately capture the signal. Including both will lead to a “competition” between the variables, and they will crowd each other out. Though the estimates will be unbiased in the more liberally (and, indeed, correctly) specified model, the individual coefficient estimates will have high standard errors, and thus the probability of obtaining a coefficient that isn’t statistically different from zero or else has the wrong sign would be high. If data are plentiful, on the other hand, we can more easily distinguish the subtle differences between the signals provided by the two variables and include both. Multicollinearity is, always and everywhere, a problem that occurs due to small sample size.

Note that we have talked only of the contributions of individual variables. If the aim of the exercise is forecasting – for which the loss function is specified solely in terms of forecast errors – multicollinearity can be rendered a second-order problem. If we have two highly correlated variables (say, r = 0.99), and we compare the model estimated using both with a model estimated using just one or the other variable, we will find that baseline projections from the models will usually be very similar. Although the individual contributions are estimated imprecisely, the joint contribution is not. If the sole aim of the model user is forecasting (of which stress testing is a recent but important sub-discipline), the choice between a one- and a two-variable model is largely immaterial. Unnecessarily including the second regressor leads to a small efficiency loss (i.e., one degree of freedom), but in the grand scheme of things this is hardly worthy of consideration.
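The loss of precision can be quantified with the standard two-regressor result, shown here as a worked illustration in assumed notation: $r$ is the sample correlation between the two regressors and $\sigma^2$ is the error variance.

$$
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{(1 - r^2)\sum_t (x_{jt} - \bar{x}_j)^2},
\qquad
\frac{1}{1 - 0.99^2} = \frac{1}{0.0199} \approx 50.3
$$

With $r = 0.99$, each coefficient's variance is about 50 times what it would be with uncorrelated regressors, so standard errors are roughly 7 times larger. No comparable penalty applies to projections in which the two variables move together as they have historically, which is why the joint contribution remains well estimated.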

Multicollinearity is more of a problem if the aim of the model is to conduct some form of structural analysis. If we are testing an assertion about the relationship between one of our correlated factors and the dependent variable of interest, too much multicollinearity will tend to drain away the power of the statistical test used for this purpose. Tightly specifying a model and leaving out variables that should be there will typically distort the test’s size. The upside of this trade-off is that practitioners have more power in conducting their tests.

Stress testers may well be interested in conducting this type of structural analysis. For example, a bank may be interested in finding out whether the main driver of a portfolio’s behavior is unemployment or household income. This function should, however, be considered separately from the broader problem of projecting future behavior under assumed stress. There are, to our knowledge, no regulatory diktats against stress testers using a “horses for courses” approach to model selection and keeping a stable of models designed for different purposes (so long as these are well documented and well understood).

Validators and examiners should carefully consider the aims of the model when determining whether fear of multicollinearity is justified for model builders.

Model risk and multicollinearity

Now let’s consider cases where worrying about multicollinearity can increase the prevalence of model risk. We use “risk” here in the traditional statistical sense – the expected value of statistical loss across repeated samples. The risk function we use here, assuming squared error loss, is a variation of that discussed in Hughes (2012): a weighted sum, across scenarios, of the expected squared forecast error in each scenario.

The weights indicate the relative importance of correctly projecting credit losses (or PDs, LGDs, volumes, etc.) in the Fed’s various Comprehensive Capital Analysis and Review (CCAR) scenarios. Expectations are conditional on the relevant Fed scenario actually playing out, and the forecasts (conditional on the relevant scenario) are based on the information available at the time.
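In symbols, a risk function of this kind can be sketched as follows. The notation is assumed for illustration and is not taken from Hughes (2012): $s$ indexes the baseline ($B$), adverse ($A$), and severely adverse ($SA$) scenarios, $w_s$ is the weight on scenario $s$, $y_{T+h}$ is the outcome $h$ periods ahead, and $\hat{y}_{T+h \mid s}$ is the forecast made at time $T$ conditional on scenario $s$.

$$
R = \sum_{s \in \{B,\, A,\, SA\}} w_s \, \mathrm{E}\!\left[ \left( y_{T+h} - \hat{y}_{T+h \mid s} \right)^2 \,\middle|\, s \right]
$$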

We view as reasonable the assumption that the scenario weights are equal, though, admittedly, the majority of banks tend to give the adverse scenario less weight than the severely adverse scenario under most circumstances. Fed examiners are well known to also give the baseline scenario considerable weight in their deliberations. (In reality, the risk function must also accommodate idiosyncratic scenarios designed specifically for each bank, but we are leaving that out of our analysis for clarity’s sake.)

To further set the stage, assume that the true data generating process (DGP) is a function only of an unknown subset of the variables published annually by the Fed. In reality, of course, this process is likely to be infinitely more complex than implied by this simple assumption. Suppose, unbeknown to the modeler, that the correct specification includes only the unemployment rate, the rate of GDP growth, and the interest rate on ten-year Treasury notes.

The following statements about this situation are all true:

  1. A model that contains only the three variables in the DGP will minimize overall model risk.
  2. Any model selection procedure established within this framework will have a non-zero probability of selecting an incorrect model.
  3. If we select a model that includes not just the three variables, but also additional extraneous variables, our model will still produce unbiased forecasts in all three scenarios, but the forecasts will be less efficient, as discussed above.
  4. If the selected model excludes one or more of the three variables, projections in all three scenarios will be biased and inconsistent. This situation could yield efficiency gains in parameter estimation, but these are likely to be modest, given that the efficiency of a biased parameter estimate is unlikely to be optimal.

In weighing up the relative costs of the errors made in (3) and (4), the risk of (4) is likely to exceed the risk of (3). From a forecasting perspective, this must also be considered alongside Hughes’ (2012) observation that input forecast errors aren’t possible when computing stress tests that are conditional on a stated macroeconomic scenario. The implication of these observations is that when high levels of multicollinearity are present, the practitioner should still tend to err, at the margin, in favor of the more liberally specified model. We will explore this question, using Monte Carlo simulations, later in this article.
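The comparison of (3) and (4) can be framed with the standard decomposition of squared-error risk. In the notation assumed for this sketch, $\hat{y}$ is a forecast and $y$ is a fixed target, such as the expected outcome under a given scenario.

$$
\mathrm{E}\!\left[ (\hat{y} - y)^2 \right] = \left( \mathrm{E}[\hat{y}] - y \right)^2 + \operatorname{Var}(\hat{y})
$$

Extraneous regressors, as in (3), leave the first term at zero and raise the second. Omitted regressors, as in (4), lower the second term but make the first non-zero, and that bias does not shrink as the sample grows.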

The standard fix for multicollinearity is to drop some of the correlated regressors, but doing so is risky because it increases the probability of making errors like that described in (4). If we estimate a model and find that one variable, intuitively viewed as important, has an estimated coefficient with a p-value of 0.07, should it necessarily be dropped? In our view, removing the variable is riskier than keeping it. Does the universal application of a 5% significance level really minimize overall model risk when the ultimate goal of the model is to provide stress projections?

Rather than considering multicollinearity to be a phenomenon that always increases model risk, validators should instead try to discern the optimal level of multicollinearity in models. Models that are specified extremely tightly are next to useless when seeking to understand the effects of a range of idiosyncratic stresses on the portfolio. Likewise, models of the “kitchen sink” variety are unlikely to be very useful since many of the drivers will be found to be insignificant. The best model will be a liberally specified one, but where the liberty is not abused.

Shifts in historical correlations

A more pressing issue has to do with scenarios involving shifts in historical correlations between variables. What we mean here are situations in which, for example, two variables have historically been positively correlated but where the Fed, in its infinite wisdom, gives us a scenario in which the two variables move in opposition to each other.

It is crucial that we know how to deal with these situations, as no one knows the nature of the next stress event. Stress test models should be able to cope, at least reasonably well, with unusual happenstances. Models that can only cope with a repeat of the Great Recession and nothing else are next to useless.

We do not need to look far to find a situation in which historical correlations shifted in this way. During the 2000s and 2010s, the U.S. Phillips Curve has been modestly negatively sloped. Between January 2000 and November 2014, the correlation coefficient between the unemployment rate and the year-over-year rate of consumer price inflation was -0.51. In the Fed’s baseline scenario published in October 2014, the correlation between the two variables is -0.72 across the nine-quarter forecast window, and in the severely adverse scenario, the figure is -0.41. In these scenarios, the Fed is saying that Phillips Curve dynamics basically mimic those of recent history. The adverse scenario is completely different; in this case, the correlation is +0.97 across the nine-quarter scenario window. To put this into context, during the 1970s – considered the stagflationary nadir by most right-thinking economists – the correlation between the two variables was a mere +0.14.

Now suppose that the true DGP for the probability of default (PD) for a particular portfolio is a function only of inflation and the unemployment rate. We set the parameters of the model to be -2 for inflation and 2 for unemployment, and then simulate data for PD assuming a simple linear functional form and normal errors. Normally, in a model of the default likelihood of fixed repayment loans, we would expect the unemployment rate to be positively signed in our regression and the inflation rate to be negatively signed. Inflation, after all, reduces the burden of nominal principal and interest payments as nominal income rises at a fast clip. Inflation should therefore mitigate the effect of stress, and projected real credit losses should be lower than would be expected from the increase in the unemployment rate alone.

Such a simple data generating process can throw off unrealistic results – like negative default rates – but we want to keep this exercise as straightforward as possible. We first fit the model containing both variables and exclude any variable that we find to be insignificant at the 5% level using a standard t-test; we label the model selected using this procedure “Chosen.” We then compare the forecasting and stress testing performance of the chosen model with that of the full model. Table 1 shows results for this simple experiment, assuming 5,000 replications.
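A minimal sketch of an experiment along these lines follows. It is not the code behind Table 1: the sample size, regressor distributions, noise level, critical value, and the two nine-quarter paths are all assumptions chosen for illustration. Only the coefficients (2 and -2), the -0.51 historical correlation, and the 5% t-test rule come from the text.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and standard errors for design matrix X."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(sigma2 * np.diag(xtx_inv))

def run_experiment(reps=5000, n=60, rho=-0.51, sd_u=2.0, sd_pi=1.0,
                   noise_sd=5.5, t_crit=2.0, seed=0):
    """Compare 'full' and 'chosen' models; all defaults are assumed values."""
    rng = np.random.default_rng(seed)
    true_beta = np.array([0.0, 2.0, -2.0])  # intercept, unemployment, inflation
    cov = np.array([[sd_u**2, rho * sd_u * sd_pi],
                    [rho * sd_u * sd_pi, sd_pi**2]])
    # Hypothetical nine-quarter paths (deviations from mean), not the Fed's scenarios.
    u_path = np.linspace(1.0, 4.0, 9)
    scenarios = {
        "historical-like": rho * sd_pi / sd_u * u_path,  # inflation falls as unemployment rises
        "correlation shift": 0.5 * u_path,               # inflation rises with unemployment
    }
    errors = {name: {"full": [], "chosen": []} for name in scenarios}
    full_selected = 0
    for _ in range(reps):
        regressors = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        X = np.column_stack([np.ones(n), regressors])
        y = X @ true_beta + rng.normal(0.0, noise_sd, size=n)
        b_full, se_full = ols(X, y)
        # "Chosen" model: keep the intercept, drop regressors with |t| below the cutoff.
        # t_crit = 2.0 approximates the two-sided 5% critical value at this sample size.
        keep = [0] + [j for j in (1, 2) if abs(b_full[j] / se_full[j]) > t_crit]
        b_chosen, _ = ols(X[:, keep], y)
        full_selected += len(keep) == 3
        for name, pi_path in scenarios.items():
            Xs = np.column_stack([np.ones(9), u_path, pi_path])
            expected = Xs @ true_beta  # expected outcome given the scenario
            errors[name]["full"].append(Xs @ b_full - expected)
            errors[name]["chosen"].append(Xs[:, keep] @ b_chosen - expected)
    stats = {}
    for name in scenarios:
        stats[name] = {}
        for model in ("full", "chosen"):
            e = np.array(errors[name][model])  # shape (reps, 9)
            stats[name][model] = {"bias": float(e.mean()),
                                  "rmse": float(np.sqrt((e**2).mean()))}
    return {"share_full_selected": full_selected / reps, "scenarios": stats}

if __name__ == "__main__":
    result = run_experiment()
    print("share of replications selecting the full model:", result["share_full_selected"])
    for name, by_model in result["scenarios"].items():
        print(name, by_model)
```

Bias here is the mean forecast error across replications and the nine quarters, measured against the expected outcome under each path. The regressors are drawn with unequal standard deviations so that, as in the text, unemployment is retained more often than inflation.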

Table 1. Forecasting and stress testing performance: comparing the chosen and full models
Source: Moody's Analytics

We found that the correct full model is chosen 59% of the time. Overall, the inflation coefficient is statistically significant in around 67% of cases, whereas the unemployment rate coefficient is significant 91% of the time. As might be expected, always choosing the full model yields forecasts that suffer no appreciable bias in any of the three scenarios. Zero bias here means that the conditional forecasts produced by the model are, on average across the nine-quarter forecast window, neither too high nor too low when compared to the expected outcomes of the target variable.

The situation changes quite noticeably when we look at the performance of the chosen model. Predictions from this model are too low under baseline conditions and too high in both stressed scenarios. In the severely adverse case, the bias is only slight, but in the adverse case the levels of overprediction are extreme. When we consider root mean squared prediction error (RMSE), whereby the improved efficiency of the smaller models may compensate for the effect of bias, we find that, in all cases, using the full model yields substantially smaller forecast errors than the selected model.

Because the historical correlation between the two variables is preserved in both the baseline and the severely adverse scenarios, we have a pretty good shot at getting decent projections using an incorrectly specified model that excludes one of the variables.

In the adverse scenario, however, the situation changes markedly. In the Fed’s adverse scenario, increases in the unemployment rate, which would normally be accompanied by declines in inflation, are now accompanied by rising inflation. Removing the inflation variable from the model means that the historical effects of inflation are conflated with correlated unemployment effects, and the coefficient on the unemployment variable is far higher than it should be as a result. We are powerless to capture the mitigating effect of inflation, and our projections suffer alarmingly as a result.
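The mechanism can be made explicit with the textbook omitted-variable result. The sketch below uses the simulation’s parameters ($\beta_u = 2$ for unemployment, $\beta_\pi = -2$ for inflation) and the historical correlation of -0.51. As a simplifying assumption, it treats the two series as having equal standard deviations, so that $\operatorname{Cov}(u, \pi) / \operatorname{Var}(u)$ equals the correlation.

$$
\operatorname{plim} \hat\beta_u^{\text{short}}
= \beta_u + \beta_\pi \frac{\operatorname{Cov}(u, \pi)}{\operatorname{Var}(u)}
= 2 + (-2)(-0.51) = 3.02
$$

Under that assumption, the model without inflation attributes about 3.02 units of PD to each unit of unemployment. That is harmless while inflation keeps falling by about 0.51 per unit rise in unemployment, but it overstates the response whenever inflation rises alongside unemployment instead.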

One could argue that the misspecified model here is more conservative, but we think that misses the point. The idea of modeling should be to derive an accurate, unbiased view of reality. Users of models can always apply conservative assumptions to arrive at appropriately austere stress test results.

A fuller exposition of the problem

In the preceding discussion, two features might have immediately jumped out at the reader. The first is that the framework is so simple that it bears no relation to the difficult task of CCAR-style stress testing. The second point is that the experimental set-up explicitly favors the larger model, as it is the only correctly specified model in the choice set.

We now address those points by extending our experiment to consider a true DGP that contains three factors and has five potential regressors in our variable selection choice set. The true model, as before, contains unemployment and inflation, to which we add GDP growth with a parameter of -2. The choice set contains these three variables as well as the Baa spread and the ten-year treasury interest rate.

As before, we select a model by excluding any variable that is found to be statistically insignificant at the 5% level and compare this with the strategy whereby the full model (containing all five variables) is used every time. Again, we are interested in the observed bias and RMSE of the calculated projections in the three Fed scenarios. The results are contained in Table 2.

Table 2. Forecasting and stress testing performance: fuller comparison of the chosen and full models
Source: Moody's Analytics

In this case, the full model is potentially at a disadvantage because it always contains two extraneous variables. This has no effect on forecast bias, however, since the estimated model encompasses the true specification. In all scenarios, the full model suffers effectively zero bias.

In this case, our simple model selection procedure yields the correct model (that which contains the three factors) only 15% of the time. More often, one or more of the true factors is missing from the selected model. In 50% of the simulations, one factor is missing; in 29%, two of the important factors are erroneously excluded from the model. This demonstrates a key result of model selection: as the choice set expands, the probability of correct selection declines rapidly toward zero. In only 3% of the simulations does the model include too many factors. The full model, containing all five variables, is selected by this simple t-statistic-based procedure a mere 0.5% of the time.
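The selection frequencies can be explored with a short sketch such as the one below. It is illustrative only and will not reproduce the percentages above: the equicorrelated standard-normal regressors, the correlation of 0.6, the noise level, the sample size, and the critical value are all assumptions, while the three true coefficients (2, -2, -2) and the two extraneous candidates follow the text.

```python
import numpy as np

def selection_shares(reps=2000, n=60, rho=0.6, noise_sd=6.0, t_crit=2.0, seed=0):
    """Share of replications in which t-test selection finds, misses, or overfits the true model."""
    rng = np.random.default_rng(seed)
    # Candidates: unemployment, inflation, GDP growth, Baa spread, ten-year rate.
    beta = np.array([2.0, -2.0, -2.0, 0.0, 0.0])  # the last two are extraneous
    true_set = {0, 1, 2}
    k = len(beta)
    cov = np.full((k, k), rho) + (1.0 - rho) * np.eye(k)  # assumed equicorrelation
    counts = {"correct": 0, "omits_true_factor": 0, "extra_only": 0}
    for _ in range(reps):
        Z = rng.multivariate_normal(np.zeros(k), cov, size=n)
        X = np.column_stack([np.ones(n), Z])
        y = Z @ beta + rng.normal(0.0, noise_sd, size=n)
        xtx_inv = np.linalg.inv(X.T @ X)
        b = xtx_inv @ X.T @ y
        resid = y - X @ b
        se = np.sqrt(resid @ resid / (n - k - 1) * np.diag(xtx_inv))
        t_stats = b[1:] / se[1:]  # skip the intercept
        selected = {j for j in range(k) if abs(t_stats[j]) > t_crit}
        if true_set - selected:
            counts["omits_true_factor"] += 1  # at least one true driver was dropped
        elif selected - true_set:
            counts["extra_only"] += 1         # all true drivers kept, plus extras
        else:
            counts["correct"] += 1
    return {key: value / reps for key, value in counts.items()}

if __name__ == "__main__":
    print(selection_shares())
```

Raising `rho` or `noise_sd`, or lowering `n`, makes the individual t-statistics less reliable, which is one way to see how the share of correct selections depends on sample size and collinearity.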

That the model selection procedure is so easily tricked into excluding important factors is a likely outcome in the presence of multicollinearity.

In this experiment, we find that the selection procedure yields models that produce projections that are consistently too high. Bear in mind that this is a function of our experimental design; we could have just as easily designed an experiment with bias of the opposite sign. Looking at RMSE, we find that the model selected on the basis of t-tests yields twice the forecast error of the “full model always” modeling strategy. Improved estimation efficiency does little to mitigate the proximate threat of omitted variable bias caused by excluding key factors on the basis of an insignificant t-statistic.

Conclusion

In an important sense, the results of this analysis will be unsurprising. That the issue of multicollinearity has little currency when the aim of the modeler is forecasting has been well known for many decades. What could be an important issue for structural analysis using regression-type models is, at the margin, irrelevant to forecasters.

This is not to say that practitioners should go wild and throw as many drivers into models as they have degrees of freedom available to model them. If our advice is taken to the extreme, efficiency losses will become large enough to outweigh any gain from a reduction in the threat of omitted variable bias. At the margin, however, looking at a t-statistic of 1.7, or even 1.2, should hold few fears for model validators, so long as inclusion of the variable is logical and intuitive.

If our aim were only to conduct baseline forecasting, multicollinearity would be, at most, a second-order concern. Here, though, we are interested in stress scenarios, in which regulators and senior managers will regularly throw curveballs involving shifts in historical relationships. In this case, a fear of multicollinearity can be positively harmful. Our small Monte Carlo study has demonstrated in the clearest way possible that extreme forecast bias is most likely when historical relationships shift and key variables are removed from regressions merely because they are insignificant. To capture nuanced scenarios like the adverse and severely adverse CCAR events, or bank-specific idiosyncratic happenstances, models need to be specified quite liberally.

Ignoring this advice will not decrease model risk. Rather, it will raise that risk to potentially extreme levels.
