
With powerful computers and statistical packages, modelers can now run an enormous number of tests effortlessly. But should they? This article discusses how bank risk modelers should approach statistical testing when faced with tiny data sets.

In the stress testing endeavor, most notably in PPNR modeling, bank risk modelers often try to do a lot with a very small quantity of data. It is not uncommon for stress testing teams to forecast portfolio origination volume, for instance, with as few as 40 quarterly observations. Data resources this thin inevitably have a profound impact on the modeling approaches that can sensibly be employed.

The econometrics discipline, whose history extends back only to the 1930s, was developed in concert with embryonic efforts at economic data collection. Protocols for dealing with very small data sets, established by the pioneers of econometrics, can easily be accessed by modern modelers. In the era of big data, in which models using billions of observations are fairly common, one wonders whether some of these econometric founding principles have been forgotten.

The overuse and misuse of statistical tests

The issue at hand is the overuse and misuse of statistical tests in constructing stress testing models. While it is tempting to believe that it is always better to run more and more tests, statistical theory and practice consistently warn of the dangers of such an attitude. In general, given a paucity of resources, the key for modelers is to remain “humble” and retain realistic expectations of the number and quality of insights that can be gleaned from the data. This process involves forming strong, sound, and well-thought-out prior expectations, applying intuition, and using the data sparingly and efficiently to help guide the analysis. It also involves taking action behind the scenes to source more data.

An article by Helen Walker, published in 1940, defines degrees of freedom as “the number of observations minus the number of necessary relations among these observations.” Alternatively, we can say that the concept measures the number of observations minus the number of pieces of information on which our understanding of the data has been conditioned. Estimating a sample standard deviation, for example, will have (n-1) degrees of freedom because the calculation is conditioned on an estimate of the population mean. If my model relies on the estimation of k separate parameters, I will have (n-k) degrees of freedom available in constructing it.
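The degrees-of-freedom accounting can be sketched numerically. In this minimal Python example (the data are simulated and purely illustrative), the divisor (n-1) in the sample variance reflects the one degree of freedom spent estimating the mean, and a regression with k coefficients leaves (n-k) degrees of freedom:

```python
import numpy as np

# Hypothetical sample of 40 quarterly observations
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=2.0, size=40)

n = len(x)
mean = x.mean()

# Sample variance divides by (n - 1): one degree of freedom is
# "spent" estimating the mean on which the calculation is conditioned.
s2_unbiased = ((x - mean) ** 2).sum() / (n - 1)

# NumPy makes the same adjustment explicit via the ddof
# ("delta degrees of freedom") argument.
assert np.isclose(s2_unbiased, x.var(ddof=1))

# A regression estimating k coefficients leaves (n - k) degrees of
# freedom available for inference about the residuals.
k = 3
residual_dof = n - k  # 37
```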

Now suppose that I run a string of 1,000 tests and I am interested in the properties of the 1,001st test. Because, technically, the 1,001st test is conditional on these 1,000 previously implemented tests, I have only (n-1,000) degrees of freedom available for the next test. If, in building my stress test model, n=40, I have a distinct logical problem in implementing the test. Technically, I cannot conduct it.

Most applied econometricians, however, take a slightly less puritanical view of their craft. It is common for statisticians to run a few key tests without worrying too much about the consequences of constructing a sequence of tests. That said, good econometricians tip their hat to the theory and try to show restraint in conducting an egregious number of tests.

The power and size of tests are also a critical concern

When setting out to conduct diagnostic tests, even very well-built statistical tests yield errors. Some of these error rates can usually be well controlled (typically the probability of a false positive result, known as the “size” of the test), so long as the assumptions on which the test is built are maintained. Some error rates (the rate of false negatives) are typically not controlled but depend critically on the amount of data brought to bear on the question at hand. The probability of a correct positive test (one minus the rate of false negatives) is known as the “power” of the test. Statisticians try to control the size while maximizing the power. Power is, unsurprisingly, typically low in very small samples.
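The dependence of power on sample size is easy to demonstrate by simulation. The sketch below is a hypothetical setup, not a prescribed methodology: it estimates the power of the Shapiro-Wilk normality test against a heavy-tailed t(4) alternative at the 5% nominal level, comparing the stress tester's 40 observations with a much larger sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def estimated_power(n, reps=2000, alpha=0.05):
    """Monte Carlo estimate of the rejection rate when the data
    are genuinely non-normal (Student's t with 4 degrees of freedom)."""
    rejections = 0
    for _ in range(reps):
        sample = rng.standard_t(df=4, size=n)  # truly non-normal data
        if stats.shapiro(sample).pvalue < alpha:
            rejections += 1
    return rejections / reps

power_small = estimated_power(40)   # the stress tester's sample size
power_large = estimated_power(500)
# With only 40 observations the test misses the non-normality much of
# the time; with 500 observations it detects it far more reliably.
```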

If I choose to run a statistical test, am I required to act on what the test finds? Does this remain true if I know that the test has poor size and power properties?

Suppose I estimate a model with 40 observations and then run a diagnostic test for, say, normality. The test was developed using asymptotic principles (basically, an infinitely large data set), so with such a short series the test’s size is unlikely to be well approximated by its stated nominal significance level (usually set to 5%). Suppose the test indicates non-normality. Was this result caused by the size distortion (the probability of erroneously finding non-normality), or does the test truly indicate that the residuals of the model follow some other (unspecified) distribution?

If I had a large amount of data, I would be able to answer this question accurately and the result of the test would be reliable and useful. With 40 observations, the most prudent response would be to doubt the result of the test, regardless of what it actually indicates.
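The size distortion itself can be checked by simulation. This rough sketch (illustrative only) generates truly normal samples of 40 observations and records how often the asymptotic Jarque-Bera test rejects at the 5% nominal level; any gap between the empirical rejection rate and 5% is the distortion in question:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 40, 5000, 0.05

# Count rejections on data that are normal by construction, so every
# rejection is a false positive.
rejections = sum(
    stats.jarque_bera(rng.normal(size=n)).pvalue < alpha
    for _ in range(reps)
)
empirical_size = rejections / reps
# In samples this small, empirical_size need not be close to the
# nominal 5%, so a "significant" result may reflect the distortion
# rather than genuine non-normality.
```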

Finding non-normality

Suppose instead that you are confident that the test has sound properties. You have found non-normality: Now what? The modeling literature usually offers no suggestions about which actions you should take to resolve the situation. Most estimators retain sound asymptotic properties under non-normality. In small samples, a finding of non-normality typically acts only as a beacon, warning the modeler to guard against problems in calculating other statistics. Even if the test is sound, it is difficult to ascertain exactly how our research is furthered by knowledge of the result. In this case, given the tiny sample, it is unlikely that the test actually is sound.

If a diagnostic test has dubious small sample properties, and if the outcome will have no influence over our subsequent decision-making, in our view, the test simply should not be applied. Only construct a test if the result will actually affect the subsequent analysis.

Dealing with strong prior views

The next question concerns the use and interpretation of tests when strong prior views exist regarding the likely underlying reality. This type of concept may relate to a particular statistical feature of the data – like issues of stationarity – or to the inclusion of a given set of economic variables in the specification of the regression equation. In these cases, even though we have little data, and even though our tests may have poor size and power properties, we really have no choice but to run some tests in order to convince the model user that our specification is a reasonable one.

Ideally, the tests performed will merely confirm our prior views, which rest on a previously established intuitive understanding of the problem.

If the result is confounding, however, given that we have only 40 observations, the tests are unlikely to shake our previously stated prior views. If, for example, our behavioral model states that term deposit volume really must be driven by the observed term spread, and if this variable yields a p-value of 9%, should we drop the variable from our regression? The evidence on which this result is based is very weak. In cases where the prior view is well thought out and appropriate, like this example, we would typically not need to shift ground until considerably more confounding evidence were to surface.
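A small simulation illustrates how weak such evidence is. In this hypothetical setup (the effect size of 0.3 and the simple bivariate regression are assumptions for illustration), a regressor with a genuine, moderate effect is tested repeatedly in samples of 40 observations; the t-test frequently fails to clear the conventional 5% bar even though the variable truly belongs in the model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, reps, beta = 40, 2000, 0.3

insignificant = 0
for _ in range(reps):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)  # x genuinely drives y
    if stats.linregress(x, y).pvalue >= 0.05:
        insignificant += 1

# How often the true driver looks "insignificant" at the 5% level.
# Dropping a well-motivated variable on a p-value of 9% would be
# acting on exactly this kind of weak evidence.
miss_rate = insignificant / reps
```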

If, instead, the prior suggested a “toss-up” between a range of hypotheses, the test result would be our guiding light. We would not bet the house on the outcome, but the test result would be better than nothing. Toss-ups, however, are very rare in situations where the behavioral model structure has been carefully thought out before any data has been interrogated.

Running tests with limited data

With the advent of fast computers and powerful statistical packages, modelers now have the ability to run a huge number of tests effortlessly. Early econometricians, like the aforementioned Ms. Walker, would look on in envy at the ease with which quite elaborate testing schemes can now be performed.

Just because tests can be implemented does not mean that they necessarily should be. Modern modelers, faced with tiny data sets, should follow the lead of the ancients (many of whom are still alive) and limit themselves to running only a few carefully chosen tests on very deliberately specified models.

Regulators, likewise, should not expect model development teams to blindly run every diagnostic test that has ever been conceived.
