With powerful computers and statistical packages, modelers can now run an enormous number of tests effortlessly. But should they? This article discusses how bank risk modelers should approach statistical testing when faced with tiny data sets.
In the stress testing endeavor, most notably in PPNR modeling, bank risk modelers often try to do a lot with a very small quantity of data. It is not uncommon for stress testing teams to forecast portfolio origination volume, for instance, with as few as 40 quarterly observations. Because data resources are so thin, this must have a profound impact on the data modeling approaches.
The econometrics discipline, whose history extends back only to the 1930s, was developed in concert with embryonic efforts at economic data collection. Protocols for dealing with very small data sets, established by the pioneers of econometrics, can easily be accessed by modern modelers. In the era of big data, in which models using billions of observations are fairly common, one wonders whether some of these econometric founding principles have been forgotten.
The issue at hand is the overuse and misuse of statistical tests in constructing stress testing models. While it is tempting to believe that it is always better to run more and more tests, statistical theory and practice consistently warn of the dangers of such an attitude. In general, given a paucity of resources, the key for modelers is to remain “humble” and retain realistic expectations of the number and quality of insights that can be gleaned from the data. This process also involves using strong, sound, and well-thought-out prior expectations, as well as intuition while using the data sparingly and efficiently to help guide the analysis. It also involves taking action behind the scenes to source more data.
An article by Helen Walker, published in 1940, defines degrees of freedom as “the number of observations minus the number of necessary relations among these observations.” Alternatively, we can say that the concept measures the number of observations minus the number of pieces of information on which our understanding of the data has been conditioned. Estimating a sample standard deviation, for example, will have (n-1) degrees of freedom because the calculation is conditioned on an estimate of the population mean. If the calculation relies on the estimation of k separate entities, I will have (n-k) degrees of freedom available in constructing my model.
Now suppose that I run a string of 1,000 tests and I am interested in the properties of the 1,001st test. Because, technically, the 1,001st test is conditional on these 1,000 previously implemented tests, I have only (n-1,000) degrees of freedom available for the next test. If, in building my stress test model, n=40, I have a distinct logical problem in implementing the test. Technically, I cannot conduct it.
Most applied econometricians, however, take a slightly less puritanical view of their craft. It is common for statisticians to run a few key tests without worrying too much about the consequences of constructing a sequence of tests. That said, good econometricians tip their hat to the theory and try to show restraint in conducting an egregious number of tests.
When setting out to conduct diagnostic tests,even very well-built statistical tests yield errors. Some of these error rates can usually be well controlled (typically the probability of a false positive result, known as the “size” of the test), so long as the assumptions on which the test is built are maintained. Some error rates (the rate of false negatives) are typically not controlled but depend critically on the amount of data brought to bear on the question at hand. The probability of a correct positive test (one minus the rate of false negatives) is known as the “power” of the test. Statisticians try to control the size while maximizing the power. Power is, unsurprisingly, typically low in very small samples.
If I choose to run a statistical test, am I required to act on what the test finds? Does this remain true if I know that the test has poor size and power properties?
Suppose I estimate a model with 40 observations and then run a diagnostic test for, say, normality. The test was developed using asymptotic principles (basically an infinitely large data set) and because I have such a small series, this means that the test’s size is unlikely to be well approximated by its stated nominal significance level (which is usually set to 5%).Suppose the test indicates non-normality. Was this result caused by the size distortion (the probability of erroneously finding non-normality), or does the test truly indicate that the residuals of the model follow some other (unspecified) distribution?
If I had a large amount of data, I would be able to answer this question accurately and the result of the test would be reliable and useful. With 40 observations, the most prudent response would be to doubt the result of the test, regardless of what it actually indicates.
Suppose instead that you are confident that the test has sound properties. You have found non-normality: Now what? In modeling literature, there are usually no suggestions about which actions you should take to resolve the situation. Most estimators retain sound asymptotic properties under non-normality. In small samples, a finding of non-normality typically acts only as a beacon – warning estimators to guard against problems in calculating other statistics. Even if the test is sound, it is difficult to ascertain exactly how our research is furthered by knowledge of the result. In this case, given the tiny sample, it is unlikely that the test actually is sound.
If a diagnostic test has dubious small sample properties, and if the outcome will have no influence over our subsequent decision-making, in our view, the test simply should not be applied. Only construct a test if the result will actually affect the subsequent analysis.
The next question concerns the use and interpretation of tests when strong prior views exist regarding the likely underlying reality. This type of concept may relate to a particular statistical feature of the data – like issues of stationarity – or to the inclusion of a given set of economic variables in the specification of the regression equation. In these cases, even though we have little data, and even though our tests may have poor size and power properties, we really have no choice but to run some tests in order to convince the model user that our specification is a reasonable one.
Ideally, the tests performed will merely confirm the veracity of our prior views based on our previously established intuitive understanding of the problem.
If the result is confounding, however, given that we have only 40 observations, the tests are unlikely to shake our previously stated prior views. If, for example, our behavioral model states that term deposit volume really must be driven by the observed term spread, and if this variable yields a p-value of 9%, should we drop the variable from our regression? The evidence on which this result is based is very weak. In cases where the prior view is well thought out and appropriate, like this example, we would typically not need to shift ground until considerably more confounding evidence were to surface.
If, instead, the prior suggested a “toss-up” between a range of hypotheses, the test result would be our guiding light. We would not bet the house on the outcome, but the test result would be better than nothing. Toss-ups, however, are very rare in situations where the behavioral model structure has been carefully thought out before any data has been interrogated.
With the advent of fast computers and powerful statistical packages, modelers now have the ability to run a huge number of tests effortlessly. Early econometricians, like the aforementioned Ms. Walker, would look on in envy at the ease with which quite elaborate testing schemes can now be performed.
Just because tests can be implemented does not mean that they necessarily should be. Modern modelers, faced with tiny data sets, should follow the lead of the ancients (many of whom are still alive) and limit themselves to running only a few carefully chosen tests on very deliberately specified models.
Regulators, likewise, should not expect model development teams to blindly run every diagnostic test that has ever been conceived.
Managing Director, Economic Research
Tony oversees the Moody’s Analytics credit analysis consulting projects for global lending institutions. An expert applied econometrician, he has helped develop approaches to stress testing and loss forecasting in retail, C&I, and CRE portfolios and recently introduced a methodology for stress testing a bank’s deposit book.
Looks at the best practices of today that will form the successful risk management practices of the future.
Previous ArticleGlobal Banking Regulatory Radar
Tony Hughes and Michael Vogan share valuable insights for managing your auto lending business more effectively.
Call Report Forecast Webinar
In this article, we explore the importance of small data in risk modeling and other applications and explain how the analysis of small data can help make big data analytics more useful.
December 2017 WebPage Dr. Tony Hughes
Increases in auto lease volumes are nothing new, yet the industry is rife with fear that used car prices are about to collapse. In this talk, we will explore the dynamics behind the trends and the speculation. The abundance of vehicles in the US that are older than 10 years will soon need to be replaced, and together with continuing demand from ex-lessees, this demand will ensure that prices remain supported under baseline macroeconomic conditions.
Increases in auto lease volumes are nothing new, yet the industry is rife with fear that used car prices are about to collapse. In this talk, we will explore the dynamics behind the trends and the speculation.
To effectively manage risk in your auto portfolios, you need to account for future economic conditions. Relying on models that do not fully account for cyclical economic factors and include subjective overlay, may produce inaccurate, inconsistent or biased estimates of residual values.
December 2016 WebPage Dr. Tony Hughes