    Small Samples and the Overuse of Hypothesis Tests

    With powerful computers and statistical packages, modelers can now run an enormous number of tests effortlessly. But should they? This article discusses how bank risk modelers should approach statistical testing when faced with tiny data sets.

    In the stress testing endeavor, most notably in PPNR modeling, bank risk modelers often try to do a lot with a very small quantity of data. It is not uncommon for stress testing teams to forecast portfolio origination volume, for instance, with as few as 40 quarterly observations. Data resources this thin should profoundly shape the modeling approach.

    The econometrics discipline, whose history extends back only to the 1930s, was developed in concert with embryonic efforts at economic data collection. Protocols for dealing with very small data sets, established by the pioneers of econometrics, can easily be accessed by modern modelers. In the era of big data, in which models using billions of observations are fairly common, one wonders whether some of these econometric founding principles have been forgotten.

    The overuse and misuse of statistical tests

    The issue at hand is the overuse and misuse of statistical tests in constructing stress testing models. While it is tempting to believe that it is always better to run more and more tests, statistical theory and practice consistently warn of the dangers of such an attitude. In general, given a paucity of resources, the key for modelers is to remain “humble” and retain realistic expectations of the number and quality of insights that can be gleaned from the data. This involves forming strong, sound, and well-thought-out prior expectations, applying intuition, and using the data sparingly and efficiently to guide the analysis. It also involves taking action behind the scenes to source more data.

    An article by Helen Walker, published in 1940, defines degrees of freedom as “the number of observations minus the number of necessary relations among these observations.” Alternatively, we can say that the concept measures the number of observations minus the number of pieces of information on which our understanding of the data has been conditioned. Estimating a sample standard deviation, for example, involves (n-1) degrees of freedom because the calculation is conditioned on an estimate of the population mean. More generally, if the calculation relies on the estimation of k separate quantities, I will have (n-k) degrees of freedom available in constructing my model.
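    The arithmetic can be made concrete with a short sketch in Python (NumPy assumed available; the data are hypothetical):

```python
import numpy as np

# Hypothetical series of 8 quarterly observations
x = np.array([10.2, 11.5, 9.8, 12.1, 10.7, 11.0, 9.5, 12.4])
n = len(x)

# Estimating the mean consumes one degree of freedom, so the
# sample standard deviation divides by (n - 1) rather than n.
mean = x.sum() / n
ss = ((x - mean) ** 2).sum()
std_naive = np.sqrt(ss / n)           # ignores the conditioning on the mean
std_unbiased = np.sqrt(ss / (n - 1))  # the usual (n - 1) estimator

# NumPy's ddof argument ("delta degrees of freedom") captures the same idea:
# ddof = k subtracts the k estimated quantities from the divisor.
assert np.isclose(std_unbiased, x.std(ddof=1))
```

    With only eight observations the two divisors differ materially; with thousands of observations the distinction all but vanishes, which is one reason the issue receives so little attention in big-data settings.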

    Now suppose that I run a string of 1,000 tests and I am interested in the properties of the 1,001st test. Because, technically, the 1,001st test is conditional on these 1,000 previously implemented tests, I have only (n-1,000) degrees of freedom available for the next test. If, in building my stress test model, n=40, I have a distinct logical problem in implementing the test. Technically, I cannot conduct it.

    Most applied econometricians, however, take a slightly less puritanical view of their craft. It is common for statisticians to run a few key tests without worrying too much about the consequences of constructing a sequence of tests. That said, good econometricians tip their hat to the theory and try to show restraint in conducting an egregious number of tests.
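    A complementary way to quantify the danger of running test after test, separate from the degrees-of-freedom argument, is the family-wise false-positive rate: if each test is run at a 5% significance level, the chance of at least one spurious finding grows rapidly with the number of tests. A quick illustration, assuming for simplicity that the tests are independent:

```python
# Probability of at least one false positive across m independent tests,
# each conducted at significance level alpha.
alpha = 0.05
for m in (1, 5, 20, 100):
    family_wise = 1 - (1 - alpha) ** m
    print(f"{m:>3} tests -> P(at least one false positive) = {family_wise:.2f}")
# ->   1 tests: 0.05,   5 tests: 0.23,  20 tests: 0.64, 100 tests: 0.99
```

    Even 20 tests make a spurious “finding” more likely than not. Real diagnostic tests on the same data set are not independent, so the formula is only indicative, but the case for restraint is clear.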

    The power and size of tests is also a critical concern

    Even very well-built diagnostic tests yield errors. Some of these error rates can usually be well controlled (typically the probability of a false positive result, known as the “size” of the test), so long as the assumptions on which the test is built are maintained. Some error rates (the rate of false negatives) are typically not controlled but depend critically on the amount of data brought to bear on the question at hand. The probability of a correct positive test (one minus the rate of false negatives) is known as the “power” of the test. Statisticians try to control the size while maximizing the power. Power is, unsurprisingly, typically low in very small samples.
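    Both properties can be measured by simulation. The sketch below (SciPy assumed available; the sample size and effect size are illustrative) estimates the size and power of a one-sample t-test with n = 40:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, reps = 40, 0.05, 20_000

def rejection_rate(true_mean):
    """Share of simulated samples in which the t-test rejects H0: mean = 0."""
    rejections = 0
    for _ in range(reps):
        sample = rng.normal(loc=true_mean, scale=1.0, size=n)
        _, p = stats.ttest_1samp(sample, popmean=0.0)
        rejections += p < alpha
    return rejections / reps

size = rejection_rate(0.0)    # false-positive rate; close to the nominal 5%
power = rejection_rate(0.25)  # correct-positive rate for a modest effect
```

    Here the size is well controlled (the t-test is exact under its assumptions), but the power against a quarter-standard-deviation effect is only about a third: with 40 observations, real effects of modest size will usually go undetected.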

    If I choose to run a statistical test, am I required to act on what the test finds? Does this remain true if I know that the test has poor size and power properties?

    Suppose I estimate a model with 40 observations and then run a diagnostic test for, say, normality. The test was developed using asymptotic principles (essentially assuming an infinitely large data set). Because my series is so short, the test’s actual size is unlikely to be well approximated by its stated nominal significance level (usually set to 5%). Suppose the test indicates non-normality. Was this result caused by the size distortion (the probability of erroneously finding non-normality), or does the test truly indicate that the residuals of the model follow some other (unspecified) distribution?

    If I had a large amount of data, I would be able to answer this question accurately and the result of the test would be reliable and useful. With 40 observations, the most prudent response would be to doubt the result of the test, regardless of what it actually indicates.
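    The size distortion itself can be demonstrated by simulation. The sketch below (SciPy assumed available) feeds truly normal samples of length 40 to the asymptotic Jarque-Bera normality test and records how often it rejects:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, reps = 40, 0.05, 20_000

# Under H0 the data really are normal, so every rejection is a false positive.
false_positives = 0
for _ in range(reps):
    sample = rng.normal(size=n)
    _, p = stats.jarque_bera(sample)
    false_positives += p < alpha

empirical_size = false_positives / reps  # realized size at n = 40
```

    Because the chi-squared critical values used by the test are an asymptotic approximation, the realized size in samples this small tends to deviate from the nominal 5%, which is precisely the distortion discussed above.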

    Finding non-normality

    Suppose instead that you are confident that the test has sound properties. You have found non-normality: Now what? In the modeling literature, there are usually no suggestions about which actions you should take to resolve the situation. Most estimators retain sound asymptotic properties under non-normality. In small samples, a finding of non-normality typically acts only as a beacon, warning modelers to guard against problems in calculating other statistics. Even if the test is sound, it is difficult to ascertain exactly how our research is furthered by knowledge of the result. In this case, given the tiny sample, it is unlikely that the test actually is sound.

    If a diagnostic test has dubious small sample properties, and if the outcome will have no influence over our subsequent decision-making, in our view, the test simply should not be applied. Only construct a test if the result will actually affect the subsequent analysis.

    Dealing with strong prior views

    The next question concerns the use and interpretation of tests when strong prior views exist regarding the likely underlying reality. Such priors may relate to a particular statistical feature of the data, such as stationarity, or to the inclusion of a given set of economic variables in the specification of the regression equation. In these cases, even though we have little data, and even though our tests may have poor size and power properties, we really have no choice but to run some tests in order to convince the model user that our specification is a reasonable one.

    Ideally, the tests performed will merely confirm the veracity of our prior views based on our previously established intuitive understanding of the problem.

    If the result is confounding, however, given that we have only 40 observations, the tests are unlikely to shake our prior views. If, for example, our behavioral model states that term deposit volume really must be driven by the observed term spread, and if this variable yields a p-value of 9%, should we drop the variable from our regression? The evidence on which this result is based is very weak. In cases where the prior view is well thought out and appropriate, like this example, we would typically not shift ground until considerably more confounding evidence surfaced.

    If, instead, the prior suggested a “toss-up” between a range of hypotheses, the test result would be our guiding light. We would not bet the house on the outcome, but the test result would be better than nothing. Toss-ups, however, are very rare in situations where the behavioral model structure has been carefully thought out before any data has been interrogated.

    Running tests with limited data

    With the advent of fast computers and powerful statistical packages, modelers now have the ability to run a huge number of tests effortlessly. Early econometricians, like the aforementioned Ms. Walker, would look on in envy at the ease with which quite elaborate testing schemes can now be performed.

    Just because tests can be implemented does not mean that they necessarily should be. Modern modelers, faced with tiny data sets, should follow the lead of the ancients (many of whom are still alive) and limit themselves to running only a few carefully chosen tests on very deliberately specified models.

    Regulators, likewise, should not expect model development teams to blindly run every diagnostic test that has ever been conceived.
