Big data isn’t just for Silicon Valley. This article discusses the trend of large data set capture and analysis by regulators, referred to here as “regulatory big data,” by detailing the motivations and goals of regulators and examining three significant regulatory big data initiatives: AnaCredit in the European Union (EU), FDSF in the UK, and FR Y-14M in the United States. It then analyzes how these efforts complement other significant, and concurrent, regulatory reporting and IT efforts.
Financial institutions worldwide are facing an increased level of regulatory scrutiny in the aftermath of the Great Recession. Regulatory efforts entail new and expanded reports, templates, and calculations from financial institutions that are often little more than detailed presentations of information summarized in existing regulatory disclosures such as call reports and loan loss summaries. Many institutions view them as just another mile on the seemingly endless “produce another report” treadmill – one that significantly increases their compliance costs.
However, a new paradigm – “regulatory big data” – is emerging worldwide in which an institution provides its bulk underlying granular data to regulators for their regulatory, risk assessment, and stress testing efforts.
In the technology world, big data generally refers to an amount of data (both structured and unstructured) so vast that analysis of it requires new processing techniques. Examples of big data include a major search engine’s index of all the web pages it has crawled, the monthly cash register transaction receipts at a national supermarket chain, and a hospital’s analysis of patient treatment plans to determine ways to lower readmission rates.
This data generally lacks the well-defined structure that would allow it to fit neatly into standard relational database tables. Additionally, the often multi-terabyte to petabyte size of this data is quite challenging for a typical relational database management system. As a result, technology companies have developed a wide variety of new software frameworks – MapReduce, Apache Hadoop, and NoSQL, to name a few – to analyze and process these big data repositories, often in parallel with traditional database tools.
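The core idea behind frameworks like MapReduce can be illustrated in a few lines. The sketch below is a toy, single-machine version: a map phase emits key-value pairs and a reduce phase combines values by key; production frameworks run both phases in parallel across many machines.

```python
# Toy illustration of the MapReduce pattern (single-machine, for clarity).
from collections import defaultdict

def map_phase(records):
    # Map each input record to (key, value) pairs.
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Group values by key, then combine each group.
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)

counts = reduce_phase(map_phase(["big data", "big tools"]))
# counts is {"big": 2, "data": 1, "tools": 1}
```

Because the map and reduce phases only communicate through key-value pairs, the work can be partitioned across machines, which is what makes the pattern suitable for multi-terabyte datasets.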
Regulatory big data is a catch-all term (like big data) to describe the capture and processing of larger sets of regulatory data via the use of both traditional and newly developed data processing tools. Although this data isn’t of the same magnitude as, for example, a record of clicks on a popular website, its volume is much greater than the summary regulatory reports submitted today. Big data tools and techniques can help regulators process the large amounts of data produced by financial institutions for oversight and compliance purposes.
Regulators aim to use big data sets to complement traditional banking supervision, help maintain financial stability, and support monetary policy procedures. Proposals and rules from regulatory bodies such as the Basel Committee on Banking Supervision (BCBS), the Bank of England, and the Federal Reserve (Fed), as well as speeches and articles on the topic by regulators, have elucidated a variety of goals for regulatory big data, among them:1
- Producing a rich regulatory dataset with finer granularity than current reports provide
- Eliminating real or perceived data gaps in existing reporting by requiring more frequent and less aggregated data submissions
- Reaping the benefits of advanced big data technology without the need to use proprietary technology from big data firms
- Aggregating an obligor’s total indebtedness across multiple institutions
- Establishing a tools and data submissions framework to provide a comprehensive overview of banks and their obligors at the push of a button
To start the process of working toward these goals, various regulatory big data initiatives have been advocating, if not mandating, that firms provide a much finer level of data granularity than ever before, oftentimes down to individual loans. They are also pushing for consistent identification and organization of counterparties through initiatives like the Legal Entity Identifier (LEI) effort and more frequent (e.g., monthly instead of quarterly or annual) data submissions.
This new data would provide regulators with the ability to:
- Build a foundation for more comprehensive and frequent institutional stress testing
- Measure institutional and obligor interconnectedness and contagion risk, by looking at obligors common to multiple institutions
- Conduct micro- and macro-prudential risk analysis, enhanced by the ability to assess the risk of obligors across firms and the movement of risk factors over time
In summary, the increased quantity, and the likely more frequent submission, of data for regulatory big data initiatives presents an opportunity for regulators to enhance their understanding of both the institutions they regulate and the credit exposures of individual obligors across institutions.
In the following sections, we provide information on three current regulatory big data initiatives: Analytic Credit Dataset (AnaCredit) in the European Union (EU); Firm Data Submission Framework (FDSF) in the UK; and FR Y-14M Capital Assessments and Stress Testing (FR Y-14M) in the US. We discuss the features of these initiatives, as well as the regulatory motivation and, critically, the effect on firms subject to these rules.
AnaCredit is an EU-proposed Central Credit Register (CCR) of, initially, loans to non-financial corporations. AnaCredit’s aim is to build up and link existing national CCRs to a Europe-wide credit data repository accessible by the European Central Bank (ECB) and other European nations’ central banks. Essentially, AnaCredit mandates the collection of granular data from financial institutions.
Only countries subject to the ECB (i.e., countries whose currency is the euro) are considering this initiative, so non-eurozone countries such as the UK are not taking part. AnaCredit’s rollout will initially apply to an estimated 3,500 banks in the EU. Other lenders, including non-EU institutions operating in the EU, do not fall under the initial scope.
Implementation timelines have shifted since AnaCredit was formally introduced through the ECB decision (ECB/2014/6) on February 24, 2014;2 the latest estimate is for a phased implementation starting in January 2018.
The objectives of ECB/2014/6 are to define:
…preparatory measures necessary to establish in a stepwise manner a long-term framework for the collection of granular credit data based on harmonized ECB statistical reporting requirements. This long-term framework shall include by the end of 2016: (a) national granular credit databases operated by all Eurosystem NCBs, and (b) a common granular credit database shared between the Eurosystem members and comprising granular credit data for all Member States whose currency is the euro.
The ECB intends to leverage existing national CCRs as a foundation for AnaCredit, but a large amount of work still needs to be done. For example, the ECB must roll out specific AnaCredit regulations, which are not expected until the second quarter of 2015 at the earliest.
Though not the final AnaCredit rules, the preparatory regulations in ECB/2014/6 shed some light on both the work needed and the likely capabilities of this system. The preparatory work mandated by ECB/2014/6 includes:
- Identifying relevant end-user needs
- Defining the granular credit data sets that will be collected and linked
- Developing a way to transmit granular credit data securely
- Developing detailed operational arrangements, given the sensitivity of the data
- Establishing a timetable for specific steps and deliverables and for monitoring progress
- Addressing confidentiality, use of data, and governance
Looking at existing European credit registers and the overall goals of these regulatory big data initiatives, we think it likely that, ultimately, the AnaCredit system will include:
- Data on obligors, including unique identifiers – in particular, the identifiers being developed as part of LEI – enabling the linking of obligors across institutions
- Amount of assets, financial derivatives, and certain off-balance sheet items
- Loan IDs, inception and maturity dates, interest rates, and any financial guarantees tied to loans
- Analytic measurements such as loan performance data, borrower probability of default (PD), and exposure Loss Given Default (LGD) estimates
From our analysis of similar initiatives and the preparatory work involved, we expect the ECB will consider between 24 and 150 – or more – attributes per loan for inclusion in this initiative, although the exact number and composition have not yet been determined. There are several other unknowns, including:
- The lower reporting threshold – specifically, the euro level of individual loans that do not need to be reported
- The schedule of assets that need to be reported
- The reporting schedule for institutions, particularly foreign institutions operating in the EU and non-bank financial companies
The resulting analytic dataset – although the exact composition, rollout schedule, and asset mix are still undecided – will provide the ECB and national central banks with a comprehensive view of loan exposures in an institution and of obligors across institutions. Once assembled, AnaCredit’s large granular dataset will allow regulators a view into the institutions they regulate that is not currently available with summary-level reports.
The Firm Data Submission Framework (FDSF), another example of a regulatory big data project, is a quarterly granular reporting requirement from the Bank of England’s Prudential Regulation Authority (PRA). FDSF applies to the UK’s eight Systemically Important Financial Institutions (SIFIs) and was developed to provide quantitative, forward-looking assessments of the capital adequacy of the UK banking system and the individual institutions in it.
Like the Fed’s Comprehensive Capital Analysis and Review (CCAR), on which it is loosely based, FDSF requires that institutions collect data from all of their significant operating units and use this data, based on PRA guidance and stress scenarios, in a wide variety of stress calculations.
The FDSF requires a level of granular data and analysis far in excess of the typical reports submitted to UK regulators. The intense scrutiny and significance of this exercise (which, in the event of unsatisfactory results, could prompt the regulator to prohibit capital payouts) mean that each institution requires extensive audit trails, detailed documentation of assumptions, full traceability of calculations, and the analysis of many scenarios.
As with other regulatory big data initiatives, the data required for FDSF would be sourced from a variety of internal areas (e.g., individual business units, each likely having multiple products and associated accounting systems and assumptions) in myriad formats. Assembling this data is not as simple as appending rows to an existing table, however; gaps between reporting dates and missing data are likely, as is the need to try out various assumptions on key parameters such as expected losses and probabilities of default. Moreover, the calculated stress results are also likely to vary significantly by product line and geography – a downturn in UK property prices, for example, will likely have a significantly different impact on a Scottish residential mortgage portfolio than on a Greek shipping loan book.
An additional layer of complexity arises from the need to map internal data structures to those defined by the PRA. A bank’s systems may hold several differing definitions of a key attribute such as “exposure,” and the institution has to reconcile all of them with the PRA’s precise definition. An extensive audit trail is also necessary so an institution can drill down into and aggregate the data as needed or requested by the PRA.
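This reconciliation step can be sketched in code. In the hypothetical example below (all system names, field names, and mapping rules are invented for illustration), each source system carries its own notion of “exposure,” a rule table maps each to a single regulator-defined value, and every mapping is recorded for the audit trail:

```python
# Hypothetical sketch: reconcile differing internal definitions of
# "exposure" to one regulator-defined attribute, logging each mapping.
RULES = {
    # Loan systems: exposure = drawn balance plus undrawn commitment.
    "loan_system": lambda r: r["outstanding_balance"] + r["undrawn_commitment"],
    # Trading systems: exposure = current mark-to-market.
    "trading_system": lambda r: r["mark_to_market"],
}

def regulatory_exposure(source, record, audit_log):
    value = RULES[source](record)
    audit_log.append({          # audit trail entry for drill-down later
        "source": source,
        "record_id": record["id"],
        "rule": source,
        "mapped_value": value,
    })
    return value

audit = []
exposure = regulatory_exposure(
    "loan_system",
    {"id": "L1", "outstanding_balance": 900.0, "undrawn_commitment": 100.0},
    audit,
)
# exposure is 1000.0, and audit holds one traceable mapping record
```

The point of the audit list is exactly the traceability requirement described above: for any submitted number, the institution can show which source record and which mapping rule produced it.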
As with AnaCredit, the FDSF exhibits the defining characteristics of regulatory big data:
- An extensive amount of data
- Data that typically needs extensive mapping, aggregating, and cleaning prior to submission for use in stress testing calculations
The use of the data is quite complicated, as multiple, iterative stress tests are typically required of the banks subject to this regulation. In practice, however, the ability of big data tools to handle large datasets of partially structured data (i.e., not fully contained in clean relational database tables) facilitates the FDSF initiative. Banks can use flat files of raw data from differing systems, scripting languages, and statistical packages to manage this large load of data and complicated analytic requirements, while maintaining a strict data quality regime.
The FR Y-14M reporting program is a regulatory big data initiative by the Fed in the United States. Under FR Y-14M, bank holding companies with consolidated assets of $50 billion or more must submit detailed home equity and credit card loan data, along with portfolio and address data, to the Fed on the last business day of each month.
In contrast to the existing, predominantly summary-level and aggregated reporting required of banks and bank holding companies, FR Y-14M is a true regulatory big data program: it requires monthly reporting of all portfolio and serviced loans and lines in several broad portfolio categories. The FR Y-14M initiative, like the AnaCredit and FDSF regulations, aims to furnish regulators with the tools and data necessary to monitor very granular risk in a timely, near-continuous fashion.
Institutions subject to these requirements must contend with several complex technical challenges, given the significant amounts of sensitive data they have to prepare and submit every month. The Fed requires that bank holding companies subject to these rules report all of the following active and serviced lines and loans in their portfolios:
- Revolving, open-end loans secured by one to four family residential properties and extended lines of credit
- Junior-lien closed-end loans secured by one to four family residential properties
- Domestic credit cards
Additionally, banks have to report detailed information on previously reported loans that migrate out of the portfolio (e.g., are paid off or have defaulted), as well as information to facilitate address-matching across loans and portfolio-level information.
An extensive amount of information is required for each loan. For example, domestic first liens require 137 data fields per loan, covering a wide range of data elements such as origination credit score, current credit score, probability of default, mortgage insurer, valuation at origination and at the current time, foreclosure status, and both actual (if any) and expected loss given default.
The Fed uses the data collected under FR Y-14M for a wide range of regulatory purposes, including assessing a bank’s capital adequacy, supporting periodic supervisory stress tests, and even enforcing Dodd-Frank consumer protection measures.
The sheer volume of information banks must provide every month, the detailed and sensitive nature of data collected, and the disparate uses of this data – everything from consumer protection to stress testing – all present new technical challenges for banks and regulators. Because the Fed’s analysis of a bank’s FR Y-14M data could result in regulatory actions with a material impact (e.g., restricting dividend payouts), banks must take extreme care to ensure that this data is accurate, timely, and auditable.
Moody’s Analytics research reveals that to meet these challenges, institutions are building complementary processes outside their traditional relational database management systems for credit data and using new big data analysis and formatting tools. They seek to separate their large-scale data processing efforts from other reporting and analysis of the database, all while maintaining a clear and auditable record of data submissions.
As an illustration of the tools and technology available today, one institution implemented a single, large monthly data extraction script to move masses of raw data (i.e., in a non-submission format and missing several calculations) from its database to a large set of flat files, then relied on an open source statistical package to clean up and append the data to a large statistical data file. The institution then used another set of procedures in the statistical program to extract and format the data for its monthly submissions and to build a detailed log of the preparation of the data submitted. In addition, the regulators themselves are also launching several complementary efforts that are transforming banks’ data handling and reporting techniques.
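The pipeline that institution built can be sketched in simplified form. Everything below is hypothetical (paths, field names, and the cleaning rule are invented): raw records are dumped to a flat file, cleaned and reloaded, formatted for submission, and every preparation step is logged for auditability.

```python
# Simplified sketch of an extract -> flat file -> clean -> submit
# pipeline with a preparation log (all names are hypothetical).
import csv
import json
import os
import tempfile

def run_pipeline(raw_records, workdir):
    log = []

    # Step 1: write the raw extract to a flat file.
    raw_path = os.path.join(workdir, "extract_raw.csv")
    with open(raw_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["loan_id", "balance"])
        writer.writeheader()
        writer.writerows(raw_records)
    log.append(f"wrote {len(raw_records)} raw rows to {raw_path}")

    # Step 2: clean (here: drop rows with a missing balance).
    with open(raw_path, newline="") as f:
        clean = [r for r in csv.DictReader(f) if r["balance"]]
    log.append(f"kept {len(clean)} rows after cleaning")

    # Step 3: format for submission and persist the preparation log.
    submission = [{"loan_id": r["loan_id"], "balance": float(r["balance"])}
                  for r in clean]
    with open(os.path.join(workdir, "submission_log.json"), "w") as f:
        json.dump(log, f)
    return submission, log

with tempfile.TemporaryDirectory() as workdir:
    submission, log = run_pipeline(
        [{"loan_id": "L1", "balance": "100.0"},
         {"loan_id": "L2", "balance": ""}],   # L2 fails the cleaning rule
        workdir,
    )
```

The persisted log file plays the same role as the “detailed log of the preparation of the data submitted” described above: each monthly submission can be traced back through its cleaning and formatting steps.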
- The Global Financial Markets Association (GFMA), among others, is coordinating the LEI initiative, wherein each legal entity is assigned a unique ID. LEIs will facilitate aggregation of an obligor’s exposures across institutions and easy analysis of an entity’s exposures within an organization, as well as the use of external data on an obligor to supplement the data an institution might have.
- The Bank for International Settlements (BIS) has produced “Principles for effective risk data aggregation and risk reporting” (BCBS 239 publication) that comments on, among other areas, the risk infrastructure and risk aggregation methods of larger banks. As the publication shows, regulators recognize how critical risk infrastructure and technology are at large financial institutions.
- The G20’s Data Gaps Initiative (DGI) is a set of 20 recommendations for enhancing economic and financial statistics, covering broad topic areas such as “Monitoring Risk in the Financial Sector” and “Financial Datasets.” Although aimed primarily at regulatory bodies, the DGI will have a profound effect on individual firms, given that it calls for standardized reporting templates for large international exposures and an overall higher level of reporting data granularity from most financial firms.
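The cross-institution aggregation that LEIs enable, discussed in the first bullet above, reduces to a simple group-by once every submission carries the same identifier for the same obligor. The sketch below uses invented banks, identifiers, and amounts:

```python
# Hedged sketch (all data invented): with a shared Legal Entity
# Identifier on every submission, a regulator can total one obligor's
# borrowings across multiple reporting institutions.
from collections import defaultdict

submissions = [
    {"bank": "Bank A", "obligor_lei": "529900EXAMPLE0000001", "exposure": 40.0},
    {"bank": "Bank B", "obligor_lei": "529900EXAMPLE0000001", "exposure": 60.0},
    {"bank": "Bank B", "obligor_lei": "529900EXAMPLE0000002", "exposure": 25.0},
]

total_by_obligor = defaultdict(float)
for s in submissions:
    total_by_obligor[s["obligor_lei"]] += s["exposure"]
# The first obligor's total indebtedness across both banks is 100.0
```

Without a shared identifier, the same obligor might appear under different internal customer numbers at each bank, making exactly this aggregation unreliable.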
The common theme of these initiatives is that financial institutions have to produce much larger volumes of data in a more consistent and controlled way. Organizations must have the infrastructure and the skills in place to consistently produce and submit large data sets of critical information to their regulators.
AnaCredit, FDSF, and FR Y-14M are the first of what look to be numerous efforts by regulators to capture more granular data, much more frequently, from the firms they regulate.
This emphasis on raw data places considerable technical and operational burdens on institutions. Regulatory reporting will no longer comprise an Excel file or two emailed every quarter but, rather, an extensive process of assembling highly sensitive data, which regulators can then use for critical tasks like approving a bank’s dividend policy.
Institutions need to take a fresh look at their data handling, reporting, technology, and security architecture to ensure that they meet these significant new challenges. The emergence of big data tools and technologies, many of which are open source, can help institutions achieve compliance.
1 Michael Ritter (Deutsche Bundesbank, Chair of the ESCB Working Group on Credit Registers), “Central Credit Registers (CCRs) as a Multi-Purpose Tool to Close Data Gaps,” May 2014; Anne Le Lorier (Deputy Governor, Banque de France), “Towards the Banking Union: Opportunities and Challenges for Statistics,” Seventh ECB Statistics Conference, October 2014.
2 “Decision of the European Central Bank of 24 February 2014 on the organisation of preparatory measures for the collection of granular credit data by the European System of Central Banks (ECB/2014/6)”