    Data Quality is the Biggest Challenge

    This article looks at the inherent analytical data quality problems in the insurance industry and how to address them. Insurers hold vast stores of both operational and analytical data, but this article focuses primarily on the analytical data required for Solvency II and for risk and capital decision-making.

    With Solvency II firmly back on the radar following the recent pronouncement from the European Insurance and Occupational Pensions Authority (EIOPA), insurers are once again re-energizing their Solvency II programs. However, they must overcome many challenges before they can establish a risk-based culture and derive business benefits from those programs. Perhaps the biggest challenge relates to data, which is at the core of Solvency II.

    Many insurers are good at calculating numbers, but those numbers are only accurate if the data fed into the actuarial engines is correct. Indeed, EIOPA stipulates that the data used for Solvency II purposes must be accurate, complete, and appropriate. It also mandates a data governance framework. Thus, data is critical for regulatory purposes. Insurers also need good data to support their decision-making processes and meet internal compliance requirements.

    Scope of the problem

    Historically, insurers have suffered from poor data quality, primarily because vast amounts of unstructured data are stored in a plethora of systems. Many of these systems are antiquated (so-called legacy systems), and others are desktop based (actuarial software). The problem is compounded by new applications layered on top of these legacy systems, creating multi-layered and potentially redundant IT architectures. There is also a lack of common data models, data structures, and data definitions.

    Spreadsheets are another problem, as they store a significant amount of actuarial and finance data. For example, a large insurer may have hundreds of source systems organized in a number of discrete silos, and thousands of spreadsheets from which data is needed. Often the same data is duplicated in different systems but stored in different formats. Standardizing data from external sources such as asset managers and reinsurers is also a challenge.

    Insurers have tried to rationalize legacy systems and impose common data models and structures across these diverse systems, generally with little success. The interaction between legacy and desktop systems also creates its own problems. The resulting complexity can be overwhelming, as Figure 1 shows for the interactions surrounding actuarial modeling.

    Figure 1. Overly complex interaction surrounding actuarial modeling
    Source: Moody's Analytics

    What types of data do insurers need for Solvency II?

    Solvency II requires granular actuarial, finance, asset, and risk data, collectively categorized as analytical data. Figure 2 illustrates, with some examples, the types of analytical data and where that data may come from. Analytical data primarily comes from systems that in turn draw data from core administration, claims, CRM, and other operational systems, so analytical and operational data are linked. This data chain can be complex, but data quality throughout the chain is essential.

    Perhaps the key issue is that analytical data differs in structure from the operational data that insurers traditionally store in their warehouses. This is particularly evident in the actuarial arena, which is new ground for IT. Traditionally, actuarial modeling has been the domain of desktop modeling systems, supplemented heavily by spreadsheets.

    Having homogeneous data is of little value unless it can be aggregated in a way that promotes its use. Solvency II (and actuaries) require aggregated views of multiple data sets, and the raw data may require sophisticated analytical methods. To ensure that data is properly aggregated, standards must be applied to both data collection and analysis.
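    As an illustration of why common definitions must be applied before aggregation, the Python sketch below maps each source system's line-of-business codes to a shared code list and only then sums amounts by the common key. The system names, local codes, and mapping table are hypothetical; codes without a mapping are returned for review rather than silently dropped.

```python
from collections import defaultdict

# Hypothetical mapping from (source system, local code) to a common line-of-business code.
LOB_MAP = {
    ("admin_A", "TRM"): "TERM_LIFE",
    ("admin_A", "END"): "ENDOWMENT",
    ("admin_B", "T01"): "TERM_LIFE",
    ("admin_B", "E01"): "ENDOWMENT",
}


def aggregate_by_common_lob(records):
    """Sum 'amount' by common line of business; return (totals, unmapped records)."""
    totals = defaultdict(float)
    unmapped = []
    for rec in records:
        common = LOB_MAP.get((rec["system"], rec["lob"]))
        if common is None:
            # Surface unknown codes for review instead of silently dropping them.
            unmapped.append(rec)
            continue
        totals[common] += rec["amount"]
    return dict(totals), unmapped


records = [
    {"system": "admin_A", "lob": "TRM", "amount": 100.0},
    {"system": "admin_B", "lob": "T01", "amount": 50.0},
    {"system": "admin_B", "lob": "E01", "amount": 25.0},
    {"system": "admin_B", "lob": "X99", "amount": 5.0},
]
totals, unmapped = aggregate_by_common_lob(records)
print(totals)         # {'TERM_LIFE': 150.0, 'ENDOWMENT': 25.0}
print(len(unmapped))  # 1
```

    Aggregating on the local codes directly would have kept term business from the two systems in separate buckets; in this sketch the mapping table is where the aggregation standard lives.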

    There are potentially many inputs into an actuarial model: mortality tables, economic scenario generator (ESG) files, assumption sets, run parameters, and policy data (potentially hundreds of inputs in total). There is therefore significant value in storing these inputs in a repository, especially for enterprise-wide access, management, and audit controls. There is equal value in storing the outputs of actuarial models (essentially cash flows) at a sufficiently granular level, something modeling technology typically does not provide, particularly for sensitivity analysis and for populating the technical provisions Quantitative Reporting Templates (QRTs).
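    A minimal sketch of storing model inputs alongside granular outputs, assuming an in-memory store in place of a real database: each run is recorded with a SHA-256 fingerprint of its input set, so a reviewer can confirm which mortality table, ESG file, and assumption set produced a given set of cash flows, and two stored runs can be differenced for a simple sensitivity analysis. The class names, field names, and file names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(inputs: dict) -> str:
    """Deterministic SHA-256 hash of a run's inputs (keys sorted so order does not matter)."""
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class RunRecord:
    run_id: str
    inputs: dict
    input_hash: str
    cash_flows: dict  # granular output, e.g. {(product, year): amount}


@dataclass
class RunRepository:
    runs: dict = field(default_factory=dict)  # run_id -> RunRecord

    def record(self, run_id: str, inputs: dict, cash_flows: dict) -> RunRecord:
        rec = RunRecord(run_id, inputs, fingerprint(inputs), cash_flows)
        self.runs[run_id] = rec
        return rec

    def runs_with_same_inputs(self, inputs: dict) -> list:
        """Audit query: which runs used exactly this input set?"""
        h = fingerprint(inputs)
        return [r.run_id for r in self.runs.values() if r.input_hash == h]

    def sensitivity(self, base_id: str, shocked_id: str) -> dict:
        """Cash-flow differences (shocked minus base) at the stored granularity."""
        base = self.runs[base_id].cash_flows
        shocked = self.runs[shocked_id].cash_flows
        return {k: shocked.get(k, 0.0) - base.get(k, 0.0) for k in base.keys() | shocked.keys()}


repo = RunRepository()
base = {"mortality_table": "table_v3", "esg_file": "esg_scenarios.csv", "assumption_set": "base"}
repo.record("run-001", base, {("term", 2015): 1200.0, ("term", 2016): 1100.0})
repo.record("run-002", dict(base, assumption_set="lapse_up"),
            {("term", 2015): 1150.0, ("term", 2016): 1080.0})
print(repo.runs_with_same_inputs(base))                         # ['run-001']
print(repo.sensitivity("run-001", "run-002")[("term", 2015)])  # -50.0
```

    Keeping cash flows keyed by product and year, rather than only as totals, is what makes the per-period sensitivity comparison possible.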

    Regulatory requirements

    Good data governance and practice should already be in place as part of an insurer’s compliance program; however, recent regulations such as Solvency II, International Financial Reporting Standards (IFRS), and Dodd-Frank place significant focus on data management and quality. Solvency II is a prime example of this increased focus.

    Figure 2. Types of analytical data and related source systems
    Source: Moody's Analytics
    Solvency II

    Pillars II and III of Solvency II introduce extensive data management and quality requirements. These involve not only the creation of new data sets and reports, but also data management standards and controls that must be transparent and fully auditable. Indeed, EIOPA requires a Data Quality Management Framework and Policy to be in place as part of the Internal Model Approval Process (IMAP), which is also relevant to the Own Risk and Solvency Assessment (ORSA).

    The purpose of this requirement is to ensure that all data used for Solvency II purposes is accurate, complete, and appropriate, and to establish standards for data quality. A practical problem is that insurers cannot always define what “accurate,” “complete,” or “appropriate” means in practice. The Prudential Regulation Authority (PRA) in the UK noted that this was a particular issue with catastrophe exposure data, where underwriting teams did not always have an adequate understanding of the quality criteria or of the point at which a data error could be considered material. Table 1 provides EIOPA’s interpretation.

    EIOPA will finalize the data management requirements in the Omnibus 2 Directive, but insurers can commence their data management projects now as sufficient principles have already been established.

    Table 1. EIOPA data quality requirements
    Source: Moody's Analytics

    Data quality improvement processes

    The fact that raw analytical data can come from numerous sources (both internal and external) leads to questions regarding its quality, consistency, and reliability – particularly as the volume of data increases.

    One example is the multiple policy administration systems that insurers maintain, each of which may store a policyholder’s age and birth date in different formats. Reconciling data is also important. For example, premium data may come from both a policy administration system and the general ledger, but the two figures rarely match.
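    Both problems in the paragraph above can be sketched in a few lines of Python: normalizing birth dates held in system-specific formats to a single ISO 8601 representation, and reconciling a policy administration total against the general ledger within a relative tolerance. The system names, their formats, and the 0.5% tolerance are illustrative assumptions, not recommended values.

```python
from datetime import datetime

# Hypothetical: the birth-date format used by each policy administration system.
# Tying the format to the source avoids guessing between day/month orderings.
SYSTEM_DATE_FORMATS = {
    "admin_A": "%Y-%m-%d",  # e.g. 1960-03-14
    "admin_B": "%d/%m/%Y",  # e.g. 14/03/1960
    "legacy_C": "%Y%m%d",   # e.g. 19600314
}


def normalize_birth_date(system: str, raw: str):
    """Return an ISO 8601 date string, or None if the value does not parse (flag for review)."""
    try:
        return datetime.strptime(raw.strip(), SYSTEM_DATE_FORMATS[system]).date().isoformat()
    except ValueError:
        return None


def reconcile(item: str, admin_total: float, ledger_total: float, rel_tol: float = 0.005) -> dict:
    """Compare a policy-admin total with the general ledger within a relative tolerance."""
    difference = admin_total - ledger_total
    scale = max(abs(ledger_total), 1e-9)  # avoid division by zero
    return {
        "item": item,
        "difference": difference,
        "within_tolerance": abs(difference) / scale <= rel_tol,
    }


print(normalize_birth_date("admin_B", "14/03/1960"))  # 1960-03-14
print(normalize_birth_date("legacy_C", "19600314"))   # 1960-03-14
print(reconcile("premium", 1_002_000.0, 1_000_000.0)["within_tolerance"])  # True
print(reconcile("premium", 1_010_000.0, 1_000_000.0)["within_tolerance"])  # False
```

    In this sketch an unparseable date returns None so it can be routed to a person, and an unknown system name raises a KeyError, which is arguably the right outcome for an unregistered data source.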

    Figure 3 illustrates the data chain (or lineage) between source systems and ultimate reporting, and highlights the seven-step data quality process.

    Data quality process

    Improving the quality of data is a multi-faceted process. In essence, raw data is subjected to a range of tools that use algorithms and business rules, coupled with expert judgment, to analyze, validate, and correct the data as appropriate. Effective data quality tools have built-in data “logic” (patterns, trends, and rules accumulated over a number of years) against which data is tested. Simple errors can thus be corrected automatically, while data that requires expert judgment is flagged. The end result may not be perfect data (no process can achieve that), but the data should at least be fit for purpose. Table 2 outlines a typical seven-step process for improving the quality of data.
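    The split described above, where simple errors are corrected automatically and everything else is flagged for expert judgment, might look like the following Python sketch. The two automatic fixes (standardizing product-code case and whitespace, and mapping smoker-status spellings to one code set) and the flagging rules are hypothetical examples; every automatic change is logged so it remains auditable.

```python
# Hypothetical code set for smoker status; variants map to one standard value.
SMOKER_MAP = {"y": "S", "yes": "S", "smoker": "S", "n": "N", "no": "N", "non-smoker": "N"}


def cleanse(record: dict):
    """Apply simple automatic fixes; return (cleaned record, audit log, flags for experts)."""
    cleaned = dict(record)  # never mutate the source record
    log, flags = [], []

    # Auto-fix: standardize product codes (whitespace and case only; meaning unchanged).
    code = cleaned.get("product_code")
    if isinstance(code, str) and code != code.strip().upper():
        cleaned["product_code"] = code.strip().upper()
        log.append(f"product_code: {code!r} -> {cleaned['product_code']!r}")

    # Auto-fix: map known spellings of smoker status to the standard code set.
    status = cleaned.get("smoker")
    if isinstance(status, str) and status.strip().lower() in SMOKER_MAP:
        fixed = SMOKER_MAP[status.strip().lower()]
        if fixed != status:
            cleaned["smoker"] = fixed
            log.append(f"smoker: {status!r} -> {fixed!r}")

    # Not safely fixable by rule: raise flags for expert judgment.
    if not cleaned.get("sum_assured"):
        flags.append("sum_assured missing or zero")
    if cleaned.get("smoker") not in ("S", "N"):
        flags.append(f"unrecognized smoker status {cleaned.get('smoker')!r}")
    return cleaned, log, flags


cleaned, log, flags = cleanse({"product_code": " trm01 ", "smoker": "yes", "sum_assured": 0})
print(cleaned["product_code"], cleaned["smoker"])  # TRM01 S
print(flags)  # ['sum_assured missing or zero']
```

    A missing sum assured is deliberately not "corrected": guessing a value that drives cash flows is exactly the kind of decision the text reserves for expert judgment.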

    Figure 3. Source system and reporting data chain
    Source: Moody's Analytics
    Table 2. Seven-step quality process
    Source: Moody's Analytics

    Making use of data profiling tools

    Using data profiling, insurers can examine the data available within an existing data repository and assess its quality, consistency, uniqueness, and logic. This is one of the most effective techniques for improving data accuracy in an analytical repository. A number of proprietary data profiling tools are available from vendors in the market.

    Data profiling uses various descriptive techniques and statistics, such as minimum, maximum, mean, mode, percentiles, standard deviation, frequency, and variation, as well as other aggregates such as count and sum, to analyze data against known patterns. Using these, an expert can find values that are unexpected and therefore potentially incorrect. Profiling can also help insurers identify missing values, which can then be replaced with more plausible values generated by data augmentation algorithms.
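    A small profiling sketch using only Python's standard `statistics` module: it computes several of the statistics listed above for one numeric field, counts missing values, and lists values outside the interquartile-range (Tukey) fences. The fences are a common rule of thumb chosen here for illustration, not something the article prescribes, and median imputation is a deliberately simple stand-in for the data augmentation algorithms mentioned above.

```python
import statistics


def profile(values):
    """Descriptive profile of one numeric field; None marks a missing value."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # needs Python 3.8+
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey fences, an illustrative choice
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        "stdev": statistics.stdev(present),
        "sum": sum(present),
        "unexpected": [v for v in present if v < low or v > high],
    }


def impute_median(values):
    """Replace missing values with the median of the observed values (a simple stand-in)."""
    med = statistics.median([v for v in values if v is not None])
    return [med if v is None else v for v in values]


ages = [34, 41, 38, None, 45, 39, 212, 40, None, 36]
p = profile(ages)
print(p["missing"], p["unexpected"])  # 2 [212]
print(impute_median(ages)[3])         # 39.5
```

    An age of 212 is the kind of unexpected value the profile surfaces for an expert to investigate; the median is used for imputation here partly because it is not pulled upward by that outlier.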

    Introducing data quality rules

    A key part of the process is data standardization, which fundamentally involves executing a set of data quality rules against the data. Various vendors offer data quality tools that include thousands of rules, comprising generic rules together with some specific to the insurance industry. Such tools also enable insurers to define supplementary rules specific to their own lines of business or functions. Table 3 provides examples of the types of data quality rules.

    While it is the role of the IT department to implement and run data quality rules, it is up to practitioners in the business to provide the “logic,” particularly for rules that are specific to a particular set of data. When discussing this logic with IT, carefully consider the ultimate usage of the data. For example, for policy data, the inputs required for actuarial modeling are primarily the type of contract, benefits/coverage, premiums, term, etc. These inputs must be correct because they affect the accuracy of the cash flows. Other policy-related data, such as postal code and phone number, is not relevant for these purposes and, if incorrect, has no impact on the accuracy of the cash flows.
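    The following Python sketch combines generic rules with a user-defined rule and tags each rule with whether its field feeds the actuarial cash flows, reflecting the point that an incorrect postal code does not affect model accuracy. All rule names, fields, and thresholds are hypothetical; commercial data quality tools express rules in their own formats.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes
    affects_cash_flows: bool       # does the field feed the actuarial model?


def _is_number(x):
    return isinstance(x, (int, float))


# Generic rules (hypothetical examples of the kind shipped with vendor tools).
GENERIC_RULES = [
    Rule("premium_not_negative",
         lambda r: _is_number(r.get("premium")) and r["premium"] >= 0, True),
    Rule("postal_code_present", lambda r: bool(r.get("postal_code")), False),
]

# User-defined rule reflecting a hypothetical business constraint for term products.
TERM_PRODUCT_RULES = [
    Rule("term_1_to_50_years",
         lambda r: _is_number(r.get("term_years")) and 1 <= r["term_years"] <= 50, True),
]


def run_rules(record: dict, rules: list):
    """Split failures into material (affect cash flows) and non-material."""
    failed = [rule for rule in rules if not rule.check(record)]
    material = [rule.name for rule in failed if rule.affects_cash_flows]
    non_material = [rule.name for rule in failed if not rule.affects_cash_flows]
    return material, non_material


policy = {"premium": -120.0, "term_years": 20, "postal_code": ""}
material, non_material = run_rules(policy, GENERIC_RULES + TERM_PRODUCT_RULES)
print(material)      # ['premium_not_negative']
print(non_material)  # ['postal_code_present']
```

    Separating material from non-material failures lets remediation effort go first to the fields that change the cash flows.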

    Table 3. Data quality rules
    Source: Moody's Analytics

    The role of spreadsheets: are they a problem?

    No review of analytical data quality would be complete without considering the role of spreadsheets. Spreadsheets are now commonly considered part of the wider group of technology assets called end user computing (EUC), that is, any technology asset that may be created, updated, or deleted outside the purview of the IT department or standard software development lifecycle management processes. Other assets in this class include MS Access databases, CSV files, MATLAB scripts, etc. However, spreadsheets tend to be the most prolific and problematic of all EUCs because of their flexibility and familiarity to most users, particularly actuaries.

    The ability of a spreadsheet to act as a data source (e.g., data connections/links and expert judgment), a data manipulator (e.g., data joining, restructuring, and translation), and an application (e.g., formulas and macros) creates a variety of data quality issues. Given the additional uncertainty over ownership of and access rights to specific spreadsheets, it is not surprising that spreadsheet control issues received specific mention in data thematic reviews conducted by the FSA (now the PRA).

    Spreadsheets pervade almost all financial and actuarial processes, but regulatory focus under Solvency II has been drawn particularly to those that hold and manipulate data prior to consumption by the internal model. It is common to find extensive “webs” of thousands of spreadsheets connected by “links” that may have an impact on data quality. In practice, many of these are dormant, but their presence and the possibility of erroneous updates create uncertainty and risk in the modeling process.

    Is there a spreadsheet solution?

    Spreadsheets are not going away; they will remain for the foreseeable future. One viable solution, therefore, is to embed spreadsheets in specialist software that provides a control framework. The approach follows three steps: discovery, triage, and control.

    Discovery is the process by which businesses can evaluate their current dependence on spreadsheets. Such a “health check” will consider the scope and complexity of spreadsheet usage. It can be done manually, but may be accelerated using technology.
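    At its core, discovery of a spreadsheet "web" amounts to building a dependency graph from the links between files and asking which spreadsheets, directly or indirectly, feed the internal model, and which of those are dormant. The Python sketch below does this with a breadth-first search over a hypothetical link inventory and hypothetical modification dates; in practice the links would come from scanning the workbooks themselves, manually or with a discovery tool.

```python
from collections import deque
from datetime import date


def upstream_of(target: str, links: dict) -> set:
    """All files that feed `target`, directly or via other files (breadth-first search).

    `links` maps each file to the files it pulls data from.
    """
    seen, queue = set(), deque([target])
    while queue:
        current = queue.popleft()
        for source in links.get(current, ()):
            if source not in seen:  # also protects against circular links
                seen.add(source)
                queue.append(source)
    return seen


def dormant(files, last_modified: dict, cutoff: date) -> list:
    """Files not modified since `cutoff` (unknown dates are treated as dormant)."""
    return sorted(f for f in files if last_modified.get(f, date.min) < cutoff)


# Hypothetical link inventory produced by a discovery exercise.
links = {
    "model_inputs.xlsx": ["lapse_assumptions.xlsx", "asset_data.xlsx"],
    "lapse_assumptions.xlsx": ["lapse_study_2011.xlsx"],
    "marketing_budget.xlsx": ["lapse_study_2011.xlsx"],
}
last_modified = {
    "lapse_assumptions.xlsx": date(2014, 9, 30),
    "asset_data.xlsx": date(2014, 12, 31),
    "lapse_study_2011.xlsx": date(2011, 11, 2),
}
feeders = upstream_of("model_inputs.xlsx", links)
print(sorted(feeders))  # ['asset_data.xlsx', 'lapse_assumptions.xlsx', 'lapse_study_2011.xlsx']
print(dormant(feeders, last_modified, cutoff=date(2013, 1, 1)))  # ['lapse_study_2011.xlsx']
```

    The marketing workbook is excluded because nothing links it into the model inputs, which is the kind of result that lets triage concentrate on model-critical spreadsheets.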

    Once the spreadsheet landscape has been mapped and analyzed, the identified spreadsheets and spreadsheet-supported processes can be triaged for different forms of improvement. This may simply be a matter of training users to exploit existing solutions more fully, or it may require adopting new software capabilities through integration or third-party vendor solutions.

    The spreadsheet triage process is likely to produce a road map for process improvement that will take some years to complete, meaning that business-critical spreadsheets will continue to exist in the business for a lengthy period. In addition, constant business innovation will undoubtedly create more spreadsheets. Both factors mean that spreadsheet elimination should be seen as a continuous process rather than a planned destination. Businesses are therefore increasingly turning to enterprise spreadsheet management software to provide ongoing spreadsheet control. Such software can detect and report user activity that falls outside pre-determined tolerance levels across the key risk areas of data, functionality, and security.
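    Enterprise spreadsheet management products implement their own monitoring, but the core idea of detecting activity outside pre-determined tolerance levels can be sketched as a comparison of two snapshots of a spreadsheet's key cells, with one check per risk area named above. The cell references, the 10% tolerance, and the authorized-user list are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: float
    formula: Optional[str] = None  # None for hard-coded input cells


def review_change(before: dict, after: dict, user: str, authorized: set,
                  rel_tol: float = 0.10) -> list:
    """Compare two snapshots ({cell_ref: Cell}) and return (risk_area, message) alerts."""
    alerts = []
    # Security: was the edit made by someone allowed to change this workbook?
    if user not in authorized:
        alerts.append(("security", f"edit by unauthorized user {user!r}"))
    for ref, old in before.items():
        new = after.get(ref)
        # Functionality: deleted cells or altered formulas.
        if new is None:
            alerts.append(("functionality", f"{ref} deleted"))
            continue
        if new.formula != old.formula:
            alerts.append(("functionality", f"{ref} formula changed"))
        # Data: values that moved by more than the relative tolerance.
        if old.value != 0 and abs(new.value - old.value) / abs(old.value) > rel_tol:
            alerts.append(("data", f"{ref} moved by more than {rel_tol:.0%}"))
    return alerts


before = {"B4": Cell(0.05), "C10": Cell(1200.0, "=SUM(C2:C9)")}
after = {"B4": Cell(0.065), "C10": Cell(1210.0, "=SUM(C2:C8)")}
for area, message in review_change(before, after, "jsmith", {"actuary1"}):
    print(area, message)
# security edit by unauthorized user 'jsmith'
# data B4 moved by more than 10%
# functionality C10 formula changed
```

    The formula change in C10 is caught even though its value moved by less than 1%, which is why value tolerances alone are not enough.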

    Ten key thoughts about analytical data quality and governance

    1. Analytical data (actuarial, finance, risk, and asset) is very different in character from the transactional data insurers traditionally use and store.
    2. When considering analytical data, the business needs to look at its ultimate usage (usually reports and dashboards) and the level of granularity and drill-through required.
    3. IT may use Online Analytical Processing (OLAP) techniques to provide sophisticated multi-dimensional views of data; however, a clear definition of the required outcome is needed.
    4. Not all data may be relevant for a particular purpose, so it is important to articulate why accuracy is needed. Otherwise, valuable effort can be wasted on improving the quality of data that has little materiality. In short, identify the data with the highest impact.
    5. Understand that data lineage needs to be a living process and must be updated as systems and processes change. Equally, the ability to track and audit data lineage should be available on demand and be built into the data quality solution.
    6. Tools that combine automated techniques with expert judgment can help improve the quality of data and should be used wherever possible.
    7. Business rules are an important part of the data quality process. While there are many pre-built generic rules, it will be critical to supplement these with user-defined rules that reflect unique business considerations – this requires practitioner input.
    8. Improving data quality is an ongoing process, not just a task performed when the data is initially loaded, because data is constantly changing. Data quality improvement should therefore be embedded in the data management and governance framework.
    9. Spreadsheets remain an important element of analytical data and should be carefully managed and controlled.
    10. Most data-related projects do not fail because of the technology – they fail because practitioners cannot precisely define what data they need.
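    Point 5 above, lineage as a living record that can be audited on demand, can be illustrated with a simple lineage log: each transformation step records its source, target, and what was done, so the path from a reported figure back to its source systems can be traced. This Python sketch uses hypothetical dataset and step names and keeps the log in memory; a real solution would persist and version it as systems and processes change.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageStep:
    source: str
    target: str
    transformation: str
    recorded_at: datetime


@dataclass
class LineageLog:
    steps: list = field(default_factory=list)

    def record(self, source: str, target: str, transformation: str) -> None:
        self.steps.append(LineageStep(source, target, transformation, datetime.now(timezone.utc)))

    def trace(self, target: str) -> list:
        """All recorded steps upstream of `target`, nearest first (breadth-first)."""
        found, frontier, visited = [], [target], {target}
        while frontier:
            current = frontier.pop(0)
            for step in self.steps:
                if step.target == current:
                    found.append(step)
                    if step.source not in visited:  # guard against cycles
                        visited.add(step.source)
                        frontier.append(step.source)
        return found


log = LineageLog()
log.record("policy_admin_A", "staging.policies", "birth dates normalized")
log.record("staging.policies", "model_points", "grouped by product and age")
log.record("model_points", "cash_flows_run_001", "actuarial projection")
log.record("cash_flows_run_001", "qrt_technical_provisions", "aggregated by line of business")
print([s.source for s in log.trace("qrt_technical_provisions")])
# ['cash_flows_run_001', 'model_points', 'staging.policies', 'policy_admin_A']
```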