
It's safe to say that life changed for everyone in the first quarter of 2020. For many of us, the Covid-19 pandemic meant a significant change in lifestyle. It also had a significant impact on major industries such as technology, healthcare, travel and hospitality, and financial services. For the insurance industry, the pandemic accelerated some existing trends and introduced completely new ones. In this blog, we will talk about some of the emerging data and analytics trends in insurance, and how the lakehouse paradigm helps organizations adapt to them. The focus is on the quantifiable value of analytics capabilities: the business value of the use cases they enable, and the reduction in Total Cost of Ownership (TCO).

Auto insurance

With government-mandated lockdowns in effect, people drove much less frequently during the pandemic. This was accompanied by a shift to remote work, which allowed people to move out of big cities and changed the geographical distribution of drivers.

Beyond these changes in consumer behavior, the auto insurance industry had already been moving slowly towards mileage-based pricing and modernizing its legacy systems; the pandemic accelerated both.

Some of the emerging use cases/patterns are:

  • Crash detection and response: calling emergency services and determining fault based on driver speed and trajectory. Apple's crash detection feature is a great example of how speed and GPS coordinates captured via smartphones can be used in novel ways.
  • Detecting anomalous driving through outlier detection that factors in weather and traffic. For example, a speed of 55 MPH may be considered normal on a freeway in normal weather conditions, but not during a snowstorm. Similarly, in heavy traffic, anomalous driving can be detected by comparing a vehicle's direction and speed with those of surrounding traffic.
  • Factoring in daily driving patterns to customize underwriting for each customer. While the use of telematics (via smartphones or pluggable devices) to capture driving patterns has been around for a while, the supply chain issues seen in the current high-inflation environment had a significant impact on this trend. As a result of these issues, the price of cars (both new and used) shot up. This, accompanied by the rampant inflation seen in the last 10 months, meant that repairs for cars involved in a collision are also significantly more expensive now. As a result, auto insurers are under pressure to identify the drivers who are most likely to get into a crash, and to reward the drivers at the opposite end of the spectrum.
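
To make the anomalous driving example concrete, here is a minimal sketch that flags a speed reading by comparing it with surrounding traffic using a z-score. It is an illustration, not a production model: the function name `is_anomalous_speed`, the threshold of 3 standard deviations and the sample speeds are all assumptions made for this example.

```python
from statistics import mean, pstdev


def is_anomalous_speed(driver_mph, nearby_mph, z_threshold=3.0):
    """Flag a speed that deviates strongly from surrounding traffic.

    nearby_mph holds speeds of surrounding vehicles observed under the
    same weather and traffic conditions (hypothetical input).
    """
    if len(nearby_mph) < 2:
        return False  # not enough context to judge
    mu = mean(nearby_mph)
    sigma = pstdev(nearby_mph)
    if sigma == 0:
        # Everyone drives at the same speed; any difference stands out
        return driver_mph != mu
    return abs(driver_mph - mu) / sigma > z_threshold


# The same 55 MPH reading in two contexts (made-up sample data)
clear_freeway = [52, 55, 58, 60, 54, 57]
snowstorm = [22, 25, 20, 24, 23, 21]

print(is_anomalous_speed(55, clear_freeway))
print(is_anomalous_speed(55, snowstorm))
```

Because the comparison is against nearby vehicles rather than a fixed limit, weather and traffic are factored in implicitly: the same 55 MPH is unremarkable on the clear freeway and an outlier in the snowstorm.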

So what does this mean from a data and analytics perspective? It all boils down to streaming data: auto insurers need to be able to integrate new data sources (such as weather and traffic), build solutions capable of real-time processing (for alerting emergency services), and have a better understanding of drivers' driving patterns (to enable sophisticated ML-based risk, underwriting and claim models).

The lakehouse paradigm allows auto insurers to address these challenges by:

  • Eliminating data redundancy, and ensuring the lake is the single source of truth
  • Enabling seamless integration of new data sources
  • Enabling both batch and real-time streaming based pipelines
  • Enabling complex ML models to run directly on the data lake

This yields lower loss ratios (the total incurred losses in relation to the total collected insurance premiums) due to more accurate pricing, better risk selection, loss control and prevention. In addition, insurers have lower TCO due to a more efficient way of ingesting streaming data and incorporating external data sources.
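
As a quick worked example of the loss ratio defined above, the sketch below computes it for a made-up book of business. The function name and the dollar figures are hypothetical and only illustrate the arithmetic.

```python
def loss_ratio(incurred_losses: float, collected_premiums: float) -> float:
    """Total incurred losses divided by total collected premiums."""
    if collected_premiums <= 0:
        raise ValueError("collected_premiums must be positive")
    return incurred_losses / collected_premiums


# Hypothetical figures: $65M in losses on $100M of premiums
before = loss_ratio(65_000_000, 100_000_000)
# Same premiums, with losses reduced through better risk selection
after = loss_ratio(60_000_000, 100_000_000)

print(f"{before:.0%} -> {after:.0%}")
```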

Commercial insurance

Commercial insurance includes property, general liability, cyber insurance and business income insurance among others. Commercial insurance companies use a variety of actuarial models for underwriting (pricing) policies for their customers. These actuarial models take into account various factors such as the industry, the location (for property), the weather and environmental conditions (think Florida premiums being higher due to hurricanes), etc.

Most of these actuarial models represent rules that are pretty static and have existed for some time. Even prior to the Covid-19 pandemic, the industry was moving towards automation of these actuarial models and leveraging more Machine Learning (ML) for underwriting, claims forecasting, etc. The pandemic further accelerated this trend, to the point where manual actuarial modeling is becoming redundant across the industry.

Another key trend that has emerged in the last few years is around IoT-based alerting for sensitive/valuable commodities. For example:

  • Vaccines (such as the Covid-19 vaccine) and other medicines/pharmaceutical compounds that need to be stored/transported within a specific temperature range. IoT sensors can be used to monitor the temperature in real-time, and alert the right team/person if the temperature goes outside the acceptable thresholds. Usually, if the problem can be fixed within a few minutes, the vaccines/medicines can be saved/considered viable.
  • Wines and other expensive goods (think paintings, jewelry, etc.) also need to be stored within specific temperature and humidity ranges. IoT sensors for both temperature and humidity can be used in the same manner described above to avoid any damage to these items.
  • In addition to IoT sensors, cameras and motion sensors can be used to determine liability for damage/loss, alert the right people/team, and thus prevent theft/loss, etc.
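
The temperature alerting pattern above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Reading` type, the 2 to 8 °C range, the 5-minute grace period and the event names are all hypothetical, and a real pipeline would read from a streaming source rather than a list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reading:
    timestamp: datetime
    temp_c: float


def monitor(readings, low_c=2.0, high_c=8.0, grace=timedelta(minutes=5)):
    """Return (timestamp, event) tuples for temperature excursions.

    ALERT     - temperature has just left the acceptable range
    RECOVERED - temperature returned to range within the grace period
    SPOILED   - temperature stayed out of range longer than the grace period
    Readings are assumed to be sorted by timestamp.
    """
    events = []
    excursion_start = None
    for r in readings:
        in_range = low_c <= r.temp_c <= high_c
        if in_range:
            if excursion_start is not None:
                events.append((r.timestamp, "RECOVERED"))
                excursion_start = None
        elif excursion_start is None:
            excursion_start = r.timestamp
            events.append((r.timestamp, "ALERT"))
        elif r.timestamp - excursion_start > grace:
            events.append((r.timestamp, "SPOILED"))
            break  # goods are no longer viable; stop monitoring
    return events


t0 = datetime(2022, 1, 1, 12, 0)
sample = [
    Reading(t0, 5.0),
    Reading(t0 + timedelta(minutes=1), 9.0),   # leaves range
    Reading(t0 + timedelta(minutes=3), 6.0),   # fixed within a few minutes
    Reading(t0 + timedelta(minutes=10), 9.5),  # leaves range again
    Reading(t0 + timedelta(minutes=12), 9.6),
    Reading(t0 + timedelta(minutes=16), 9.7),  # out of range for too long
]
for ts, event in monitor(sample):
    print(ts.time(), event)
```

The same structure extends to humidity or motion sensors by swapping the range check for a different condition.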

So how does the lakehouse paradigm fit in? Adopting it allows insurers to:

  • Automate the mostly static actuarial models
  • Quickly integrate the real-time data sources described above, such as IoT sensors, audio and video feeds from cameras, motion sensors, etc., and automate their ingestion/ETL

This yields lower combined ratios (incurred losses and expenses in relation to total collected premiums), largely because of infrastructure benefits: processing IoT data scales better on cloud computing, where resources can be scaled up and down automatically as needed. It is also worth noting that these strategies apply to personal/homeowners insurance as well, to determine liability for damage to or loss of jewelry, paintings and other expensive property.
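
To illustrate how expenses feed into the combined ratio, here is a small sketch with made-up numbers. The function name and all figures are hypothetical; a combined ratio below 100% is generally read as an underwriting profit.

```python
def combined_ratio(incurred_losses, expenses, collected_premiums):
    """(Incurred losses + expenses) divided by total collected premiums."""
    if collected_premiums <= 0:
        raise ValueError("collected_premiums must be positive")
    return (incurred_losses + expenses) / collected_premiums


# Hypothetical insurer with $100M of premiums and $65M of losses
legacy_infra = combined_ratio(65_000_000, 33_000_000, 100_000_000)
# Same losses, with $3M less in infrastructure expenses
elastic_infra = combined_ratio(65_000_000, 30_000_000, 100_000_000)

print(f"{legacy_infra:.0%} -> {elastic_infra:.0%}")
```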

Life insurance

Life insurance was one of the sub-verticals of the insurance industry that was heavily impacted by the Covid-19 pandemic, since it is closely tied to healthcare.

Due to the mandatory lockdowns in many parts of the world, as well as the social distancing norms enforced afterwards, in-person interactions dropped significantly. As a result, the trend of quoting and buying insurance policies online (as opposed to buying them from insurance agents) accelerated further during the pandemic.

The industry in general has been moving towards more customized underwriting and pricing, based on the policyholder's current health, lifestyle, eating habits, etc. The pandemic highlighted the fact that people with existing immunological conditions, less active lifestyles or unhealthy eating habits are more prone to serious health issues and/or hospitalization due to diseases. This further accelerated the trend of detailed data collection around customers' lifestyles, as well as customized underwriting, in the life insurance industry.

So how does the lakehouse paradigm align with these new trends in the life insurance industry? Let's expand on the impact of the trends outlined above on the data and analytics landscape at these insurers, and on how the lakehouse ties in:

  • With rising online sales of life insurance policies, consumers' online profiles and activity become even more important for insurance companies. This includes clickstream data, spending habits, frequented websites, etc. The lakehouse enables you to integrate and ingest data from multiple unstructured, real-time data sources seamlessly, thereby reducing complexity and time to insights.
  • One of the key new sources of consumers' health data is wearables, such as smartwatches (think Apple Watch, Galaxy Watch, etc.) and fitness trackers (such as Fitbit). These new sources can be integrated seamlessly using the lakehouse architecture, and the data can be ingested as a stream in real time.
  • Complex ML models can be used directly on the data in the lakehouse to build a profile of the customers' lifestyle, and more importantly, detect changes to it. This in turn can lead to a better customer 360 solution, and a deeper understanding of the customer's lifestyle, thereby leading to a better, more tailored experience for the consumer.
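
As a rough illustration of detecting a change in a customer's lifestyle from wearable data, the sketch below compares a recent window of daily step counts with a longer baseline. It is a simple threshold on averages standing in for the ML models mentioned above, and the function name, window sizes and 30% threshold are assumptions made for this example.

```python
from statistics import mean


def lifestyle_shift(daily_steps, baseline_days=28, recent_days=7, rel_change=0.3):
    """Compare the recent average step count with the preceding baseline.

    Returns (relative_change, flagged), where flagged is True when the
    recent average moved by at least rel_change relative to the baseline.
    """
    if len(daily_steps) < baseline_days + recent_days:
        raise ValueError("not enough history")
    baseline = mean(daily_steps[-(baseline_days + recent_days):-recent_days])
    if baseline == 0:
        raise ValueError("baseline average is zero")
    recent = mean(daily_steps[-recent_days:])
    change = (recent - baseline) / baseline
    return change, abs(change) >= rel_change


# Made-up history: four steady weeks, then one much less active week
history = [8000] * 28 + [4000] * 7
change, flagged = lifestyle_shift(history)
print(f"{change:+.0%}", flagged)
```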

This helps companies balance profitability and growth by gaining market share for customer segments that are within the insurer's target risk profile, and providing tailored recommendations of riders (amendments to policies) and other policy features.


Reinsurance

Reinsurance is insurance for insurers, also known as stop-loss insurance. One of the key things reinsurers have to do is import, on a large scale, the policy and other documents they receive from their customers (insurers), and integrate them into their data landscape. This means they need technologies like Optical Character Recognition (OCR), Natural Language Processing (NLP), and machine learning (ML) models to ingest, transform and analyze these documents, in addition to more traditional forms of data.

This is another area that the lakehouse architecture is very well suited for: it allows you to ingest data from multiple, diverse sources into a single platform, where you can run data engineering, data science and business intelligence workflows without having to create redundant copies or move the data to separate stores first.

For more details on the lakehouse architecture, check out the blog by the Databricks founders. To help you get started, Databricks solution accelerators (filter by "industry" > financial services) can also save hours of discovery, design, development and testing.
