This post was written in collaboration with Jason Labonte, Chief Executive Officer, Veritas Data Research


In the realm of healthcare and life sciences, data stands as the linchpin for propelling medical breakthroughs and improving patient outcomes. Utilizing the right real-world data source can be a catalyst for innovation across healthcare, research, and pharmaceutical organizations. According to Gartner, leaders in data and analytics who engage in external data sharing can generate three times more measurable economic benefits compared to those who do not.

The Vital Role of Mortality Data

Mortality data is a critical cornerstone in health analytics, offering profound insights into treatment efficacy, public health policy, and protocol design. Yet, capturing these crucial endpoints is a challenge within conventional clinical datasets like insurance claims or electronic health records. This gap necessitates augmenting clinical real-world data (RWD) with a mortality dataset to accurately understand patient outcomes.

Veritas: Pioneering Quality Mortality Data Solutions

Veritas is resolving the scarcity of reliable mortality data. Founded by industry experts, Veritas employs cutting-edge technology and streamlined workflows to aggregate, curate, and disseminate foundational reference datasets. The process involves meticulous data ingestion from diverse sources, refinement using third-party reference data, and the creation of a comprehensive Fact of Death index.

Datavant Streamlines Insight Generation via Databricks

Enter Datavant, a key player in reducing data sharing hurdles in healthcare through privacy-centric technology that enables the linkage of patient health records across datasets. Its collaboration with Databricks is aimed at making data sharing in the healthcare industry seamless. Veritas uses Datavant technology to tokenize and de-identify its data so that the data can be shared with research, life sciences, insurance, and analytics organizations looking to better understand patient outcomes.
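
To make the idea of tokenization concrete, here is a minimal, generic sketch of keyed-hash tokenization. It is not Datavant's algorithm or API; the function names, the chosen fields, and the use of HMAC-SHA256 are all assumptions made for illustration. The point it shows is that two datasets tokenized with the same secret key produce the same token for the same person, so records can be linked without exchanging the identifying fields themselves.

```python
import hashlib
import hmac


def normalize(value: str) -> str:
    """Lowercase, trim, and collapse whitespace so formatting differences do not change the token."""
    return " ".join(value.strip().lower().split())


def make_token(secret_key: bytes, first_name: str, last_name: str, date_of_birth: str) -> str:
    """Hypothetical helper: derive a de-identified token from identifying fields.

    Uses a keyed hash (HMAC-SHA256) so the token cannot be recomputed without the key.
    Illustrative only; a production tokenization product is considerably more involved.
    """
    message = "|".join(normalize(part) for part in (first_name, last_name, date_of_birth))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# The same person, formatted differently in two datasets, maps to the same token.
key = b"example-shared-secret"  # assumption: a key shared by the parties linking data
token_a = make_token(key, "Ada", "Lovelace", "1815-12-10")
token_b = make_token(key, "  ada ", "LOVELACE", "1815-12-10")
```

Because the token depends on the key, a party holding a different key would produce unrelated tokens from the same input.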

Datavant's Innovation on the Databricks Platform

Datavant introduced its Tokenization Engine tailored explicitly for the Databricks Platform, eliminating the need for custom deployments or maintenance. This library, designed for the Databricks workspace, harnesses the power of Spark technology for enhanced performance. Notably, it supports reading from and writing to locations in the lakehouse directly, streamlining data pipelines for efficient token generation.

Accelerated Efficiency: Veritas' Journey with Datavant on Databricks

The integration with Datavant on Databricks proved transformative for Veritas, simplifying implementation, reducing processing times, and reducing costs.

Implementing Datavant on Databricks required only the installation of a Python wheel. Setting up the data pipelines took less effort than before, and the product was running within one day!

Previously, downloading, tokenizing, and transforming 360 million patient records took Veritas about 20 hours. Leveraging Datavant on Databricks and the power of Databricks' Spark technology, Veritas cut that time by a factor of 4. Tokenizing the 360 million records took just 3 hours and the transformations another 2 hours, with no download step required. Over the course of a year, this would save an estimated 600+ hours of people and processing time!
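
The arithmetic behind the 4x figure can be checked directly from the numbers quoted above. The short sketch below uses only those figures; the final calculation is an inference rather than something the post states, since the post does not say how often the pipeline runs and the annual figure also includes people time.

```python
# Figures quoted in the post
previous_hours = 20        # download + tokenization + transformation, prior setup
tokenization_hours = 3     # Datavant on Databricks
transformation_hours = 2   # Datavant on Databricks

current_hours = tokenization_hours + transformation_hours
speedup = previous_hours / current_hours
hours_saved_per_run = previous_hours - current_hours

# Assumption, not stated in the post: if the ~600 hours per year came from
# elapsed time alone, it would correspond to this many runs per year.
annual_hours_saved = 600
implied_runs_per_year = annual_hours_saved / hours_saved_per_run

print(f"current run time: {current_hours} h")
print(f"speedup: {speedup:.0f}x")
print(f"hours saved per run: {hours_saved_per_run}")
print(f"implied runs per year (inference): {implied_runs_per_year:.0f}")
```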

Additionally, Datavant on Databricks reduced the time spent by the Veritas engineering team. The prior implementation of Datavant required hours of employee time to ensure proper execution of the product, including downloading the data, resizing a virtual machine, and having an operator run the on-premises product (CLI). Veritas now manages this process in a single job, which runs the Datavant on Databricks product only when new records are present. This saves 45% of an FTE's time previously spent tokenizing and transforming Veritas' cause of death data.
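
The "run only when new records are present" pattern can be sketched in plain Python as a high-water-mark check. This is an illustration under assumptions, not Veritas' actual job: the Record type, the ingested_at field, and the process callback (which stands in for the tokenization and transformation step) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Record:
    record_id: int
    ingested_at: int  # hypothetical, monotonically increasing ingestion sequence number


def select_new_records(records: Sequence[Record], last_processed: int) -> List[Record]:
    """Return only the records ingested after the stored high-water mark."""
    return [r for r in records if r.ingested_at > last_processed]


def run_if_new(
    records: Sequence[Record],
    last_processed: int,
    process: Callable[[List[Record]], None],
) -> int:
    """Call `process` only when new records exist; return the updated high-water mark."""
    new_records = select_new_records(records, last_processed)
    if not new_records:
        return last_processed  # nothing new: skip the expensive step entirely
    process(new_records)
    return max(r.ingested_at for r in new_records)


# Usage: the first run handles two new records, the second run is a no-op.
handled: List[int] = []
source = [Record(1, 10), Record(2, 11), Record(3, 12)]
mark = run_if_new(source, 10, lambda batch: handled.extend(r.record_id for r in batch))
mark_again = run_if_new(source, mark, lambda batch: handled.extend(r.record_id for r in batch))
```

Storing the mark between runs is what lets a scheduled job skip work when nothing has arrived since the previous run.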

The Datavant on Databricks product limits data movement because tokenization happens within Veritas' Databricks workspace. The Datavant on Databricks workload cost one quarter as much as running Datavant on virtual machines.

Veritas' use of the partnership between Datavant and Databricks signals a shift in speed-to-insight, which will ultimately drive innovation and transformative advancements in life sciences and healthcare.
