Putting patients’ health first with data and AI
6X improvement in data processing
20 million records ingested in 20 minutes
SOLUTION: Clinical health data lake
PLATFORM USE CASE: Delta Lake, data science, machine learning, ETL
“Databricks delivered the time to market as well as the analytics and operational uplift that we needed in order to be able to meet the new demands of the healthcare sector.”
- Peter James, Chief Architect, Healthdirect Australia
As the shepherd of the National Health Services Directory (NHSD), Healthdirect Australia is focused on leveraging terabytes of data covering time-driven, activity-based healthcare transactions to improve healthcare services, offerings, and support. Facing governance requirements, team silos, and a legacy system that was difficult to scale, they moved to Databricks, boosting data processing for downstream machine learning while improving data security to meet HIPAA requirements.
Data quality and governance, silos, and the inability to scale
Due to regulatory pressures, Healthdirect Australia set out to improve overall data quality and to establish governance on top of it. But they ran into challenges with data storage and access. Multiple data silos blocked efficient preparation of data for downstream analytics. These disjointed data sources undermined the consistency of data reads, as data was often out of sync between the various systems in their stack. The low-quality data also led to higher error rates and processing inefficiencies. This fragmented architecture created significant operational overhead and limited their ability to build a comprehensive view of the patient.
Further, changing customer demand meant they needed to ingest over 1 billion data points, estimated at over 1TB of data, covering bookings, appointments, pricing, eHealth transaction activity, and more.
“We had a lot of data challenges. We just couldn’t process efficiently enough. We were starting to get batch overruns. We were starting to see that a 24-hour window isn’t the most optimum time in which we want to be able to deliver healthcare data and services,” explained Peter James, Chief Architect, Healthdirect Australia.
Ultimately, Healthdirect realized they needed to modernize their end-to-end process and tech stack to properly support the business.
Modernizing analytics with Databricks and Delta Lake
Databricks provides Healthdirect Australia with a unified data analytics platform that simplifies data engineering and accelerates data science innovation. The notebook environment enables them to make content changes in a controlled fashion rather than having to run bespoke jobs each time.
“Databricks has provided a big uplift for our teams and our data operations,” said James. “The analysts were working directly with the data operations teams. They are able to achieve the same pieces of work together within the same timeframes that used to take twice as long. They’re working together and we’re seeing just a massive acceleration in the speed at which we can deliver service.”
With Delta Lake, they’ve created logical data zones: Landing, Raw, Staging, and Gold. Within these zones, they store their data “as-is”, in its structured or unstructured state, in Delta Lake tables. From there, they use a metadata-driven schema and hold the data in a nested structure within each table. This lets them handle data consistently from every source and simplifies the mapping of data to the various applications that consume it.
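As a rough illustration of the pattern described above, the sketch below wraps a source record “as-is” in a uniform envelope, with descriptive metadata alongside a nested payload, and tags it with a zone. The field names (`source`, `zone`, `ingested_at`, `payload`) and both helper functions are hypothetical, and treating the four zones as an ordered promotion path is an assumption; the source does not describe Healthdirect's actual schema.

```python
from datetime import datetime, timezone

# Zone names come from the text; ordering them as a promotion path is an assumption.
ZONES = ["Landing", "Raw", "Staging", "Gold"]


def wrap_record(source: str, record: dict, zone: str = "Landing") -> dict:
    """Store a source record unchanged inside a uniform envelope (hypothetical schema)."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return {
        "source": source,
        "zone": zone,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "payload": record,  # nested structure, kept exactly as received
    }


def promote(envelope: dict) -> dict:
    """Return a copy of the envelope moved to the next zone."""
    idx = ZONES.index(envelope["zone"])
    if idx == len(ZONES) - 1:
        raise ValueError("record is already in the final zone")
    return {**envelope, "zone": ZONES[idx + 1]}


# Illustrative record; the clinic name is made up.
booking = wrap_record("bookings", {"id": 1, "clinic": {"name": "Example Clinic"}})
print(promote(booking)["zone"])  # Raw
```

Because every source lands in the same envelope shape, downstream code can read `payload` without a per-source schema, which is the consistency benefit the paragraph above describes.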
Meanwhile, through Structured Streaming, they were able to convert all of their ETL batch jobs into streaming ETL jobs that could serve multiple applications consistently. Overall, Spark Structured Streaming, Delta Lake, and Databricks’ unified data analytics platform have provided significant architecture improvements that have boosted performance, reduced operational overhead, and increased process efficiency.
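To make the batch-to-streaming idea concrete, here is a minimal sketch of a Structured Streaming job that moves new rows between two Delta tables. It assumes an existing SparkSession with Delta Lake available is passed in as `spark`; the path layout, the `zone_path` helper, and the choice of Raw and Staging as source and target are hypothetical and not taken from Healthdirect's pipelines.

```python
def zone_path(base: str, zone: str, table: str) -> str:
    """Build a storage path for a table in a zone (hypothetical layout)."""
    return f"{base.rstrip('/')}/{zone.lower()}/{table}"


def start_raw_to_staging(spark, base: str, table: str):
    """Start a streaming job that copies new rows from the Raw zone to Staging.

    `spark` is an existing SparkSession with Delta Lake available.
    """
    source = zone_path(base, "Raw", table)
    target = zone_path(base, "Staging", table)

    # Read the Delta table as a stream: new commits arrive as micro-batches.
    stream = spark.readStream.format("delta").load(source)

    # Transformations that a batch job applied would go here, e.g. stream.filter(...).

    return (
        stream.writeStream.format("delta")
        .outputMode("append")
        # The checkpoint records progress so a restarted job resumes where it stopped.
        .option("checkpointLocation", f"{target}/_checkpoint")
        .start(target)
    )


print(zone_path("/mnt/lake/", "Raw", "bookings"))  # /mnt/lake/raw/bookings
```

The same DataFrame transformations used in a batch job generally carry over to the streaming version, which is what makes this kind of conversion practical.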
Faster data pipelines result in better patient-driven healthcare
As a result of the performance gains delivered by Databricks and the improved data reliability through Delta Lake, Healthdirect Australia improved the accuracy of their fuzzy name match algorithm from less than 80% with manual verification to 95% with no manual intervention.
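The source does not say how the fuzzy name match algorithm works. Purely to show what fuzzy name matching means in practice, the sketch below uses the similarity ratio from Python's standard `difflib.SequenceMatcher`, which is a stand-in technique rather than Healthdirect's method. The `0.85` threshold and the directory names are made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 from difflib.SequenceMatcher."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name: str, candidates: list[str], threshold: float = 0.85):
    """Return the closest candidate, or None if nothing reaches the threshold."""
    if not candidates:
        return None
    scored = [(similarity(name, c), c) for c in candidates]
    score, candidate = max(scored)
    return candidate if score >= threshold else None


# Hypothetical directory entries and a misspelled lookup.
directory = ["Northside Medical Centre", "Harbour Street Pharmacy"]
print(best_match("northside  medical center", directory))
```

Names that fall below the threshold return `None`. In a pipeline like the one described, those are the cases that would otherwise need manual verification.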
The processing improvements with Delta Lake and Structured Streaming allowed them to process more than 30,000 automated updates per month. Prior to Databricks, they relied on unreliable, highly manual batch jobs that took 6 months to process the same number of updates, making this a 6X improvement in data processing.
They were also able to increase their data load rate to 1 million records per minute, loading their entire 20 million record data set in 20 minutes. Before adopting Databricks, processing the same 1 million transactions took more than 24 hours, blocking analysts from making swift decisions to drive results.
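The headline figures follow from simple arithmetic, which the short sketch below checks using only numbers quoted in the text. The function names are illustrative.

```python
def speedup(old_duration: float, new_duration: float) -> float:
    """How many times faster the new process is for the same amount of work."""
    return old_duration / new_duration


def load_time_minutes(records: int, records_per_minute: int) -> float:
    """Minutes needed to load a data set at a steady rate."""
    return records / records_per_minute


# 30,000 updates used to take 6 months and now take 1 month.
print(speedup(6, 1))  # 6.0

# 20 million records at 1 million records per minute.
print(load_time_minutes(20_000_000, 1_000_000))  # 20.0
```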
Last, data security, which was critical due to compliance requirements, was greatly improved. Databricks provides standard security accreditations like HIPAA, and Healthdirect was able to use Databricks to meet Australia’s security requirements. This yielded significant cost reductions and gave them continuous data assurance by monitoring changes to access privileges, such as changes in roles, metadata-level security changes, and data leakage.
“Databricks delivered the time to market as well as the analytics and operational uplift that we needed in order to be able to meet the new demands of the healthcare sector,” said James.
The future looks bright for Healthdirect Australia. With the help of Databricks, they have proven the value of data and analytics and shown how it can shape their business vision. With transparent access to data that has well-documented lineage and quality, participation across business and analyst groups has increased, empowering teams to extract value from their data more easily and quickly, with the goal of improving healthcare for everyone.