CUSTOMER STORY

Securing growth through scalable and efficient data management at Cash App

Cash App uses Delta Lake on Databricks to securely and reliably handle large volumes of data

CLOUD: AWS

Cash App is focused on redefining the world’s relationship with money by making it more relatable, instantly available and universally accessible. They use data to enhance the consumer experience and support internal departments. With the Databricks Data Intelligence Platform, Cash App leveraged Delta Lake to deliver a more flexible and consistent data infrastructure for Afterpay data. Delta Lake’s reliability and stability surpassed their previous setup, and the platform’s scalability allowed Cash App to handle increasing data volumes. Cash App has improved overall pipeline stability, enabled real-time data processing and enhanced decision-making across the organization — all while managing costs.

Redundancies and high costs hinder accessibility in financial services

Cash App, a mobile payment service under the umbrella of Block, aims to revolutionize the financial landscape by enabling individuals to send and receive money without the necessity of a traditional bank account. Their mission is to democratize financial services, making transactions and financial management accessible to everyone — especially those underserved by conventional banking systems.

According to Unnee Udayakumar, Senior Manager of Data Engineering at Cash App’s Machine Learning & Data Science Organization, “We are focused on equipping data analysts and scientists with the tools and data they need. By evaluating the business value of our datasets, we can serve our stakeholders better, feeding insights back into the product and various departments to enhance decision-making processes.”

Prior to adopting the Databricks Data Intelligence Platform, Afterpay data was ingested into the Block ecosystem via Snowflake, while Redshift and EMR handled legacy data transformations. “Originally from Afterpay, we were not just a team of data engineers plumbing data — we also managed the entire data tech stack, including compute, storage and security aspects. With the Block acquisition, there was a sudden demand for scale, while preserving data’s integrity and timeliness as well as the platform’s availability and reliability,” Udayakumar added.

Before the Block acquisition, Cash App’s Afterpay data lake was built on AWS: specifically Redshift, EMR, Glue and S3. “With the Block acquisition, our challenges were multifold. We had an S3-based data lake of parquet files organized as Redshift/Glue tables, but the columnar parquet tables still demanded database persistence on Redshift for us to deliver ACID guarantees to end users. This presented us with scalability and cost issues. We needed to figure out how to deliver Afterpay data to the highly demanding Block data ecosystem, with improved reliability, most optimally, without runaway costs,” Udayakumar said. Following the acquisition, there was a strategic shift toward delivering an interoperable, format-agnostic representation of data with ACID guarantees. This meant moving away from proprietary solutions. “We didn’t want to be tied to specific engines,” Udayakumar explained. “We needed to ensure a more open and adaptable data management framework where everyone could simply get the data they needed.”

A flexible, consistent and open data infrastructure with Delta Lake

The need to democratize access and reduce redundancies became more evident after the acquisition of Afterpay by Block. Cash App sought a robust platform that could handle the growing complexities of their data requirements while ensuring scalability and cost efficiency. This led Cash App to leverage Delta Lake, Databricks’ optimized data storage layer, to manage Afterpay data more effectively.

The team migrated all extract, transform, load (ETL) and compute processes away from Redshift and EMR to Databricks. Data is primarily produced as Delta Lake tables and presented to end users via catalogs in Databricks Unity Catalog and AWS Glue, and via databases in Snowflake for platform-agnostic use cases, offering interoperability for BI, data science and engineering users. This unified approach also ensures consistent data engineering and processing across the platform. The integration of Delta Lake with other tools has been smooth, allowing teams to leverage their existing investments while enhancing their data capabilities. “We have been able to integrate Delta Lake seamlessly with Snowflake for analytical purposes and AWS for storage, providing a holistic and flexible data management solution,” Udayakumar explained.
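In practice, this produce-once, present-everywhere pattern can be as simple as writing curated data to a governed Delta table that other engines then read from object storage. The following is a minimal, hypothetical sketch assuming a PySpark job on Databricks; the S3 path and the example_catalog.silver.orders table name are illustrative placeholders, not Cash App’s actual objects.

```python
# Hypothetical sketch: land raw files from S3, transform them and persist the
# result as a Delta table governed by Unity Catalog. Names and paths are
# placeholders for illustration only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read raw source data from object storage (placeholder path)
raw_orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# De-duplicate and write the curated result as a managed Delta table;
# BI, data science and engineering users can then consume it through
# Unity Catalog, Glue or Snowflake, or read the Delta files directly.
(raw_orders
    .dropDuplicates(["order_id"])
    .write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("example_catalog.silver.orders"))
```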

Delta Lake can handle large volumes of data with high reliability, as the Databricks Platform ensures that data pipelines are consistent and dependable. Udayakumar explained, “The reliability and stability we’ve achieved with Delta Lake far surpass our previous setup. This has significantly reduced the amount of time and resources we need to manage our data, as well as the risk of data loss or corruption.”

With Delta Lake, Udayakumar’s team achieved the engine-agnostic representation of data they were after: one unified view that keeps the data open. Implementing the medallion architecture (bronze, silver, gold) has standardized data handling for various use cases. Delta Lake’s open format allows end users to consume data from Databricks or Snowflake, or to use any preferred compute engine to read Delta tables directly from S3.
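As an illustration of that openness, a reader outside Databricks or Snowflake can load a Delta table straight from S3. The sketch below assumes the open source deltalake (delta-rs) Python package and a hypothetical gold-layer path; it is not Cash App’s actual setup.

```python
# Hypothetical sketch: read a gold-layer Delta table directly from S3 using
# the open source deltalake (delta-rs) package, with no warehouse engine
# involved. The bucket and path are illustrative placeholders.
from deltalake import DeltaTable

# Point any Delta-aware reader at the table's files in object storage
table = DeltaTable("s3://example-bucket/gold/payments/")

# Materialize the latest snapshot, for example as a pandas DataFrame
df = table.to_pandas()
print(df.head())
```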

By converging and standardizing the Afterpay data lake on S3 and Delta Lake, the team has set an adaptable standard for an engine-agnostic, ACID-compliant data representation across the Block ecosystem.

For Udayakumar, one of the most compelling capabilities of the platform is Delta Live Tables (DLT), which data engineers use to read streaming data from Kafka. This speeds up data processing and ensures that Cash App can deliver timely insights to consumers and internal teams. DLT also provides out-of-the-box data de-duplication and quality checks, which have proven far handier than their previous EMR setup. On the cost side, DLT enables truly incremental pipelines that avoid scanning the same source data multiple times.
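A pipeline of this shape might look like the following minimal sketch, assuming a Databricks DLT pipeline written in Python; the Kafka broker, topic, schema and table names are hypothetical placeholders rather than Cash App’s actual configuration.

```python
# Hypothetical DLT sketch: stream events from Kafka into a bronze table, then
# de-duplicate and validate them into a silver table. Broker, topic and column
# names are illustrative placeholders. The `spark` session is provided
# automatically by the DLT runtime.
import dlt
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("merchant_id", StringType()),
    StructField("event_ts", TimestampType()),
])

@dlt.table(name="bronze_events", comment="Raw events streamed from Kafka")
def bronze_events():
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker.example.com:9092")
        .option("subscribe", "afterpay-events")
        .load()
        .select(from_json(col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

@dlt.table(name="silver_events", comment="De-duplicated, validated events")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")
def silver_events():
    # Expectations handle quality checks; dropDuplicates handles de-duplication
    return dlt.read_stream("bronze_events").dropDuplicates(["event_id"])
```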

Udayakumar’s team has also built a Databricks-based framework that lands raw datasets in separate sensitivity tiers, enabling distinct data access standards for end users and applications. Classifying data up front, at the storage layer of the data lake, plays a key role in safeguarding data and enabling role-based access control at the root level.
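One way to express this pattern, purely as a hypothetical sketch, is to route each raw dataset to a schema matching its sensitivity tier and grant access once at that root level; the catalog, schema and group names below are illustrative and do not describe Cash App’s actual framework.

```python
# Hypothetical sketch of sensitivity-tiered landing with root-level,
# role-based access control. All names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

SENSITIVITY_SCHEMAS = {
    "restricted": "example_catalog.raw_restricted",
    "internal": "example_catalog.raw_internal",
}

def land_raw_dataset(df, dataset_name, sensitivity):
    """Persist a raw dataset into the schema matching its sensitivity tier."""
    schema = SENSITIVITY_SCHEMAS[sensitivity]
    (df.write
       .format("delta")
       .mode("append")
       .saveAsTable(f"{schema}.{dataset_name}"))

# Access is granted once per tier at the schema (root) level, so every table
# landed there inherits the same standard.
spark.sql("GRANT SELECT ON SCHEMA example_catalog.raw_internal TO `analysts`")
spark.sql("GRANT SELECT ON SCHEMA example_catalog.raw_restricted TO `risk-engineering`")
```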

Providing financial insights to those who need it, fast

The performance improvements achieved with Databricks are described as “tremendous.” The scalable nature of the platform has allowed Cash App to handle increasing data volumes without compromising performance. This scalability is crucial as Cash App continues to grow and expand their user base. “With Delta Lake, we achieve a unified view of our data that is both reliable and scalable,” Udayakumar said. “Using the Databricks ecosystem has enabled us to set up cost-effective data pipelines with enhanced data quality standards that improved trust in a commercially scalable way.”

While Cash App has yet to measure the exact operational cost savings, Udayakumar believes that the benefits are clear — the scalable infrastructure of Databricks and the elimination of redundant, manual processes (such as schema evolution handling) will lead to reductions in recompute and engineering time spent on fixes. “Since Block acquisition, the Afterpay data lake required a cross-region setup. The new incremental Databricks pipelines have helped us prevent cost surges in cross-region transfer. The licensing model of Databricks, combined with the openness of Delta Lake, allows us to scale efficiently while keeping our costs in check,” Udayakumar noted.

Looking ahead, the team is excited to simplify data engineering with Delta Lake UniForm, which addresses their interoperability needs more elegantly: Iceberg metadata is generated automatically and asynchronously, allowing Iceberg clients to read Delta tables as if they were Iceberg tables. “We’re excited about Delta Lake UniForm for cross-format interoperability, to further simplify data management across our teams,” Udayakumar said.
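For reference, UniForm is turned on through Delta table properties. The sketch below applies the documented properties to a hypothetical table name; it is illustrative only, not Cash App’s configuration.

```python
# Hypothetical sketch: enable UniForm (Iceberg metadata generation) on an
# existing Delta table. The table name is an illustrative placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    ALTER TABLE example_catalog.gold.payments SET TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
# Iceberg metadata is then generated asynchronously after commits, so Iceberg
# clients can read the same table without a separate copy of the data.
```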

The ability to provide real-time insights to various lines of business (LOBs) has improved decision-making across the organization. Insights derived from data are now readily available to departments such as Risk, Finance and Marketing, enabling data-driven strategies and operations. In turn, these data points are fed as intelligence to consumers to help them make better decisions about their money. By leveraging the full potential of the Databricks Platform, Cash App is well positioned to drive innovation and achieve their mission of democratizing financial services. As Udayakumar put it, “The openness and transparency we have experienced while working with Databricks and Delta Lake are the most important qualities. The Databricks team has been critical in helping us throughout the journey, and we are most impressed with their support.”

Block’s AI drive keeps a keen focus on ensuring accurate, complete and timely access to both raw and modeled data through its data foundation, and leaders like Jackie Brosamer have identified Databricks as a key catalyst for achieving this goal. That vision is now reflected in Afterpay’s data ecosystem: Databricks underpins their data-driven decision-making and propels innovation in the fintech space, while the team remains committed to open source principles.