CUSTOMER STORY

Expanding access to the economy

Block redefines financial services with Databricks

12x reduction in compute cost

20% reduction in data egress cost

12PB of data managed and governed

Block is a global technology company that champions accessible financial services and economic empowerment. Its subsidiaries, including Square, Cash App, Spiral, TBD and TIDAL, are committed to expanding economic access. Block uses machine learning (ML) and artificial intelligence (AI) to proactively identify and prevent fraud, keeping customer transactions secure. It also improves user experiences with personalized recommendations, using identity resolution to build a comprehensive view of customer activity across its services. Internally, Block streamlines operations through automation and predictive analytics, making financial service delivery more efficient. To support these capabilities, Block uses the Databricks Data Intelligence Platform to consolidate and streamline its data, AI and analytics workloads. This move prepares Block for the coming wave of automation-driven innovation and strengthens its position as a pioneer in AI-driven financial services that broaden access to financial opportunity.

Managing data volume and data silos slowed innovation

As part of its data strategy to shorten time to market, Block began migrating its data processing to the cloud. A significant obstacle was efficiently managing the large volume of data that its graph-related use cases depend on. This meant handling graph databases, working with a range of machine learning tools and optimizing performance across petabytes of data. Data was also fragmented across business units, which created operational inefficiencies and scalability concerns. Cumbersome data transfers between these systems, combined with siloed data governance policies, complicated matters further and made auditing and policy enforcement difficult.

To address these challenges and accelerate graph analysis, especially in Online Analytical Processing (OLAP) mode, Block chose to migrate to Spark and selected Databricks as its lakehouse. This decision let Block consolidate all data and AI workloads on a unified platform, so data scientists, data engineers and AI practitioners can work efficiently with data from one central location.

As Joseph Kesting, Software Engineer at Block, explained, “The adoption of Databricks as a centralized platform for storing and sharing data across business units has empowered Block to establish a thriving data marketplace. This unique setup enables individual business units to exert their own controls while benefiting from the conglomerate’s resources, granting them access to diverse data sets from different units.”

Currently, Block manages 12PB of data on the Databricks Data Intelligence Platform and expects to reach 16PB by year-end. About 70 teams across business units such as TIDAL, Cash App, Square and TBD, along with 500 power users, actively use the platform.

Unified governance accelerates collaboration

One of Block’s critical requirements was implementing data governance policies correctly and uniformly, ensuring compliance with privacy laws such as GDPR and CCPA for both customers and internal teams. The goal was to enable secure, compliant access to personally identifiable information (PII). To meet these requirements, Block adopted Unity Catalog for centralized governance.

According to Kesting, “Introduction to Databricks coincided with the launch of Unity Catalog, eliminating the need for evaluating alternative data governance tools. The seamless integration with Databricks was the primary factor driving our choice of Unity Catalog.”

With Unity Catalog, Block gained a unified view of its data estate across business units, which simplified access permission management. Unity Catalog also made it possible to attribute costs to individual teams by assigning each team its own storage locations for its catalogs and schemas. As a result, business units could keep their own distributed data governance policies while following a single, streamlined process.
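The per-team storage setup described above can be sketched as follows. This is a minimal, hypothetical sketch, not Block's actual configuration: it generates Databricks SQL statements that give each team its own catalog with a dedicated managed storage location, so storage for that catalog's managed tables lands under the team's own prefix. The `TeamCatalog` class, team names, group names and S3 paths are invented for illustration, and the sketch assumes each S3 prefix is already registered as a Unity Catalog external location.

```python
from dataclasses import dataclass

# Hypothetical per-team settings; names and paths are illustrative only.
@dataclass(frozen=True)
class TeamCatalog:
    team: str          # catalog name, e.g. "cash_app"
    group: str         # account-level group that works in this catalog
    storage_root: str  # S3 prefix, assumed to be a registered external location

def catalog_ddl(cfg: TeamCatalog) -> list[str]:
    """Return SQL statements giving a team its own catalog and managed storage."""
    return [
        # Managed tables in this catalog are stored under the team's own prefix,
        # which makes storage cost attributable to that team.
        f"CREATE CATALOG IF NOT EXISTS {cfg.team} MANAGED LOCATION '{cfg.storage_root}'",
        # Let the team's group enter the catalog and create schemas in it.
        f"GRANT USE CATALOG, CREATE SCHEMA ON CATALOG {cfg.team} TO `{cfg.group}`",
    ]

teams = [
    TeamCatalog("cash_app", "cash-app-data-eng", "s3://example-cash-app-uc/managed"),
    TeamCatalog("square", "square-data-eng", "s3://example-square-uc/managed"),
]
for t in teams:
    for stmt in catalog_ddl(t):
        print(stmt)
```

Running it prints one `CREATE CATALOG` and one `GRANT` statement per team; in a Databricks notebook, each string could be passed to `spark.sql`.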

“Unity Catalog played a pivotal role in facilitating secure and controlled access to sensitive PII data for diverse business units. It allowed data access restriction through dedicated workspaces, ensuring compliance with the original terms of service for data collection. This compliance was enforced not only for the business units that collected the data but also for other units accessing it,” says Kesting. Block plans to enhance this capability by implementing a clean room solution using Delta Sharing in Unity Catalog, enabling secure and privacy-safe collaboration across business units and the partner ecosystem.

Block also intends to use data lineage to handle right-to-be-forgotten requests by tracing how PII is used throughout the entire Block ecosystem, ensuring adherence to data privacy regulations.
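A minimal sketch of such a lineage trace, assuming lineage has already been exported as (upstream table, downstream table) pairs, for example from Unity Catalog's lineage data. The function name `affected_tables` and all table names are hypothetical. A breadth-first walk starting from the tables that hold a customer's PII returns every table that may need to be checked when that customer asks to be forgotten.

```python
from collections import deque

def affected_tables(edges: list[tuple[str, str]], pii_sources: set[str]) -> set[str]:
    """Return the PII source tables plus every table downstream of them.

    `edges` holds (upstream_table, downstream_table) lineage pairs.
    """
    # Build an adjacency list: upstream table -> tables derived from it.
    children: dict[str, list[str]] = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    # Breadth-first walk; `seen` also protects against cycles in the lineage graph.
    seen = set(pii_sources)
    queue = deque(pii_sources)
    while queue:
        table = queue.popleft()
        for child in children.get(table, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Hypothetical lineage spanning several business units.
edges = [
    ("cash_app.raw.customers", "cash_app.curated.customer_profiles"),
    ("cash_app.curated.customer_profiles", "square.marketing.audiences"),
    ("square.payments.transactions", "square.marketing.audiences"),
    ("tidal.raw.streams", "tidal.curated.listening_stats"),
]
print(sorted(affected_tables(edges, {"cash_app.raw.customers"})))
# ['cash_app.curated.customer_profiles', 'cash_app.raw.customers', 'square.marketing.audiences']
```

Note that the trace crosses business units: PII collected by one unit can surface in another unit's table, which is exactly the case a cross-ecosystem lineage view is meant to catch.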

Unlocking business value through cost reduction and efficiency

Migrating graph use cases to Databricks proved to be a game changer for Block, delivering substantial improvements in compute performance and cost. Block reduced compute costs by 12x and unlocked use cases that scaling limitations had previously put out of reach. According to Kesting, “the elimination of these constraints with Databricks has opened up new possibilities for innovation and analysis.”

Implementing Unity Catalog across Block’s data ecosystem brought transformative benefits. It enabled a dynamic “marketplace” for exchanging data between business units, fostering collaboration and knowledge sharing. This played a crucial role in cutting the data egress costs of cross-cloud data transfers by 20%.

Unity Catalog also simplifies IAM policy management for Block. Previously, Block had to navigate a complex two-step approval process that involved attaching IAM policies to roles and then to S3 buckets. This often ran into bucket policy limits and forced permissions to be refactored. With Unity Catalog, Block streamlined this process by configuring sub-group-level access permissions in a single location. Data sharing became far more efficient, with the time required dropping from days to seconds. The adoption of Unity Catalog also encouraged consolidating data into S3 buckets, which co-located compute and storage and improved latency.
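To illustrate why configuring permissions in one place is simpler, here is a simplified, hypothetical model of Unity Catalog's privilege inheritance: a privilege granted on a catalog or schema also applies to the objects beneath it, and reading a table requires `USE CATALOG`, `USE SCHEMA` and `SELECT`. The `grants` structure, group names and table names are invented for illustration, and the model deliberately ignores ownership, `ALL PRIVILEGES` and nested groups. It is not Block's configuration or a Databricks API.

```python
# Simplified model of Unity Catalog privilege inheritance (hypothetical names).
# Ignores ownership, ALL PRIVILEGES and nested group membership.
Grants = dict[str, dict[str, set[str]]]

def effective_privileges(grants: Grants, groups: set[str], securable: str) -> set[str]:
    """Privileges the groups hold on `securable` directly or via its parents."""
    parts = securable.split(".")
    privs: set[str] = set()
    for depth in range(1, len(parts) + 1):
        name = ".".join(parts[:depth])  # "catalog", then "catalog.schema", ...
        for group in groups:
            privs |= grants.get(name, {}).get(group, set())
    return privs

def can_select(grants: Grants, groups: set[str], table: str) -> bool:
    """Reading a table needs USE CATALOG, USE SCHEMA and SELECT."""
    catalog, schema, _ = table.split(".")
    return (
        "USE CATALOG" in effective_privileges(grants, groups, catalog)
        and "USE SCHEMA" in effective_privileges(grants, groups, f"{catalog}.{schema}")
        and "SELECT" in effective_privileges(grants, groups, table)
    )

grants: Grants = {
    # Catalog level: both sub-groups can enter the catalog; analysts can browse all schemas.
    "cash_app": {
        "cash-app-analysts": {"USE CATALOG", "USE SCHEMA"},
        "cash-app-fraud-ml": {"USE CATALOG"},
    },
    # Schema level: only the fraud ML sub-group can read tables in the risk schema.
    "cash_app.risk": {"cash-app-fraud-ml": {"USE SCHEMA", "SELECT"}},
    # Table level: analysts can read a single reporting table.
    "cash_app.reporting.daily_volume": {"cash-app-analysts": {"SELECT"}},
}

print(can_select(grants, {"cash-app-fraud-ml"}, "cash_app.risk.scores"))  # True
print(can_select(grants, {"cash-app-analysts"}, "cash_app.risk.scores"))  # False
```

In this model, giving another sub-group access to the risk schema is one more entry in `grants`, rather than coordinated changes to an IAM role and a bucket policy.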

Moreover, Unity Catalog empowered Block to attribute data ownership more easily and decentralize decision-making. Data sets could be associated with their respective owners, enabling them to determine how the data is shared. This shift from a centralized team imposing data governance to actual data owners making decisions improved compliance and audit reporting, enhancing overall data governance and accountability.

Looking ahead, generative AI and LLMs are a major focus of Block’s overall data and AI strategy. Unity Catalog will play an important role in delivering on that strategy: governing ML models alongside data from a single location will accelerate Block’s AI and analytics initiatives.