Petabyte-Scale On-Chain Insights: Real-Time Intelligence for the Next-Gen Financial Backbone
Overview
Experience | In Person |
---|---|
Type | Lightning Talk |
Track | Data Lakehouse Architecture and Implementation |
Industry | Enterprise Technology, Financial Services |
Technologies | Apache Spark, Delta Lake, Unity Catalog |
Skill Level | Beginner |
Duration | 20 min |
We’ll explore how CipherOwl Inc. built a near real-time, multi-chain data lakehouse to power anti-money laundering (AML) monitoring at petabyte scale. We will walk through the end-to-end architecture, which combines open-source technologies and AI-driven analytics to handle massive on-chain data volumes. Off-chain intelligence complements the on-chain data to meet rigorous AML requirements.
At the core of our solution is ChainStorage, an open-source project started by Coinbase that provides robust blockchain data ingestion and block-level serving. We enhanced it with Apache Spark™ and Arrow™, which together provide high-throughput processing and efficient data serialization, backed by Delta Lake and Kafka. For the serving layer, we use StarRocks to deliver low-latency SQL analytics over very large datasets. Finally, our system incorporates machine learning and AI agents for continuous data curation and near real-time insights, which are crucial for tackling on-chain AML challenges.
Session Speakers
Leo Liang
Founder
CipherOwl Inc.