Data + AI Summit 2023
JUNE 26-29, 2023

Cutting the Edge in Fighting Cybercrime: Reverse-Engineering a Search Language to Cross-Compile it to PySpark

On Demand


Traditional Security Information and Event Management (SIEM) approaches do not scale well to data sources that produce 30 TiB per day, which led HSBC to build a Cybersecurity Lakehouse with Delta and Spark. The platform was created to overcome several conventional technical constraints: traditional platforms limit how much data is available for long-term analytics, and their query languages are difficult to scale and time-consuming to run. At the same time, few cybersecurity analysts have a deep understanding of Apache Spark.

In this talk we’ll learn how to implement (or, more precisely, reverse-engineer) a language in Scala and translate it into what Apache Spark understands: the Catalyst engine. We’ll guide you through the technical journey, including examples of Databricks Notebooks and code blocks, of building Spark equivalents of a query language and of implementing search query language features that are not available out of the box, such as IP CIDR matching or fuzzy matching across all columns. We’ll also show you how to use the same framework for PySpark code generation and use-case reconciliation.
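
To make the idea of cross-compiling a search query into PySpark more concrete, here is a minimal sketch in plain Python. It is not the framework presented in the talk (which is implemented in Scala and targets Catalyst). The mini search syntax (`field=value` terms, bare words, `AND`/`OR`), the function names, and the `cidr_match` helper referenced in the generated code are all hypothetical and chosen only for illustration. Substring matching stands in for "fuzzy" matching here, which is an assumption as well.

```python
import ipaddress
import re

# Hypothetical mini search language (an assumption, not the language from the talk):
#   field=value        equality, or CIDR match when the value parses as a network
#   bareword           substring search across all columns
#   AND / OR           AND binds tighter than OR; adjacent terms imply AND
TOKEN = re.compile(
    r'\s*(AND\b|OR\b|[A-Za-z_][A-Za-z0-9_]*=(?:"[^"]*"|\S+)|[^\s=]+)'
)


def tokenize(query):
    """Split a query string into operators and terms."""
    query = query.strip()
    pos, tokens = 0, []
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}: {query[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_cidr(value):
    """True if the value looks like a network in CIDR notation, e.g. 10.0.0.0/8."""
    if "/" not in value:
        return False
    try:
        ipaddress.ip_network(value, strict=False)
        return True
    except ValueError:
        return False


def ip_in_cidr(ip, cidr):
    """Pure-Python CIDR check; could back the hypothetical cidr_match helper."""
    if ip is None:
        return False
    try:
        return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)
    except ValueError:
        return False


def term_to_pyspark(term, columns):
    """Translate one search term into the text of a PySpark column expression."""
    if "=" not in term:
        # Bare word: search every column, cast to string first.
        parts = [f"F.col({c!r}).cast('string').contains({term!r})" for c in columns]
        return "(" + " | ".join(parts) + ")"
    field, value = term.split("=", 1)
    value = value.strip('"')
    if is_cidr(value):
        # cidr_match is a hypothetical helper, not a built-in PySpark function.
        return f"cidr_match(F.col({field!r}), {value!r})"
    return f"(F.col({field!r}) == {value!r})"


def to_pyspark(query, columns):
    """Generate PySpark source code (as a string) for a whole query."""
    or_groups, current, expect_term = [], [], True
    for token in tokenize(query):
        if token in ("AND", "OR"):
            if expect_term:
                raise ValueError(f"operator {token} has no left-hand term")
            if token == "OR":
                or_groups.append(current)
                current = []
            expect_term = True
        else:
            current.append(term_to_pyspark(token, columns))
            expect_term = False
    if expect_term:
        raise ValueError("query is empty or ends with an operator")
    or_groups.append(current)
    conjunctions = [" & ".join(group) for group in or_groups]
    if len(conjunctions) == 1:
        return f"df.filter({conjunctions[0]})"
    return "df.filter(" + " | ".join(f"({c})" for c in conjunctions) + ")"


generated = to_pyspark(
    "src_ip=10.0.0.0/8 AND action=blocked OR mimikatz", ["src_ip", "action"]
)
print(generated)
```

The generated text assumes that `df` is a DataFrame and that `F` refers to `pyspark.sql.functions` in the environment where the code is later run. Emitting source code rather than building the query directly is what makes the output something an analyst can read, review, and keep as ordinary PySpark.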

We’ll also learn how HSBC’s business benefited from this innovation: less time and fewer resources spent on migrating Cyber data processing, improved Cyber threat Incident Response (IR), and fast onboarding of HSBC Cyber Analysts onto Spark with the Cybersecurity Lakehouse platform.


  • Session


  • Hybrid


  • Data Lakes, Data Warehouses and Data Lakehouses


  • Financial Services


  • Intermediate


  • Moscone South | Upper Mezzanine | 159


  • 35 min

Session Speakers

Abigail Shriver

Lead Cybersecurity Software Engineer


Jude Ken-Kwofie

Principal Software Engineer


Serge Smertin

