Data + AI Summit 2022
Watch on demand

Cutting the Edge in Fighting Cybercrime: Reverse-Engineering a Search Language to Cross-Compile it to PySpark

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Lakes, Data Warehouses and Data Lakehouses

Industry

  • Financial Services

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 159

Duration

  • 35 min

Overview

Traditional Security Information and Event Management (SIEM) approaches to cybersecurity do not scale well to data sources that produce 30 TiB per day, which led HSBC to create a Cybersecurity Lakehouse with Delta and Spark. The platform was built to overcome several constraints of conventional tools: traditional platforms limit the amount of data available for long-term analytics, and their query languages are difficult to scale and time-consuming to run. At the same time, few cybersecurity analysts have a deep understanding of Apache Spark.

In this talk, we’ll show how to implement (or, more accurately, reverse-engineer) a search language in Scala and translate it into a form that Apache Spark’s Catalyst engine understands. We’ll guide you through the technical journey, including examples of Databricks notebooks and code blocks, of building Spark equivalents of the query language, and show how to implement search language features that are not available out of the box, such as IP CIDR matching or fuzzy matching across all columns. We’ll also show how to use the same framework for PySpark code generation and use-case reconciliation.
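The following is a minimal sketch of the cross-compilation idea, not the speakers’ implementation. It assumes a hypothetical search syntax in which space-separated terms are implicitly AND-ed: `field=value` is an exact match, `*` is a wildcard, a `field=a.b.c.d/n` value is an IPv4 CIDR range, and a bare term is a case-insensitive match across all columns. The helper names (`compile_term`, `compile_query`) are illustrative, and the emitted PySpark source assumes `from pyspark.sql import functions as F` and a DataFrame named `df` whose IP columns hold well-formed dotted IPv4 strings.

```python
import ipaddress
import shlex

# Hypothetical query syntax (an assumption, not the language from the talk):
#   field=value      exact match        field=*part*   wildcard (LIKE)
#   field=a.b.c.d/n  IPv4 CIDR range    bareword       fuzzy match on all columns
# Emitted code assumes `from pyspark.sql import functions as F` and a DataFrame `df`.

# Weights that turn four IPv4 octets a.b.c.d into a*2^24 + b*2^16 + c*2^8 + d.
_OCTET_WEIGHTS = (16777216, 65536, 256, 1)


def _col(name: str) -> str:
    # Source text for a PySpark column reference; repr() yields a safe Python literal.
    return f"F.col({name!r})"


def _like_pattern(value: str) -> str:
    # Escape LIKE metacharacters (Spark's default LIKE escape is a backslash),
    # then map the search language's '*' wildcard onto LIKE's '%'.
    escaped = value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped.replace("*", "%")


def _ipv4_as_long(name: str) -> str:
    # Split the dotted string on a literal dot (regex "[.]") and weight each octet.
    octets = (
        f"F.split({_col(name)}, '[.]').getItem({i}).cast('long')" for i in range(4)
    )
    return " + ".join(f"{o} * {w}" for o, w in zip(octets, _OCTET_WEIGHTS))


def compile_term(term: str) -> str:
    """Translate one search term into the source text of a PySpark Column."""
    if "=" not in term:
        # Bare term: case-insensitive substring match over all columns,
        # cast to strings and joined with spaces (concat_ws skips nulls).
        all_cols = "F.concat_ws(' ', *[F.col(c).cast('string') for c in df.columns])"
        return f"F.lower({all_cols}).contains({term.lower()!r})"
    field, value = term.split("=", 1)
    if "/" in value:
        try:
            net = ipaddress.IPv4Network(value, strict=False)
        except ValueError:
            net = None  # Not a CIDR block (e.g. a file path); fall through.
        if net is not None:
            # CIDR match becomes an integer range check with bounds resolved
            # at compile time, so no Python UDF is needed at query time.
            low, high = int(net.network_address), int(net.broadcast_address)
            return f"({_ipv4_as_long(field)}).between({low}, {high})"
    if "*" in value:
        return f"{_col(field)}.like({_like_pattern(value)!r})"
    return f"{_col(field)} == {value!r}"


def compile_query(query: str) -> str:
    """Cross-compile a whole query; terms are implicitly AND-ed."""
    conditions = [f"({compile_term(t)})" for t in shlex.split(query)]
    if not conditions:
        return "df"  # An empty query selects everything.
    return "df.where(" + " & ".join(conditions) + ")"
```

For example, `compile_query('src_ip=10.0.0.0/8 user=*admin* "failed login"')` returns a single `df.where(...)` expression in which the CIDR term has become `.between(167772160, 184549375)`. Emitting source text rather than live `Column` objects keeps the sketch runnable without Spark and makes the generated PySpark easy to review, which is one way a single translator could serve both code generation and reconciliation.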

Finally, we’ll look at how HSBC’s business benefited from this innovation: less time and fewer resources spent migrating cyber data processing, improved cyber threat incident response (IR), and faster onboarding of HSBC cyber analysts onto Spark with the Cybersecurity Lakehouse platform.

Session Speakers

Abigail Shriver

Lead Cybersecurity Software Engineer

HSBC

Jude Ken-Kwofie

Principal Software Engineer

HSBC

Serge Smertin

Databricks
