Scaling ML-Based Threat Detection For Production Cyber Attacks


Vulnerabilities such as Spectre and Meltdown continue to plague many production servers based on Intel CPUs. Our solution monitors hardware counters in software and sends that data to Apache Spark clusters for threat detection. We rely on Spark for support vector machine (SVM) inference. Our machine learning models are trained offline by a data scientist in a Jupyter notebook environment. As new models are validated, they can be deployed to the Spark cluster directly from the notebook. We have standardized model export and import on ONNX, an open file format for machine learning models.
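
To make the inference step concrete, here is a minimal sketch of how a linear SVM scores one sample of hardware-counter features. It is plain Python rather than Spark or ONNX code, and it is not the speakers' model: the feature names, weights, and bias are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LinearSVM:
    """A trained linear SVM reduced to its weight vector and bias."""
    weights: Sequence[float]
    bias: float

    def decision(self, features: Sequence[float]) -> float:
        # Signed score w . x + b; its sign gives the predicted class.
        if len(features) != len(self.weights):
            raise ValueError("feature vector has the wrong length")
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

    def is_threat(self, features: Sequence[float]) -> bool:
        # A positive score is treated as the "threat" class.
        return self.decision(features) > 0.0


# Hypothetical features: (cache_miss_rate, branch_mispredict_rate).
# Weights and bias are made up for this sketch, not taken from a real model.
model = LinearSVM(weights=(4.0, 2.0), bias=-3.0)

benign = (0.10, 0.20)
suspicious = (0.90, 0.50)

print(model.is_threat(benign))
print(model.is_threat(suspicious))
```

In a real deployment the weights would come from the offline training run, and scoring would be applied to batches of samples rather than one vector at a time.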

In our presentation, we will demo the full pipeline, from model training to deployment. We will discuss the challenges of deploying ML-based cyber-threat detection at scale using Apache Spark. For example, we found that gaps in detection can occur while Spark models are being updated. We will describe a novel data ingestion architecture, based on Apache Kafka, that we developed to close these gaps.
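
The abstract does not spell out the ingestion architecture, so the following is only a sketch of one general idea that an offset-addressed log such as a Kafka topic makes possible: events that arrive during a model update stay in the log, and the detector resumes from its last processed offset once the new model is in place. The `EventLog` and `Detector` classes are hypothetical in-memory stand-ins, not Kafka or Spark APIs, and the threshold "models" are placeholders.

```python
from typing import Callable, List, Tuple

Model = Callable[[float], bool]


class EventLog:
    """In-memory stand-in for an append-only, offset-addressed log."""

    def __init__(self) -> None:
        self._events: List[float] = []

    def append(self, value: float) -> int:
        # Store the event and return the offset it was written at.
        self._events.append(value)
        return len(self._events) - 1

    def read_from(self, offset: int, max_events: int) -> List[Tuple[int, float]]:
        # Return up to max_events (offset, value) pairs starting at offset.
        end = min(offset + max_events, len(self._events))
        return [(i, self._events[i]) for i in range(offset, end)]


class Detector:
    """Scores events in offset order and remembers where it stopped."""

    def __init__(self, log: EventLog, model: Model) -> None:
        self.log = log
        self.model = model
        self.next_offset = 0
        self.scored: List[Tuple[int, bool]] = []

    def poll(self, max_events: int = 100) -> None:
        for offset, value in self.log.read_from(self.next_offset, max_events):
            self.scored.append((offset, self.model(value)))
            self.next_offset = offset + 1

    def swap_model(self, new_model: Model) -> None:
        # Only the model changes; the read position is left untouched,
        # so events logged during the swap are scored on the next poll.
        self.model = new_model


log = EventLog()
detector = Detector(log, model=lambda v: v > 0.8)

for v in (0.1, 0.9):
    log.append(v)
detector.poll()

# These events arrive while the model is being replaced.
for v in (0.7, 0.95):
    log.append(v)
detector.swap_model(lambda v: v > 0.6)
detector.poll()

scored_offsets = [offset for offset, _ in detector.scored]
print(scored_offsets)
```

The point of the design is that the log, not the detector, holds the data: a slow or paused consumer delays detection but does not drop events.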


See More Spark + AI Summit in San Francisco 2019 Videos

About George Williams

George is Director of Computing and Data Science at GSI Technology, an embedded hardware and artificial intelligence company. He has held senior leadership roles in software, hardware, data science, and research, including tenures at Apple's New Product Architecture group and at New York University's Courant Institute. He can speak on a broad range of topics at the intersection of e-commerce, machine learning, software engineering, and cloud security. He is an author of several research papers in computer vision and deep learning, published at NIPS, CVPR, ICASSP, and SIGGRAPH.