To understand and react to their security situation, many cybersecurity operations now rely on security information and event management (SIEM) software. Running a traditional SIEM in a large company such as Hewlett Packard Enterprise (HPE) is a challenge because of the increasing volume and rate of data. We present the solution we use to reduce the data volume processed by the SIEM with Spark Streaming, and the results obtained on one of the largest data feeds in HPE: firewall logs.

Testing SIEM rules the traditional way is time-consuming. It usually takes a full day to get results and statistics for one day of production data. We outline an alternative approach, building a SIEM with Spark and other big data technologies, and present results of “fast forward” processing of production data snapshots.

HPE is the target of sophisticated, well-crafted attacks, and the deployed cybersecurity tools cannot detect all of them. We describe a simple application for detecting malicious trending domains, built with Spark MLlib and trained on company-specific data.

Takeaways: Spark Streaming can be used to pre-process cybersecurity data and reduce its volume for further processing. Spark MLlib can be used to add detection capability for specific use cases.
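In this presentation, we share how Hewlett Packard Enterprise has implemented Apache Spark to address three main cybersecurity use cases: reducing the data volume processed by the SIEM, faster testing of SIEM rules, and detecting malicious trending domains.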
Hewlett Packard Enterprise
Josef has worked at Hewlett Packard Enterprise for four years, in the Cyber Security Big Data team. He has designed and built several distributed systems that process vast amounts of varied data for cybersecurity purposes. Before joining HPE, Josef took part in software development projects at several large international companies (Amdocs, Dun & Bradstreet, Accenture), at smaller companies, and in academia. Over a career of more than twenty years in IT, he has acquired a wide range of knowledge and experience, including networking, operating systems, software development, distributed computing, and machine learning. He likes kayaking, fly fishing, and photography.