Apple must detect a wide variety of security threats, and rises to the challenge using Apache Spark across a diverse pool of telemetry. This talk covers some of the home-grown solutions we've built to address complications of scale:
- Notebook-based testing CI - Previously we had a hybrid development model for Structured Streaming jobs: most code was written and tested inside notebooks, but running unit tests meant exporting the notebook into a user's IDE, along with JSON sample files, and executing it against a local SparkSession. We've deployed a novel CI solution leveraging the Databricks Jobs API that executes the notebooks on a real cluster using sample files in DBFS. Coupled with our new test-generation library, it has cut the time required for testing by two-thirds and the lines of code by 85%.
- Self-Tuning Alerts - Apple has a team of security analysts triaging the alerts generated by our detection rules. They annotate each alert as either 'False Positive' or 'True Positive' based on the results of their analysis. We've incorporated this feedback into our Structured Streaming pipeline, so the system automatically learns from analyst consensus and adjusts its future behavior. This helps us separate the signal from the noise.
- Automated Investigations - There are some standard questions an analyst might ask when triaging an alert, like: what does this system usually do, where is it, and who uses it? Using ODBC and the Workspace API, we've been able to templatize many investigations and in some cases automate the entire process up to and including incident containment.
- DetectionKit - We've written a custom SDK to formalize the configuration and testing of jobs, including some interesting features such as modular pre- and post-processor transform functions, and a stream-compatible exclusion mechanism using foreachBatch.
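
To make the Self-Tuning Alerts idea concrete, here is a minimal sketch in plain Python of how analyst verdicts could feed back into alert suppression. It is an illustration under stated assumptions, not the implementation described in the talk: the `FeedbackStore` class, the `(rule, entity)` key, and the `MIN_VERDICTS` and `FP_CONSENSUS` thresholds are all hypothetical, and a production pipeline would more plausibly keep this state in a table joined against each micro-batch.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical thresholds; the abstract does not state what counts as consensus.
MIN_VERDICTS = 3      # minimum analyst verdicts before the system acts
FP_CONSENSUS = 0.8    # fraction of 'False Positive' verdicts needed to suppress


@dataclass
class VerdictTally:
    """Running count of analyst verdicts for one (rule, entity) pair."""
    true_positive: int = 0
    false_positive: int = 0

    @property
    def total(self) -> int:
        return self.true_positive + self.false_positive

    @property
    def fp_ratio(self) -> float:
        return self.false_positive / self.total if self.total else 0.0


class FeedbackStore:
    """Hypothetical in-memory store of analyst feedback."""

    def __init__(self) -> None:
        self._tallies = defaultdict(VerdictTally)

    def record(self, rule: str, entity: str, verdict: str) -> None:
        """Register one analyst annotation for an alert."""
        tally = self._tallies[(rule, entity)]
        if verdict == "True Positive":
            tally.true_positive += 1
        elif verdict == "False Positive":
            tally.false_positive += 1
        else:
            raise ValueError(f"unknown verdict: {verdict!r}")

    def should_suppress(self, rule: str, entity: str) -> bool:
        """Suppress only when enough verdicts exist and most are false positives."""
        tally = self._tallies.get((rule, entity))
        if tally is None:
            return False
        return tally.total >= MIN_VERDICTS and tally.fp_ratio >= FP_CONSENSUS


def filter_alerts(alerts: list, store: FeedbackStore) -> list:
    """Drop alerts whose (rule, entity) pair has reached false-positive consensus."""
    return [
        alert for alert in alerts
        if not store.should_suppress(alert["rule"], alert["entity"])
    ]
```

Requiring a minimum number of verdicts before suppressing keeps a single annotation from silencing a rule, and a later 'True Positive' verdict can pull the ratio back under the threshold so the alert resurfaces.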