Abnormal Security’s behavior AI analyzes over 50,000 signals to detect and remediate all email attacks. Knowing that their legacy Hadoop system would not scale efficiently with rapidly increasing demand for their services, Abnormal Security migrated to the Databricks Lakehouse Platform and adopted Databricks SQL to power their engineering dashboards. Today, Abnormal Security maintains a data-first culture, manages their data infrastructure with ease, processes thousands of emails per second and deploys machine learning models that detect even the smallest anomalies signaling suspicious behavior, protecting their customers from email-based attacks.
As the increase in major email phishing and ransomware attacks continues to make global headlines, protecting users from this expanding spectrum of targeted email attacks is fast becoming a top priority across industries. In 2017, Abnormal Security entered the stage with a specific mission: to provide companies with a level of email security that could rise to the increasingly complex challenges of modern cyberattacks. Their solution was an immediate hit, and while demand was great for business, Abnormal saw the need to quickly and significantly scale their systems and processes to continue providing the best customer experience.
“Scale was a critical focus area for us,” explained Sanny Liao, the head of data science at Abnormal Security. “As we keep ahead of ever-evolving attack strategies with frequent product releases, and serve the needs of a rapidly growing customer base made up of some of the largest companies in the world, we are now able to constantly refine and scale our models to meet our needs.”
Abnormal Security also had to ensure their data analytics infrastructure was robust enough to meet their scale requirements. With a legacy Hadoop system in place, they were bogged down maintaining AWS EMR clusters rather than identifying malicious attacks for their customers. That became a bigger roadblock as the amount of data to be ingested continued to grow.
“We were spending too much time managing our Spark infrastructure,” added Carlos Gasperi, a software engineer at Abnormal Security. “What we needed to be doing with that time was build the pipelines that would make the product better.”
They found that time once they implemented the Databricks Lakehouse Platform, which simplified their data architecture while improving the performance of data pipelines and analytics through Delta Lake and Databricks SQL.
“Once we implemented Databricks on AWS, the need to manage long-running AWS EMR clusters went away entirely,” said Gasperi. “Databricks lets us schedule one cluster for every job we run, which meant we didn’t need to solve so many of the problems we had before around compute resource management because they simply vanished.”
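As an illustration of the one-cluster-per-job pattern Gasperi describes, the Python sketch below builds a job definition in which the task requests its own cluster rather than attaching to a shared, long-running one. The field names follow the Databricks Jobs API 2.1 as commonly documented. The job name, notebook path, runtime version, instance type and worker count are hypothetical placeholders, not Abnormal Security’s actual configuration.

```python
import json


def build_job_payload(job_name, notebook_path, num_workers=2):
    """Return a Jobs API-style payload whose single task runs on its own new cluster."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                # `new_cluster` requests a cluster created for this run and
                # terminated afterwards, instead of a shared long-running one.
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",  # assumed runtime version
                    "node_type_id": "i3.xlarge",  # assumed AWS instance type
                    "num_workers": num_workers,
                },
            }
        ],
    }


# Hypothetical job name and notebook path, for illustration only.
payload = build_job_payload("ingest-threat-signals", "/Pipelines/ingest_threat_signals")
print(json.dumps(payload, indent=2))
```

A payload like this would typically be sent to the workspace’s job-creation endpoint. Because each job carries its own cluster specification, sizing decisions stay with the individual pipeline instead of a shared cluster.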
With infrastructure no longer a challenge, they are now able to ingest data directly from S3 and query it in near real time with the help of Delta Lake, an open format storage layer that delivers reliability, security and performance on a data lake for both streaming and batch operations. Data flows from Kinesis Firehose into Delta Lake, making threat signals data instantly available to data scientists. With Databricks SQL, data scientists can then build visualizations and rich dashboards that drive product decisions and improve detection efficacy.
In addition to making their data actionable, Databricks also provided the collaborative environment Abnormal Security needed to increase productivity. “We had a legacy Jupyter notebook that was one big beefy server shared across a bunch of people,” explained Gasperi. “Scaling at that point meant adding a second one, but because there was such a wide variety of projects in this single notebook, running heavy applications while working within the notebook actually slowed the processing time for everyone else.”
Now, data analysts, data scientists and data engineers can work in the same space without competing for compute resources and can collaborate on the data more easily, which has led to a significant increase in productivity.
The Databricks Lakehouse Platform has given Abnormal Security a single platform that unifies all of their data, so analysts and data scientists alike can work on solutions that help their customers prevent all forms of email attacks. Not only are infrastructure costs for individual pipelines down, but Abnormal Security also saw average productivity gains of more than 30% for both the data science and data engineering teams.
Next up, Abnormal Security will use Databricks to focus on democratizing the use of threat signals data across the organization in a privacy-first manner. “We want to make sure that every team, not just the data team, is empowered to use data to drive decisions,” added Liao. “That’s what needs to happen to increase the quality and velocity of work across the company. Databricks is going to enable that.”
“Databricks Lakehouse enables us to organize and leverage all our data at scale, to power analytics in an effort to detect and block all forms of email attacks for our customers, including targeted socially engineered email attacks that evade traditional defenses.”
– Sanny Liao, Head of Data Science at Abnormal Security