Lead Data Engineer at Avast and long time Scala developer with big focus on Big Data and Machine Learning pipelines.
November 18, 2020 04:00 PM PT
At Avast we complete over 17 million phishing detections a day, providing crucial online protection for this type of attacks.
In this talk Joao Da Silva and Yury Kasimov will present the MATS stack for productionisation of Machine Learning and their journey into integrating model tracking, storage, cross-system orchestration and model deployments for a complete and modern machine learning pipeline.
One can integrate MATS stack into their existing ecosystem without disruption, no need to migrate to clean AWS all of a sudden.
MATS stack consists of adopting MLFLow, Airflow, Tensorflow and Spark to form a cross-system orchestrated ML pipeline into a standard set of well integrated tools which data scientists at Avast can adopt.
They will use Angler, an internal machine learning project for detecting phishing URLs to demonstrate how MATS stack was leveraged for this ML Pipeline, walking the audience through all stages of the Angler pipeline: data transformations and enrichments in Spark, training of models, experiment tracking and serving of the models. The pipeline is useful for fast and reproducible experiments and it allows a fast progression from research to production.
Speakers: Joao Da Silva and Yury Kasimov
October 16, 2019 05:00 PM PT
At Avast, we believe everyone has the right to be safe. We are dedicated to creating a world that provides safety and privacy for all, not matter where you are, who you are, or how you connect. With over 1.5 billion attacks stopped and 30 million new executable files monthly, big data pipelines are crucial for the security of our customers. At Avast we are leveraging Apache Spark machine learning libraries and TensorflowOnSpark for a variety of tasks ranging from marketing and advertisement, through network security to malware detection.
This talk will cover our main cybersecurity usecases of Spark. After describing our cluster environment we will first demonstrate anomaly detection on time series of threats. Having thousands of types of attacks and malware, AI helps human analysts select and focus on most urgent or dire threats. We will walk through our setup for distributed training of deep neural networks with Tensorflow to deploying and monitoring of a streaming anomaly detection application with trained model.
Next we will show how we use Spark for analysis and clustering of malicious files and large scale experimentation to automatically process and handle changes in malware. In the end, we will give comparison to other tools we used for solving those problems.