Grace Tang is a Senior Staff Machine Learning Engineer on the Anti-Abuse AI Team at LinkedIn. She works across abuse domains, focusing on integrating detection systems to achieve defense in depth. Grace received her Ph.D. in Bioengineering from Stanford University.
June 23, 2020 05:00 PM PT
Detecting abusive activity on a large social network is an adversarial challenge: behavior patterns evolve quickly and ground truth labels are imperfect. Additionally, an individual fake account often produces negligible signal before it is used for abuse. These characteristics limit the usefulness of supervised learning techniques, but unsupervised methods can overcome them. To address these challenges, we created a Scala/Spark implementation of the isolation forest unsupervised outlier detection algorithm using the Spark ML API; we recently open-sourced this library (https://github.com/linkedin/isolation-forest). We also developed a Scala/Spark unsupervised fuzzy clustering library to identify groups of abusive accounts with similar behavior.

Key takeaways: In this talk, we describe the details of these unsupervised algorithms, why this type of machine learning is well suited to the specific challenges associated with fighting abuse, and how we use these models to detect accounts (fake, compromised, or real) that are engaging in abusive behavior such as automation and fake engagement at LinkedIn. We will also discuss best practices and lessons learned from our experience using these techniques in production.
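
For readers unfamiliar with isolation forests, the sketch below illustrates the general idea behind the algorithm: points that random axis-aligned splits isolate quickly are scored as more anomalous. It is a minimal pure-Python illustration using only the standard library. It is not the API of the linkedin/isolation-forest library and not the Scala/Spark implementation described in the talk. All names (`build_tree`, `fit_forest`, `anomaly_score`), the defaults of 100 trees and 256 samples per tree, and the synthetic data are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant (approximation)


def average_path_length(n):
    """Expected path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # this feature cannot separate the remaining points
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, adjusted for points left unsplit in that leaf."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, dim, split, left, right = tree
    child = left if x[dim] < split else right
    return path_length(child, x, depth + 1)


def fit_forest(data, num_trees=100, sample_size=256, seed=0):
    """Build trees on random subsamples; assumes data holds at least two points."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(num_trees)]
    return trees, sample_size


def anomaly_score(forest, x):
    """Score in (0, 1]; values closer to 1 indicate a more easily isolated point."""
    trees, sample_size = forest
    mean_path = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / average_path_length(sample_size))


# Synthetic data (hypothetical): one dense cluster plus a single distant point.
data_rng = random.Random(42)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outlier = (10.0, 10.0)
data = inliers + [outlier]

forest = fit_forest(data)
scores = [anomaly_score(forest, p) for p in data]
```

In this toy setup the distant point is separated from the cluster after very few splits, so its score is the highest in the dataset. No labels are used at any step, which is what makes the approach attractive when ground truth is imperfect.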