Achieving fraud detection at scale
"With Databricks, we have one cohesive end-to-end process with one single unified team working on protecting the securities markets."
The Financial Industry Regulatory Authority (FINRA) is an independent, non-governmental organization responsible for protecting investors by ensuring that the U.S. securities markets operate in a fair and honest manner. FINRA oversees 12 markets and exchanges, 3700 firms and more than 600,000 brokers. FINRA deters misconduct by enforcing rules, detecting and preventing wrongdoing in the U.S. markets, and disciplining members who break the rules.
To protect investors, FINRA is involved in identifying fraud schemes in the market. It does so by surveilling 99 percent of the equity markets and approximately 70 percent of the options markets. To do this, FINRA captures all transaction data—more than 100 billion trading events per day—from the various securities markets. FINRA then uses machine-learning algorithms to identify fraudulent patterns of behavior for investigation. Unfortunately, their legacy architect created a number of challenges that prevented them from effectively and efficiently monitoring the security markets. Challenges included:
- Data was stored in disparate on-premise systems which were highly complex and costly to build and scale leading to brittle data pipelines
- Disjointed development and production systems required data engineers to convert Python-based machine learning models into complex SQL statements that often times totaled 60-70 pages of code
- Complex model development process made it challenging to debug and iterate on models and limited reuse of code across teams
- Long development cycles due to segmented data scientist and engineering teams
Databricks provides FINRA with a Unified Data Analytics Platform that democratizes data and brings previously siloed teams together, cutting down overall time to market, increasing reusability of feature libraries, and improving operational efficiency. With Databricks, teams can quickly iterate on ML models and scale detection efforts to 100’s of billions of market events per day. As a result, FINRA has significantly improved fraud prevention leading to a more secure financial future for investors in the US.
- Unified Data Analytics Platform that includes Infrastructure management, Databricks Runtime, and interactive workspace streamlines the development process for their machine learning models while reducing infrastructure costs
- Interactive workspace enables their data science to overcome silos to iterate faster and collaborate better using their code of choice (SQL, R or Python) all within the same environment
- Fully managed cloud service allows their team to focus on higher-level issues related to the domain of machine learning rather than DevOps work