As a Senior Solutions Engineer with H2O.ai, Elena is passionate about helping customers solve advanced data science problems while maximizing business value. Coming from a diverse quantitative background, Elena earned her PhD from the University of Illinois at Urbana-Champaign with a dissertation focused on using machine learning to find patterns in accelerometer data and predict health outcomes. Previously, Elena worked with a variety of customer data science use cases while at Databricks, as well building machine learning models to identify manipulative activity in the US markets as the Lead Data Scientist at the Financial Industry Regulatory Authority (FINRA). In her spare time, Elena trains Brazilian Jiu Jitsu and dances classical ballet.
May 27, 2021 03:15 PM PT
Building ML models is a time consuming endeavor that requires a thorough understanding of feature engineering, selecting useful features, choosing an appropriate algorithm, and performing hyper-parameter tuning. Extensive experimentation is required to arrive at a robust and performant model. Additionally, keeping track of the models that have been developed and deployed may be complex. Solving these challenges is key for successfully implementing end-to-end ML pipelines at scale.
In this talk, we will present a seamless integration of automated machine learning within a Databricks notebook, thus providing a truly unified analytics lifecycle for data scientists and business users with improved speed and efficiency. Specifically, we will show an app that generates and executes a Databricks notebook to train an ML model with H2O’s Driverless AI automatically. The resulting model will be automatically tracked and managed with MLflow. Furthermore, we will show several deployment options to score new data on a Databricks cluster or with an external REST server, all within the app.
October 16, 2019 05:00 PM PT
Detecting fraudulent patterns at scale is a challenge given the massive amounts of data to sift through, the complexity of the constantly evolving techniques, and the very small number of actual examples of fraudulent behavior. In finance, added security concerns and the importance of explaining how fraudulent behavior was identified further increases the difficulty of the task. Legacy systems rely on rule-based detection that is difficult to implement and run at scale. The resulting code is very complex and brittle, making it difficult to update to keep up with new threats.
In this talk, we will go over how to convert a rule based financial fraud detection program to use machine learning on Spark as part of a scalable, modular solution. We will examine how to identify appropriate features and labels and how to create a feedback loop that will allow the model to evolve and improve overtime. We will also look at how MLflow may be leveraged throughout this effort for experiment tracking and model deployment.
Specifically, we will discuss:
-How to create a fraud-detection data pipeline
-How to leverage a framework for building features from large datasets
-How to create modular code to re-use and maintain new machine learning models
-How to choose appropriate models and algorithms for a given fraud-detection problem