Eugene Mandel

Head of Product, Superconductive

Eugene Mandel is Head of Product at Superconductive and a core contributor to the Great Expectations open source library. Prior to Superconductive, Eugene led data science at Directly, was a lead data engineer on the Jawbone data science team, and co-founded three startups that used data in diverse fields: internet telephony, marketing surveys, and social media. Eugene's core interest has been turning data into real products that make users happy.

Past sessions

Untested, undocumented assumptions about data in data pipelines create risk, waste time, and erode trust in data products. Automated testing has been one of the biggest productivity boosters in modern software development and is essential for managing complex codebases, yet data science and data engineering have largely missed out on it. This talk introduces Great Expectations, an open-source Python framework for bringing data pipelines and products under test. Like assertions in traditional Python unit tests, Expectations provide a flexible, declarative language for describing expected behavior. Unlike traditional unit tests, Great Expectations applies Expectations to data instead of code. We strongly believe that most of the pain caused by accumulating pipeline debt is avoidable.
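To make the idea concrete, here is a minimal plain-Python sketch of an "assertion on data" in the declarative style the talk describes. This is an illustration of the concept only, not the Great Expectations API; the function name and result shape are hypothetical.

```python
# Illustrative sketch: a declarative check applied to data, not code.
# NOT the Great Expectations API; names and result shape are invented.

def expect_column_values_to_be_between(rows, column, min_value, max_value):
    """Check every row's value for `column` against a declared range.

    Returns a result dict instead of raising, so a pipeline can record
    and report which values violated the expectation.
    """
    unexpected = [
        row[column]
        for row in rows
        if not (min_value <= row[column] <= max_value)
    ]
    return {"success": not unexpected, "unexpected_values": unexpected}

orders = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": -5.00},  # bad row: negative amount
]

result = expect_column_values_to_be_between(orders, "amount", 0, 10_000)
print(result)  # {'success': False, 'unexpected_values': [-5.0]}
```

Unlike a unit-test assertion that halts on the first failure, the check returns a structured result, so every violating value in a batch can be surfaced at once.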

We built Great Expectations to make it very, very simple to:

  1. Set up your testing framework early
  2. Capture those early learnings while they're still fresh
  3. Systematically validate new data against them

It's the best tool we know of for managing the complexity that inevitably grows within data pipelines.
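The capture-then-validate loop in steps 2 and 3 can be sketched in plain Python as follows. This is a hypothetical illustration of the workflow, not the Great Expectations API; all names here are invented.

```python
# Hypothetical sketch of steps 2-3: capture learnings about the data
# as a suite of expectations, then validate each new batch against it.
# NOT the Great Expectations API; all names are illustrative.

def expect_not_null(rows, column):
    missing = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"expectation": f"{column} not null", "success": not missing}

def expect_unique(rows, column):
    seen, dupes = set(), []
    for row in rows:
        value = row[column]
        if value in seen:
            dupes.append(value)
        seen.add(value)
    return {"expectation": f"{column} unique", "success": not dupes}

# Step 2: capture early learnings as a declarative suite.
suite = [
    lambda rows: expect_not_null(rows, "user_id"),
    lambda rows: expect_unique(rows, "user_id"),
]

# Step 3: systematically validate every new batch against the suite.
def validate(rows, suite):
    results = [check(rows) for check in suite]
    return all(r["success"] for r in results), results

new_batch = [{"user_id": 1}, {"user_id": 2}, {"user_id": 2}]
ok, results = validate(new_batch, suite)
print(ok)  # False: duplicate user_id in the new batch
```

Because the suite is just data-independent declarations, the same checks can run on tomorrow's batch unchanged, which is what makes validation systematic rather than ad hoc.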

We hope it helps you as much as it's helped us.

Main takeaways: this talk will teach you how to use Great Expectations to get more done with data, faster. Specifically, how to:

  • Save time during data cleaning and munging.
  • Accelerate ETL and data normalization.
  • Streamline analyst-to-engineer handoffs.
  • Monitor data quality in production data pipelines and data products.
  • Simplify debugging for data pipelines if (when) they break.