Abe Gong

CEO & Co-Founder, Superconductive

Abe Gong is a core contributor to the Great Expectations open source library, and CEO and Co-founder at Superconductive. Prior to Superconductive, Abe was Chief Data Officer at Aspire Health, the founding member of the Jawbone data science team, and lead data scientist at Massive Health. Abe has been leading teams using data and technology to solve problems in health care, consumer wellness, and public policy for over a decade. Abe earned his PhD at the University of Michigan in Public Policy, Political Science, and Complex Systems. He speaks and writes regularly on data, healthcare, and data ethics.

Past sessions

Untested, undocumented assumptions about data in data pipelines create risk, waste time, and erode trust in data products. Automated testing has been one of the biggest productivity boosters in modern software development and is essential for managing complex codebases, yet data science and engineering have largely missed out on it. This talk introduces Great Expectations, an open-source Python framework for bringing data pipelines and products under test. Like assertions in traditional Python unit tests, Expectations provide a flexible, declarative language for describing expected behavior. Unlike traditional unit tests, Great Expectations applies Expectations to data instead of code. We strongly believe that most of the pain caused by accumulating pipeline debt is avoidable.

We built Great Expectations to make it very, very simple to:

  1. Set up your testing framework early
  2. Capture those early learnings while they're still fresh
  3. Systematically validate new data against them

It's the best tool we know of for managing the complexity that inevitably grows within data pipelines.
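To illustrate the core idea of declarative expectations validated against data rather than code, here is a minimal, stdlib-only sketch. The helper names and result shape below are hypothetical and only gesture at the concept; they are not the Great Expectations API:

```python
# Minimal sketch of declarative data expectations (hypothetical helpers,
# not the Great Expectations API). Each expectation states what valid data
# looks like; validation reports which rows violate it.

def expect_column_values_not_null(rows, column):
    failures = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"success": not failures,
            "expectation": f"{column} is not null",
            "failing_rows": failures}

def expect_column_values_between(rows, column, low, high):
    failures = [i for i, r in enumerate(rows)
                if r.get(column) is not None and not (low <= r[column] <= high)]
    return {"success": not failures,
            "expectation": f"{low} <= {column} <= {high}",
            "failing_rows": failures}

def validate(rows, expectations):
    # Run every expectation in the suite and collect structured results.
    return [fn(rows, *args) for fn, args in expectations]

rows = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": -5},    # out of range
    {"user_id": None, "age": 40}, # null id
]

suite = [
    (expect_column_values_not_null, ("user_id",)),
    (expect_column_values_between, ("age", 0, 120)),
]

results = validate(rows, suite)
for r in results:
    status = "PASS" if r["success"] else f"FAIL rows {r['failing_rows']}"
    print(r["expectation"], "->", status)
```

Because expectations are data-describing assertions rather than code tests, the same suite can be re-run every time new data arrives, which is what makes the "validate new data against early learnings" step systematic.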

We hope it helps you as much as it's helped us.

Main takeaways: this talk will teach you how to use Great Expectations to get more done with data, faster:

  • Save time during data cleaning and munging.
  • Accelerate ETL and data normalization.
  • Streamline analyst-to-engineer handoffs.
  • Monitor data quality in production data pipelines and data products.
  • Simplify debugging for data pipelines if (when) they break.