Smit Shah

Senior Software Engineer, Big Data, Zillow

Smit is a data and software engineering enthusiast. He currently works as a Senior Software Engineer, Big Data at Zillow, where he builds centralized data products and works on democratizing data quality. He was involved in building the self-serve, centralized anomaly detection platform and in open-sourcing its model library. He holds a Master of Science in Information Systems from Northeastern University in Boston.

Past sessions

Summit 2021 Democratizing Data Quality Through a Centralized Platform

May 27, 2021 03:15 PM PT

Bad data leads to bad decisions and broken customer experiences. Organizations depend on complete and accurate data to power their business, maintain efficiency, and uphold customer trust. With thousands of datasets and pipelines running, how do we ensure that all data meets quality standards, and that expectations are clear between producers and consumers? Investing in shared, flexible components and practices for monitoring data health is crucial for a complex data organization to rapidly and effectively scale.

At Zillow, we built a centralized platform to meet our data quality needs across stakeholders. The platform is accessible to engineers, scientists, and analysts, and seamlessly integrates with existing data pipelines and data discovery tools. In this presentation, we will provide an overview of our platform’s capabilities, including:

  • Giving producers and consumers the ability to define and view data quality expectations using a self-service onboarding portal 
  • Performing data quality validations using libraries built to work with Spark
  • Dynamically generating pipelines that can be abstracted away from users
  • Flagging data that doesn’t meet quality standards at the earliest stage and giving producers the opportunity to resolve issues before use by downstream consumers
  • Exposing data quality metrics alongside each dataset to provide producers and consumers with a comprehensive picture of health over time

In this session watch:
Yuliana Havryshchuk, Developer, Zillow
Smit Shah, Senior Software Engineer, Big Data, Zillow


Summit 2021 Scaling AutoML-Driven Anomaly Detection With Luminaire

May 27, 2021 05:00 PM PT

Organizations rely heavily on time series metrics to measure and model key aspects of operational and business performance. The ability to reliably detect issues with these metrics is essential for identifying early indicators of major problems before they become pervasive. This is a difficult machine learning and systems problem because temporal patterns are complex, ever-changing, and often very noisy, and have traditionally required significant manual configuration and model maintenance.

At Zillow, we have built an orchestration framework around Luminaire, our open-source Python library for hands-off time series anomaly detection. Luminaire provides a suite of models and built-in AutoML capabilities, which we run with Spark for distributed training and scoring of thousands of metrics. In this talk, we will cover the architecture of this framework and the performance of the Luminaire package in terms of detection and prediction accuracy as well as runtime efficiency.

In this session watch:
Sayan Chakraborty, Data Scientist, Zillow, Inc.
Smit Shah, Senior Software Engineer, Big Data, Zillow