Willem Pienaar

Architect, Tecton

Willem is a tech lead at Tecton where he currently leads open source development for Feast, the open source feature store. Willem previously led the data science platform team at GOJEK, working on the GOJEK ML platform, which supports a wide variety of models and handles over 100 million orders every month. His main focus areas are building data and ML platforms, allowing organizations to scale machine learning and drive decision making. In a previous life, Willem founded and sold a networking startup and was a software engineer in industrial control systems.

Past sessions

Summit 2021 Rethinking Feature Stores

May 27, 2021 04:25 PM PT

Feature stores have emerged as a key component in the modern machine learning stack. They solve some of the toughest challenges in data for machine learning, namely feature computation, storage, validation, serving, and reuse.

However, the deployment of feature stores still requires a coordinated effort from multiple teams, comes with a large infrastructural footprint, and leads to integration costs and significant operational overheads. This large investment places feature stores completely out of reach for the average data team. What’s needed is a fundamental redesign of the feature store.

In this talk we will introduce a new lightweight feature store framework that allows any data source to be operationalized by declaring it as a dependency of production ML applications, without coupling these applications to environment-specific infrastructure. By publishing model-centric logical feature definitions, this framework will allow data scientists to build ML applications that depend on any data source, using their tools of choice, and deploy to their existing production infrastructure.

In this talk we will also demonstrate how this new paradigm empowers individual data scientists to develop and serve a production-grade ML application in less than one minute.


Summit 2020 Scaling Data and ML with Apache Spark and Feast

June 24, 2020 05:00 PM PT

Gojek, Indonesia's first billion-dollar startup, has seen explosive growth in both users and data over the past three years. Today, it uses big-data-powered machine learning to inform decision making in its ride-hailing, lifestyle, logistics, food delivery, and payment products, from selecting the right driver to dispatch, to dynamically setting prices, to serving food recommendations, to forecasting real-world events. Hundreds of millions of orders per month, across 18 products, are all driven by machine learning. Features are at the heart of what makes these machine learning systems effective. However, many challenges still exist in the feature engineering life cycle. Developing features from big data is often an engineering-heavy task, with challenges in both the scaling of data processes and the serving of features in production systems.

Teams also face challenges in enabling discovery, reducing duplication, improving understanding, and providing standardization of features throughout organizations. In this talk, Willem Pienaar will explain the need for features at organizations like Gojek and will discuss the challenges faced in creating, managing, and serving them in production. He will describe how leveraging open source software like Spark and MLflow allowed their team to build Feast, an open source feature store that bridges data engineering and machine learning. He will explain how Feast and Spark allow them to overcome these challenges, the lessons they learned along the way, and the impact the feature store had at Gojek. Finally, he will demonstrate how democratizing the process of creating, sharing, and managing features dramatically reduces time to market and leads to key insights.

Summit 2019 Scaling Ride-Hailing with Machine Learning on MLflow

April 24, 2019 05:00 PM PT

GOJEK, the Southeast Asian super-app, has seen explosive growth in both users and data over the past three years. Today the technology startup uses big-data-powered machine learning to inform decision making in its ride-hailing, lifestyle, logistics, food delivery, and payment products, from selecting the right driver to dispatch, to dynamically setting prices, to serving food recommendations, to forecasting real-world events. Hundreds of millions of orders per month, across 18 products, are all driven by machine learning.

Building production-grade machine learning systems at GOJEK wasn't always easy. Data processing and machine learning pipelines were brittle, long-running, and had low reproducibility. Models and experiments were difficult to track, which led to downstream problems in production during serving and model evaluation. In this talk we will cover these and other challenges that we faced while trying to scale end-to-end machine learning systems at GOJEK. We will then introduce MLflow and explore the key features that make it useful as part of an ML platform. Finally, we will show how introducing MLflow into the ML life cycle has helped to solve many of the problems we faced while scaling machine learning at GOJEK.