We’re always told to ‘Go for the Gold!’, but how do we get there? This talk walks you through the process of moving your data to the finish line to get that gold medal! A common data engineering pipeline architecture uses tables that correspond to different quality levels, progressively adding structure to the data: data ingestion (‘Bronze’ tables), transformation/feature engineering (‘Silver’ tables), and machine learning training or prediction (‘Gold’ tables). Together, these tables form what we call a ‘multi-hop’ architecture. It allows data engineers to build a pipeline that begins with raw data as a ‘single source of truth’ from which everything flows. In this session, we will show how to build a scalable data engineering pipeline using Delta Lake, so you can be the champion in your organization.
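As a rough illustration of the Bronze, Silver, and Gold hops, here is a minimal sketch in plain Python. It is not the Delta Lake implementation shown in the session: in-memory lists and dicts stand in for Delta Lake tables, and the sample records, field names, and the functions `to_bronze`, `to_silver`, and `to_gold` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events; in-memory structures stand in for Delta Lake tables.
RAW_EVENTS = [
    {"athlete": "ana", "event": "100m", "time_s": "11.2"},
    {"athlete": "ana", "event": "100m", "time_s": "10.9"},
    {"athlete": "bo", "event": "100m", "time_s": "n/a"},  # malformed reading
    {"athlete": "bo", "event": "100m", "time_s": "11.5"},
]


def to_bronze(raw):
    """Ingest: keep every record as received, tagged with its arrival order."""
    return [dict(rec, ingest_seq=i) for i, rec in enumerate(raw)]


def to_silver(bronze):
    """Transform: parse types and drop rows that fail validation."""
    silver = []
    for rec in bronze:
        try:
            time_s = float(rec["time_s"])
        except ValueError:
            continue  # the bad row stays in bronze, so nothing is lost
        silver.append(
            {"athlete": rec["athlete"], "event": rec["event"], "time_s": time_s}
        )
    return silver


def to_gold(silver):
    """Aggregate: one feature row per athlete, ready for training or prediction."""
    times = defaultdict(list)
    for rec in silver:
        times[rec["athlete"]].append(rec["time_s"])
    return {
        athlete: {"best_time_s": min(ts), "runs": len(ts)}
        for athlete, ts in sorted(times.items())
    }


bronze = to_bronze(RAW_EVENTS)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Each hop reads only from the previous one, so the bronze data remains the single source of truth: if a transformation rule changes, the silver and gold results can be rebuilt from bronze without re-ingesting anything.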
Amanda Moran is a Bay Area-based Solutions Architect for Databricks. Her passion is helping customers, users, and the community be successful. Previously, she worked for HP, Teradata, DataStax, and Apache Trafodion startup Esgyn. Amanda's an Apache Committer and member of the PMC for Apache Trafodion. Her work has spanned customer POCs, executive demos, distributed database cloud deployments, Python coding, data science workshops, conference talks, Linux/Hadoop administration, and scripting: a little bit of everything. She has a master's degree in computer science from Santa Clara University and a BS in biology from the University of Washington. In her spare time, she loves running, vegan baking, and finding reasons to go to Disneyland.