Implementing a Reliable Data Lake with Databricks Delta and the AWS Ecosystem

You might already have a data lake, or you might be considering AWS as a way to centralize your data assets in the cloud. In this session, we will demonstrate how to use the AWS Glue data cataloging feature to crawl various S3 data sources, discover their schemas, and populate your centralized metadata repository. We will then cleanse the data with the reliability of Databricks Delta Lake to produce fast and accurate business outcomes.

Next, we will use Amazon Athena, a serverless interactive query service, to analyze the curated data, which resides on highly available and durable Amazon S3 storage and is integrated with the unified AWS Glue metadata repository.

Finally, you will learn how to quickly enhance your application with rich, interactive data visualization and analytics capabilities using Amazon QuickSight.


See More Spark + AI Summit Europe 2019 Videos

About Denis Dubeau


Denis Dubeau is a Partner Solution Architect who provides guidance and enablement on modernizing data lake strategies using Databricks on AWS. Denis is a seasoned professional with significant industry experience in data engineering and data warehousing, with previous stops at Greenplum, Hortonworks, IBM, and AtScale.

About Jordan Martz


Jordan Martz is a Principal Solutions Architect on the Partner Engineering team at Qlik, where he focuses on data lake and data warehouse automation (specifically for SAP systems), the design of real-time streaming applications, and support for consulting partners' analytics roadmaps. Previously, he was a Sr. Solutions Architect on the partner engineering team at Databricks. Coincidentally, before Databricks he was Director of Technology Solutions at Attunity, which Qlik has since acquired.