Data + AI Summit 2023
SAN FRANCISCO, JUNE 26-29
VIRTUAL, JUNE 28-29

Building a Minimalistic Open Lakehouse Using Open Source Projects Apache Spark™: Project Nessie and Iceberg

Wednesday, June 28 @3:30 PM

Overview

A lakehouse architecture combines several components: storage, a file format, a table format, and a catalog. What truly makes a lakehouse 'open' is that data is stored in open source table formats such as Iceberg and Delta and in open file formats such as Parquet, and that the technology itself is open source, so the community can adopt it quickly and easily. Like any new technology, implementing a lakehouse may seem daunting at first. However, once the architecture is broken down into its open components, it becomes easy to adopt and scale.
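
To make the component breakdown concrete, here is a minimal sketch (not taken from the session) of how the catalog, table format, and storage location might be expressed as Spark properties. The helper name `lakehouse_spark_conf`, the catalog name `nessie`, the local Nessie URI, and the warehouse path are all assumptions chosen for illustration; the Iceberg and Nessie runtime jars are also assumed to be on the Spark classpath already.

```python
def lakehouse_spark_conf(
    catalog="nessie",                            # hypothetical catalog name
    nessie_uri="http://localhost:19120/api/v1",  # assumed local Nessie server
    ref="main",                                  # Nessie branch to read and write
    warehouse="file:///tmp/warehouse",           # assumed storage location
):
    """Return Spark properties wiring Spark to Iceberg tables tracked by Nessie."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # SQL extensions: Iceberg procedures and row-level commands,
        # plus Nessie branch and tag commands.
        "spark.sql.extensions": ",".join([
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
        ]),
        # Table format: expose the catalog through Iceberg's Spark catalog.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Catalog: delegate table metadata tracking to Nessie.
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": nessie_uri,
        f"{prefix}.ref": ref,
        # Storage: where data files (Parquet by default) and metadata are written.
        f"{prefix}.warehouse": warehouse,
    }


conf = lakehouse_spark_conf()
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

In a notebook, each pair could be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`.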

Through this session, the idea is to help data engineers who are getting a foot in the door of the world of data lakehouses to learn and implement one easily. We will go through a notebook-style presentation to show beginners how to build a minimalistic, functional lakehouse using Apache Spark, Project Nessie and Iceberg.

In this session, we will cover:

  • Configuring the three different components
  • Creating tables from raw data files
  • Ingesting new data from various sources into the tables, querying it and making updates
  • Using capabilities such as time travel and compaction
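
The topics above can be sketched as a short sequence of Spark SQL statements. This is an illustrative outline, not the speaker's notebook: the catalog name `nessie`, the namespace `demo`, the table `trips`, its columns, and the timestamp are hypothetical, and the statements assume a Spark 3.3+ session with the Iceberg SQL extensions enabled.

```python
def lakehouse_walkthrough(catalog="nessie", namespace="demo", table="trips"):
    """Return (step, sql) pairs for a minimal Iceberg table lifecycle."""
    full_name = f"{catalog}.{namespace}.{table}"
    return [
        ("create namespace",
         f"CREATE NAMESPACE IF NOT EXISTS {catalog}.{namespace}"),
        # Hypothetical schema; a real table would mirror the raw data files.
        ("create table",
         f"CREATE TABLE IF NOT EXISTS {full_name} "
         "(id BIGINT, city STRING, fare DOUBLE) USING iceberg"),
        ("ingest",
         f"INSERT INTO {full_name} VALUES (1, 'SF', 12.5), (2, 'NYC', 20.0)"),
        # Row-level updates rely on the Iceberg SQL extensions.
        ("update",
         f"UPDATE {full_name} SET fare = 13.0 WHERE id = 1"),
        # Time travel: read the table as of an earlier point in time
        # (the timestamp is a placeholder).
        ("time travel",
         f"SELECT * FROM {full_name} TIMESTAMP AS OF '2023-06-28 00:00:00'"),
        # Compaction: Iceberg's stored procedure that rewrites small data files.
        ("compaction",
         f"CALL {catalog}.system.rewrite_data_files(table => '{namespace}.{table}')"),
    ]


steps = lakehouse_walkthrough()
for step, sql in steps:
    print(f"-- {step}\n{sql};\n")
```

With a configured session, each statement could be run as `spark.sql(sql)`. Creating a table directly from raw files would instead typically go through the DataFrame API, for example reading the files with `spark.read` and writing with `writeTo(...).create()`.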


Type

  • Breakout

Experience

  • In Person

Track

  • Data Lakehouse Architecture

Industry

  • Enterprise Technology

Difficulty

  • Intermediate

Duration

  • 40 min

Session Speakers


Dipankar Mazumdar

Developer Advocate

Dremio
