Streaming Data Pipelines: From Supernovas to LLMs
OVERVIEW
| EXPERIENCE | In Person |
| --- | --- |
| TYPE | Breakout |
| TRACK | Data Engineering and Streaming |
| INDUSTRY | Health and Life Sciences, Public Sector |
| TECHNOLOGIES | AI/Machine Learning, Developer Experience, ETL |
| SKILL LEVEL | Intermediate |
| DURATION | 40 min |
In this fun, hands-on, and in-depth how-to session, we use live streaming data for a comprehensive use case on the Databricks Intelligence Platform, with a focus on data engineering. We tackle the challenge of analyzing real-time gamma-ray burst data from collapsing supernovas, provided by NASA through its GCN project. You'll learn how to ingest data from message buses, how to decide between Delta Live Tables, DBSQL, and Databricks Workflows for stream processing, and how to code ETL pipelines in SQL, including Kafka ingestion. Once we have a cleaned data stream, I'll demonstrate how Databricks Data Rooms provide natural language analytics and compare that approach with a notebook that streams data into a vector database for retrieval-augmented generation (RAG) with open source LLMs. This session is ideal for data engineers, data architects who like code, genAI enthusiasts, and anyone fascinated by sparkling stars. You'll learn when and how to use which Databricks products, and the demo is easy to replicate at home.
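To make the "ETL pipelines in SQL, including Kafka ingestion" part concrete, here is a minimal sketch of a Delta Live Tables streaming table that reads from a Kafka message bus with the read_kafka table-valued function. The broker address, topic name, and column handling are illustrative assumptions (the real GCN Kafka feed also requires authentication options that are omitted here), not the exact code shown in the session.

```sql
-- Minimal sketch (broker, topic, and auth settings are assumptions):
-- ingest raw GCN notices from Kafka into a Delta Live Tables streaming table.
CREATE OR REFRESH STREAMING TABLE gcn_notices_raw
COMMENT "Raw gamma-ray burst notices ingested from a Kafka message bus"
AS SELECT
  CAST(key AS STRING)   AS notice_key,      -- Kafka record key
  CAST(value AS STRING) AS notice_payload,  -- notice body as text/JSON
  timestamp             AS ingest_ts        -- Kafka record timestamp
FROM STREAM read_kafka(
  bootstrapServers => 'kafka.gcn.nasa.gov:9092',                 -- assumption
  subscribe        => 'gcn.classic.text.SWIFT_BAT_GRB_POS_ACK',  -- assumption
  startingOffsets  => 'earliest'
);
```

Downstream cleansing steps could then parse notice_payload into typed columns before the cleaned stream feeds Data Rooms or a vector index for RAG.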
SESSION SPEAKERS
Frank Munz
Principal TMM
Databricks