Simplify Data Ingestion With the Python Data Source API

What you’ll learn

The introduction of the Python Data Source API for Apache Spark™ marks a significant advancement in making big data processing more accessible to Python developers.

Traditionally, integrating custom data sources into Spark required writing Scala against the JVM-based data source interfaces, posing a challenge for the vast Python community. The new API simplifies this process, allowing developers to implement custom data sources directly in Python without touching Scala or the lower-level APIs.

This demo will outline the API's key features, including simplified operations for reading and writing data, and its benefits to Python developers.

Note: at the Data + AI Summit in June 2025, Databricks released Lakeflow. Lakeflow unifies data engineering with Lakeflow Connect, Lakeflow Declarative Pipelines (previously known as DLT), and Lakeflow Jobs (previously known as Workflows).

Recommended

On-Demand Video

Querying State Data in Spark Structured Streaming With the State Reader API

On-Demand Video

The Serverless, Real-Time Lakehouse in Action

Tutorial

Spark Streaming - Advanced