
PySpark in 2023: A Year in Review

With the releases of Apache Spark 3.4 and 3.5 in 2023, we focused heavily on improving PySpark performance, flexibility, and ease of use...
Engineering blog

Introducing Python User-Defined Table Functions (UDTFs)

Apache Spark™ 3.5 and Databricks Runtime 14.0 have brought an exciting feature to the table: Python user-defined table functions (UDTFs). In this blog...
Engineering blog

Arrow-optimized Python UDFs in Apache Spark™ 3.5

In Apache Spark™, Python User-Defined Functions (UDFs) are among the most popular features. They empower users to craft custom code tailored to their...
Engineering blog

Memory Profiling in PySpark

There are many factors in a PySpark program's performance. PySpark supports various profiling tools to expose tight loops of your program and allow...
Engineering blog

How to Profile PySpark

In Apache Spark™, declarative Python APIs are supported for big data workloads. They are powerful enough to handle most common use cases. Furthermore...
Platform blog

Deploying dbt on Databricks Just Got Even Simpler

At Databricks, nothing makes us happier than making our users more productive, which is why we are delighted to announce a native adapter...
Engineering blog

Python Autocomplete Improvements for Databricks Notebooks

At Databricks, we strive to provide a world-class development experience for data scientists and engineers, and new features are constantly getting added to...
Engineering blog

Interoperability between Koalas and Apache Spark

Koalas is an open source project which provides a drop-in replacement for pandas, enabling efficient scaling out to hundreds of worker nodes for...
Company blog

Introducing Koalas 1.0

Koalas was first introduced last year to provide data scientists using pandas with a way to scale their existing big data workloads by...
Engineering blog

10 Minutes from pandas to Koalas on Apache Spark

This is a guest community post from Haejoon Lee, a software engineer at Mobigen in South Korea and a Koalas contributor. pandas is...
Engineering blog

Introducing New Built-in and Higher-Order Functions for Complex Data Types in Apache Spark 2.4

Apache Spark 2.4 introduces 29 new built-in functions for manipulating complex types (for example, array type), including higher-order...