
At the time of writing this blog post, I'm a mere one week away from the end of my summer internship on the Exploratory Data Analysis (EDA) team here at Databricks. I can't believe the summer has flown by this quickly; it feels like just yesterday that I was cloning my team's repo and pestering my onboarding buddies for help! Over the course of 12 weeks, I completed three project phases with one underlying theme: improving the user experience for interacting with images in the Databricks notebook.

The Databricks Notebook

If you've ever interacted with data through code, you've probably used a notebook. Notebooks are a type of code editor for Python, SQL, Scala, and R, commonplace in data science and machine learning as a means to extract and use data. As a Data + AI company, Databricks provides customers with its own notebook deeply integrated with the platform.

What is the Databricks Notebook?

The Databricks notebook supports the features you would expect from any notebook, such as a code editor and menu items, and it also includes the Databricks Assistant. But what's special about the Databricks notebook is that it's extremely well-integrated with the rest of Databricks' products: Jobs, Delta Live Tables (DLT), Generative AI (GenAI) pretraining and fine-tuning, and more. Customers use the Databricks notebook to access the entire suite of Databricks' offerings, so creating a seamless notebook experience (which is what the EDA team focuses on) is an important part of how Databricks unlocks the power of data for its customers.

What problem did my intern project tackle?

The Databricks notebook is a mature product, but as with any product, there are always things to improve! Turning data into insight is as much about telling a story as it is about crunching the numbers, and images are a key part of that. Further, as GenAI expands into different domains, such as vision and image generation, training and fine-tuning models with images and videos is becoming increasingly common. Databricks recently released Shutterstock ImageAI, which generates high-quality custom images based on specific business needs.

Researchers and engineers across the world use the Databricks notebook every day for countless applications that involve multimedia files. However, until recently, working with multimedia files in the notebook was cumbersome. For instance, customers had to figure out roundabout ways to embed images in notebook markdown cells, and they couldn't even open images from the file browser.

Attempting to open any image from the file browser throws an error and interrupts the notebook user experience

My summer intern project focused on improving the user experience for interacting with images in the Databricks notebook. Below are the key features that I rolled out this summer.

Key Features

Embedding images in notebook markdown

We've added the ability to embed images in markdown cells in a more user-friendly, standard markdown format. Now, customers can embed images with both relative paths and absolute paths (/Workspace for workspace files, and /Volumes for volumes files). This gives customers more flexibility in introducing images into their notebooks, whether it be for data visualization, image training, or feline comic relief.
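As a rough illustration of the two path styles, the Python sketch below builds standard markdown image syntax for a relative path and for absolute paths. The helper name `image_markdown` and the example file paths are hypothetical, and the prefix check is only an assumption for illustration based on the /Workspace and /Volumes roots mentioned above; it is not how the notebook itself resolves paths.

```python
# Hypothetical helper: formats a standard markdown image tag.
# The accepted absolute roots are an assumption taken from the text above.
ABSOLUTE_ROOTS = ("/Workspace/", "/Volumes/")


def image_markdown(alt_text: str, path: str) -> str:
    """Return `![alt_text](path)` for a relative or supported absolute path."""
    is_absolute = path.startswith("/")
    if is_absolute and not path.startswith(ABSOLUTE_ROOTS):
        raise ValueError(
            f"Absolute paths are assumed to start with one of {ABSOLUTE_ROOTS}"
        )
    return f"![{alt_text}]({path})"


# Relative path (resolved relative to the notebook, by assumption)
print(image_markdown("cat", "images/cat.png"))
# Absolute workspace path (hypothetical location)
print(image_markdown("chart", "/Workspace/Users/someone/chart.png"))
# Absolute volumes path (hypothetical location)
print(image_markdown("sample", "/Volumes/main/default/imgs/sample.png"))
```

The resulting strings are what you would type into a markdown cell by hand; the helper only makes the accepted path shapes explicit.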

Embedding images in notebook markdown using relative and absolute paths

Drag & drop images into notebook cells

A natural action for customers is dragging and dropping images into the notebook. Previously, dragging and dropping an image into the Databricks notebook resulted in opening the image in a new browser tab, which interrupted the customer's flow and prevented customers from easily using images.

Now, dragging and dropping an image into a notebook markdown cell automatically uploads the image to the workspace file system and embeds it in the cell!

Dragging & dropping an image from the local machine into a markdown cell

Thanks to Databricks' fast pace and rapid product iteration, I was able to roll out most of my project's key features to production by the end of my internship! I never imagined I could have this much customer impact as an intern, and I'm very grateful to have had a clear influence on our product in the span of just three months.

My Internship Experience

My intern project wasn't the only thing that I was able to do this summer! I had the opportunity to attend the 2024 Data + AI Summit (DAIS), work on a cool hackathon project with the Databricks Assistant, visit the new and growing Databricks office in Seattle, and go on many, many delicious meal excursions with my intern class.

This summer, I had the opportunity to meet, learn from, and work with many of the industry leaders in the Data + AI space. Moreover, interacting with a large and energetic intern class made me more excited about new technologies than ever before. I can say without hesitation that I've made lifelong friends during my time here.

I'd like to give a special thanks to my mentor Richard Fung, my manager Neha Sharma, our Workspace org director Ted Tomlinson, and the rest of the EDA team for their mentorship. Every one of my team members was impressively intelligent yet modest, sitting through each of my minor feature demos and giving extensive feedback to make my project features better. They've taught me invaluable skills that I'll carry for the rest of my career.

If you're passionate about building interesting and impactful products, then I recommend that you apply to work at Databricks! You can check out current job opportunities on the Databricks Careers page.
