Low-Code Exploratory Data Analysis with Bamboolib in Databricks
August 14, 2022 in Platform Blog
We are very excited to announce that the public preview of bamboolib in the Databricks Notebook begins today! It is available with the Databricks Runtime (DBR) 11.0+ on AWS and Azure and with DBR 11.1+ on GCP, and getting started is as easy as using the following code snippet in the Notebook:
%pip install bamboolib
# new cell
import bamboolib as bam
# optional new cell
bam
What is bamboolib?
Bamboolib is a low-code tool that provides a graphical user interface for the capabilities of pandas, the standard Python data science library everyone knows and loves. Using bamboolib, you have all the power of code-first data science at your fingertips without the need to write the code yourself. This means you can
- access and load your data from database tables or CSV files;
- wrangle and transform your data from its raw form into clean, organized data that is ready for investigation; and
- explore, visualize, and analyze your data to uncover key insights that deliver outsized impact to your business.
Bamboolib accomplishes this with a glass-box approach to low-code analysis: it generates the underlying pandas code for each of these analytics operations, ensuring that you can reproduce the results of any analysis you perform with the UI. And, when the technique you want to use is not natively available in bamboolib, you can easily expand its capabilities with bamboolib's plugin framework.
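To make the glass-box idea concrete, here is a minimal sketch of the kind of pandas code that sits behind a load, wrangle, and explore workflow like the one described above. It is written by hand for illustration and is not actual bamboolib output; the sample data and the column names (`region`, `product`, `revenue`) are made up.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file or database table.
raw_csv = io.StringIO(
    "region,product,revenue\n"
    "EMEA,widget,120.0\n"
    "EMEA,gadget,\n"
    "APAC,widget,80.0\n"
    "APAC,gadget,200.0\n"
)

# Load: read the raw data into a pandas DataFrame.
df = pd.read_csv(raw_csv)

# Wrangle: drop rows where the revenue value is missing.
df = df.dropna(subset=["revenue"])

# Explore: total revenue per region, largest first.
summary = (
    df.groupby("region", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

Because each step is ordinary pandas, code of this kind can be copied into a notebook cell and rerun without the UI.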
NOTE: As mentioned above, bamboolib is a low-code tool that uses the pandas library under the hood. This means your data must fit in the memory of the compute resource you use (specifically, the cluster driver in Databricks). If you'd like access to larger data or support for Spark in bamboolib, we'd love to hear your feedback!
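One rough way to see how much memory a pandas DataFrame occupies is the standard `DataFrame.memory_usage` method. The sketch below uses a small, made-up DataFrame; in practice you would call it on your own data and compare the result with the memory available on the driver.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame standing in for data loaded on the driver.
df = pd.DataFrame(
    {
        "id": np.arange(1_000, dtype="int64"),
        "value": np.zeros(1_000, dtype="float64"),
    }
)

# memory_usage(deep=True) reports bytes per column (plus the index);
# summing gives the DataFrame's total in-memory footprint.
total_bytes = int(df.memory_usage(deep=True).sum())
print(f"DataFrame uses about {total_bytes / 1024:.1f} KiB")
```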
Why do we think bamboolib is awesome?
Bamboolib gives Databricks users of all backgrounds a gateway to the deep functionality and flexibility available in code-first data science. It enables you to
- Increase your productivity: Prepare, analyze, and visualize pandas DataFrames without writing the usual boilerplate code so you can focus on the work at hand.
- Become instantly fluent in pandas: If you're used to other tools like Excel, MATLAB, or SAS, you may know what you want to do but not know how to do it with pandas or Python. Bamboolib helps you translate a natural language description of the operations you want to do into idiomatic Python code.
- Have confidence in the outputs of your analysis and your code: You can easily see, evaluate, and export the code bamboolib produces as part of your work, ensuring that any work you do is reproducible both by yourself and any colleagues with whom you share your analysis.
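As an illustration of the translation from a familiar spreadsheet operation to pandas, consider a pivot table. The sketch below shows one idiomatic pandas equivalent using `DataFrame.pivot_table`; it is hand-written rather than exported from bamboolib, and the data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales data; the column names are made up for illustration.
sales = pd.DataFrame(
    {
        "region": ["EMEA", "EMEA", "APAC", "APAC"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "revenue": [100, 150, 90, 60],
    }
)

# Spreadsheet-style pivot table:
# rows = region, columns = quarter, values = sum of revenue.
pivot = sales.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)
print(pivot)
```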
Bamboolib's benefits also extend to data analytics leaders and organizations looking to expand their practitioners' skill sets and uplevel their impact:
- Simplifies onboarding and enables self-service analytics: Enable your citizen data scientists, domain experts, and other employees to achieve impactful and reproducible results in the Databricks Notebook with minimal technical overhead.
- Provides opportunities for learning and skill development: Bamboolib is a simple entry point to the world of Python data science. Its glass-box approach ensures users have access to the code powering their analysis, and with this code they can learn the pandas library and the core methods of modern data science.
We hope you will think it is as awesome as we do!
Welcoming more users into the Lakehouse
At Databricks, we believe the Lakehouse is the ideal home for data analytics and the workloads powering it, and we want to welcome as many people to the Lakehouse as we can. Bamboolib lets us open the doors to whole new audiences, and we are very excited by this opportunity.
This has all been made possible by 8080 Labs joining the Databricks family and by our deep investment in supporting the Jupyter ecosystem. Bamboolib is built using the ipywidgets framework and the IPython kernel, and it is the first of the Jupyter ecosystem's powerful custom tools that we are introducing to Databricks. We look forward to bringing more of these capabilities, and the users who love them, into Databricks in the future.