The 2023 State of Data + AI: How Businesses Are Preparing for the New Age of AI
May 23, 2023 in Company Blog
The historic surge of interest in large language models (LLMs) since ChatGPT launched to the public late last year has made the topic inescapable. Not only is the technology improving at an unparalleled cadence, but companies are also building their own models like never before. Now, predictive models are underpinning mission-critical tasks, giving organizations a window into the future instead of just a review of the past, and helping them operate quicker and leaner.
On the cusp of this new computing revolution, we were eager to learn exactly where enterprises stand in this transformation, as well as which platforms and tools they’re using to take advantage of it. By analyzing anonymized usage data from more than 9,000 global Databricks customers, we’ve compiled the 2023 State of Data + AI, a comprehensive look at organizations’ data and AI initiatives.
Here’s a glimpse at what we discovered:
- The hype around LLMs is real: From the end of November 2022 to the beginning of May 2023, usage of SaaS LLMs, which are used to access models like those from OpenAI, grew 1310% among Lakehouse customers. Transformer-related libraries like Hugging Face (an NLP toolkit and model hub), which are used to train homegrown LLMs and were in demand even before the launch of ChatGPT, grew 82% within the same time frame.
- Data transformation and integration are more vital than ever: The fastest-growing tools on Databricks are dbt (206% YoY) and Fivetran (181%). And of the 10 most popular data and AI products, six are data integration tools, including Informatica and Qlik, making data integration the fastest-growing market on the Databricks Lakehouse.
- Companies eye open source: When looking at the most popular data and AI products, Microsoft Power BI and Plotly reign above the rest. But organizations are showing a strong pull to open technologies; 8 of the 10 most popular data and AI products are based on open source software, including dbt, Hugging Face and GeoPandas.
- Enterprises are doing more AI projects than ever before, and getting better at it: The number of models that are candidates for production (used in operations) grew 411% year-over-year, while the number of experimental projects grew 54%. Our data also shows that, on average, one in three experimental models is a candidate for the real world, compared to one in five last year, suggesting organizations are getting better at building and scaling these projects.
- AI is growing, but don’t forget traditional data analytics: Last year, Power BI was the most popular program running on top of the Lakehouse. The Lakehouse is increasingly being used for data warehousing, including serverless data warehousing with Databricks SQL, which grew 144% YoY.
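Large percentages can be hard to picture, so here is a minimal sketch that converts the growth figures above into "times the starting level" multipliers. It assumes each figure is a simple percentage increase over the starting value, which this post does not spell out; the function name `growth_multiplier` and the dictionary labels are purely illustrative.

```python
def growth_multiplier(percent_growth: float) -> float:
    """Convert a percentage increase into an end/start ratio.

    Assumption: the figure is a plain percentage increase over the
    starting value (e.g. +100% means the value doubled).
    """
    return 1 + percent_growth / 100


# Growth figures quoted in the list above (labels are illustrative).
figures = {
    "SaaS LLM usage": 1310,
    "Transformer-related libraries": 82,
    "Production-candidate models": 411,
    "Experimental projects": 54,
    "Databricks SQL": 144,
}

for name, pct in figures.items():
    print(f"{name}: +{pct}% is about {growth_multiplier(pct):.2f}x the starting level")

# The experiment-to-production ratios quoted above, as shares.
share_this_year = 1 / 3  # one in three
share_last_year = 1 / 5  # one in five
print(f"Candidate share: {share_this_year:.0%} now vs {share_last_year:.0%} last year")
```

Under that assumption, 1310% growth means usage ended the period at roughly 14 times its starting level, and 411% growth means roughly five times as many production-candidate models as a year earlier.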
While it’s still early days, these emerging trends are bound to define the future of AI, and business leaders need to pay attention. It’s never been clearer: the companies that harness the power of data science and machine learning (DS/ML) will lead the next generation of data.
Download the full report here to learn more!