
CUSTOMER
STORY

Turning content into sign-ups, savings and scale

Scribd, Inc., transforms user engagement with AI-powered search

7%

Increase in sign-ups

90%

Reduction in GenAI costs

250M+

Documents processed


Scribd, Inc., connects millions of people to knowledge: user-generated documents and presentations, plus audiobooks and eBooks delivered through publisher and distribution partnerships, powering a global content library of over 250 million pieces of content. But with growth came complexity: fragmented data workflows, limited content insights and scaling challenges. With Databricks, Scribd, Inc., unified data and AI into a single platform, enabling rapid GenAI innovation, smarter search, content trust and measurable business impact — unlocking what was once out of reach.

The hidden cost of untamed content growth

Scribd, Inc.’s mission is to spark human curiosity. Across their three brands — Scribd (user-powered library), SlideShare (collection of presentations) and Everand (audiobook and eBook digital subscription service) — the company serves millions of users globally, with a portfolio that has grown to more than 250 million active pieces of content. These long-form, multilingual and often media-rich content pieces span everything from personal essays and research papers to pitch decks, fiction novels and visual storytelling.

With such a massive and varied content corpus across UGC platforms, the company needed to ensure creators could easily publish high-quality content that consumers could discover, and that the platform could maintain trust at scale. But metadata like titles, descriptions and tags was often incomplete or missing altogether, limiting visibility and hurting SEO. Users struggled to surface relevant content through basic keyword search. And because so much of the library came from open uploads, the company faced increasing pressure to identify and filter out low-quality or even inappropriate content quickly and efficiently.

Scribd, Inc., saw a major opportunity to address this with generative AI, which would not just improve user-facing experiences but also support product development and internal operations. But their existing data infrastructure made it hard to operationalize AI across the business. Data pipelines were fragmented, the feedback loops between data scientists and production teams were slow and there was no central place where all teams — from engineering to product to data science — could collaborate effectively on AI development.

“There was too much context switching between tools,” said R. Tyler Croy, Principal Architect at Scribd, Inc. “If you’re using one tool for data transformation, another for LLM experimentation and a third for serving models, you’re constantly jumping between environments. That creates friction, slows things down and limits what your team can accomplish.”

As large language models (LLMs) became more capable and cost-effective, Scribd, Inc., realized they needed an end-to-end platform that could unify data engineering, model development and AI operations — while handling the scale, diversity and performance their business required.

Rebuilding the stack for an AI-driven future

Scribd, Inc., needed a platform that could support the full lifecycle of data and AI — from ingestion and transformation to experimentation and real-time inference. The Databricks Data Intelligence Platform became that foundation.

At the core of Scribd, Inc.’s transformation is the Databricks Data Intelligence Platform, which brings together their massive, multimodal dataset in a single, governed environment. The company uses Databricks for batch ETL, real-time data processing and model development. This simplifies operations and accelerates iteration cycles. “All of our data science happens in Databricks Notebooks,” Tyler said. “It’s not just a tool; it’s the environment. We can run Spark jobs, build features, experiment with LLMs and test ideas all in one place.”

Mosaic AI Model Serving on Databricks has been a game changer for Scribd, Inc.’s generative AI initiatives. The team takes a highly pragmatic approach, choosing the right model for the problem at hand, whether open-weight options like Llama and Mistral or proprietary models like Claude. This flexibility supports a wide range of GenAI use cases — everything from auto-generating metadata for uploaded content to powering semantic search and chat-based discovery with features like Everand’s Ask AI.
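For readers curious what metadata auto-generation against a model serving endpoint can look like in practice, here is a minimal sketch using the MLflow deployments client that Databricks Model Serving supports. The endpoint name, prompt wording and JSON schema are illustrative assumptions, not Scribd, Inc.’s actual implementation.

```python
# Hypothetical sketch: asking an LLM behind a Databricks model serving
# endpoint to generate metadata for an uploaded document. The endpoint
# name and prompt are assumptions for illustration only.

def build_metadata_prompt(doc_excerpt: str) -> list[dict]:
    """Build a chat-style prompt requesting title/summary/tags as JSON."""
    return [
        {"role": "system",
         "content": ("You generate metadata for user-uploaded documents. "
                     "Reply with a JSON object containing 'title', "
                     "'summary' and 'tags'.")},
        # Truncate very long documents to keep the request small.
        {"role": "user", "content": doc_excerpt[:4000]},
    ]

def generate_metadata(doc_excerpt: str, endpoint: str = "llama-metadata") -> str:
    """Call a serving endpoint (requires a Databricks workspace)."""
    from mlflow.deployments import get_deploy_client  # lazy: needs credentials
    client = get_deploy_client("databricks")
    resp = client.predict(
        endpoint=endpoint,
        inputs={"messages": build_metadata_prompt(doc_excerpt),
                "max_tokens": 256},
    )
    return resp["choices"][0]["message"]["content"]
```

Because the prompt builder is a plain function, it can be unit-tested locally while the endpoint call itself runs only inside a workspace.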

Collaboration between Scribd, Inc., and Databricks extended beyond tooling. The teams worked closely together — especially during early deployments — on everything from tuning model performance in AWS to provisioning scarce GPU resources during periods of high demand. “The Databricks team didn’t just give us a platform — they were embedded with us during critical moments, helping tune models, troubleshoot infrastructure and navigate GPU constraints in real time,” Mike Lewis, Sr. Director of Product Management at Scribd, Inc., said. “That level of support helped us accelerate delivery and stay focused on building.”

Another key enabler was Databricks’ support for both real-time and batch model inference. In early prototyping, Scribd, Inc., uses serverless endpoints to keep experimentation lightweight and cost-efficient. When they move into production, processing tens of millions of prompts, they scale up by pointing batch jobs at their largest tables; the service scales automatically, delivering massive throughput. This dual-mode flexibility lets the team iterate quickly during development while ensuring scalability for enterprise-grade deployments.
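The batch side of this dual-mode pattern can be sketched with Databricks’ `ai_query()` SQL function, which fans prompts from a table out to a serving endpoint. The table name, endpoint name and prompt below are assumptions for illustration, not Scribd, Inc.’s actual pipeline.

```python
# Hypothetical sketch: batch enrichment by pointing a job at a large table.
# Table, column and endpoint names are illustrative assumptions.

BATCH_METADATA_SQL = """
CREATE OR REPLACE TABLE docs_with_metadata AS
SELECT
  doc_id,
  ai_query(
    'llama-metadata',  -- serving endpoint name (assumed)
    CONCAT('Generate JSON title/summary/tags for: ', excerpt)
  ) AS generated_metadata
FROM raw_documents
"""

def run_batch_enrichment(spark):
    """On Databricks, `spark` is the active SparkSession; the serving
    endpoint scales with the job's throughput automatically."""
    return spark.sql(BATCH_METADATA_SQL)
```

During prototyping, the same endpoint can be queried interactively from a notebook; moving to production is largely a matter of swapping the ad hoc calls for a scheduled job like this one.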

Turning AI investment into higher engagement and lower costs

By integrating Databricks into the core of their AI and data operations, Scribd, Inc., has accelerated the development and rollout of intelligent features that improve both user experience and internal efficiency.

One of the most impactful GenAI-powered enhancements has been the auto-generation of content metadata — titles, summaries, descriptions and categories — that enrich every piece of user-uploaded content and make it more discoverable. These improvements have directly translated to a 7% increase in new user sign-ups and a 7% reduction in churn.

At the same time, Scribd, Inc., has significantly reduced operational costs by streamlining how they build and run GenAI models. Through Databricks’ model serving and batch inference capabilities, the company has cut the cost of running large language models by more than 90%. This cost efficiency has enabled them to expand the scope of GenAI across the business, from metadata enrichment to content moderation, without sacrificing quality.

With an integrated development environment and serverless infrastructure, Databricks has enabled Scribd, Inc., to dramatically speed up experimentation and deployment, taking the company from prototype to production in weeks instead of months. “Our data scientists are able to test, tune and deploy in one continuous flow,” Tyler said. “There’s no need to switch between tools or environments — we can spin up a model, evaluate it and scale it in the same place.”

Scribd, Inc., is also seeing a cultural shift internally — toward more AI-native product thinking and more cross-functional collaboration. Databricks has become a hub where data scientists, engineers and product managers co-create, debug and iterate on generative AI solutions, among other emerging use cases.

Looking forward, Scribd, Inc., is building even more AI-native features — like intelligent topic extraction, in-document navigation and slide-level search on SlideShare and Scribd that lets users jump to exactly the content they care about. These features are only possible because of the scalable, unified infrastructure Scribd, Inc., has built with Databricks. “We’re moving toward a future where AI is embedded directly into how users search, explore and engage with content,” Mike said. “It’s not a separate feature — it’s becoming core to the product experience. Databricks gives us the foundation to keep pushing forward.”