CUSTOMER STORY

Threading the needle for fashion resale success

Days to identify, train and stage ML models, instead of weeks

71% decrease in onboarding time for new analysts and data scientists (2 weeks to 4 days)

2 weeks for new users to build MVPs, as opposed to 2 months

ThredUp, the world’s largest online consignment store, has revolutionized sustainable fashion by processing over 220 million unique items to date. Managing such a vast, data-driven marketplace presented significant challenges, including data silos, scalability bottlenecks and operational inefficiencies that slowed innovation. Onboarding new data analysts and scientists took up to two weeks, while new hires required months to deliver actionable outputs due to fragmented data and inefficient workflows. Before adopting Databricks, tasks like training machine learning (ML) models took days, limiting the company’s ability to deliver real-time personalization and actionable insights. With Databricks’ unified data platform, ThredUp transformed their operations, consolidating fragmented data, accelerating insights and enabling the seamless integration of advanced artificial intelligence (AI) and machine learning. New engineers now produce minimum viable products (MVPs) in as little as two weeks, compared with two months previously. Today, ThredUp can achieve in a day what once took weeks, empowering their teams to innovate faster and scale with confidence.

Scalability bottlenecks derail operational efficiency in fashion resale

ThredUp, founded in 2009, has transformed the fashion resale market by providing a platform for buying and selling secondhand clothing. As the world’s largest online consignment store, ThredUp aims to promote sustainable fashion through a data-driven marketplace that processes millions of unique items annually. “We’ve processed over 220 million items to date,” said Dan DeMeyere, Chief Product and Technology Officer at ThredUp.

ThredUp’s business model requires managing an unprecedented volume of single-SKU items, each treated as unique, which creates challenges for inventory management, pricing and personalization. Operating a highly data-driven model, ThredUp relies on advanced ML models to address a variety of critical use cases that underpin their operations.

The company generates insights to optimize the flow of millions of unique items from sellers to buyers, ensuring seamless inventory management. Personalization is another key focus, as ThredUp builds models that predict user purchase probabilities and curate individualized shopping experiences to enhance customer engagement. “Every aspect of our business is intertwined with data,” Dan noted. “We rely on insights to make smart decisions about inventory, sellers and buyers. It’s ingrained in how our marketplace operates.”

In addition to these use cases, ThredUp uses data to drive pricing and promotion strategies, ensuring competitive and demand-tailored pricing models. Marketing performance is another critical area of focus, with insights enabling more effective campaign targeting and measurement for incentives like loyalty programs. Revenue forecasting relies on detailed analytics, powered by data pipelines and notebooks, to deliver accurate predictions and optimize business planning. Operations analytics also plays a vital role, feeding into the company’s business intelligence (BI) tool, Looker, to provide executives with actionable insights.

ThredUp’s reliance on vast amounts of data exposed critical challenges in accessing, processing and leveraging this information efficiently. According to Dan, the company’s pre-Databricks infrastructure was siloed, slowing innovation and decision-making. “You might have user-level data in one place, inventory data in another and event-driven data scattered elsewhere,” he explained. “Stitching all of this together was very, very hard.”

Aniket Mane, VP of Enterprise Data and Engineering at ThredUp, emphasized the impact of these inefficiencies: “The time to onboard a new analyst or data scientist used to take up to two weeks, and even then, they wouldn’t start delivering outputs until after two months. Our fragmented data platform made it hard to generate actionable insights quickly.”

Traditional databases stored transactional and event-driven data in separate silos, making it difficult to generate unified insights for applications such as personalization. Scalability constraints emerged as the existing architecture struggled to handle computationally intensive workloads, causing delays in ML model training and insight generation. Operational complexity compounded the issue, with multiple tools like Looker and Amplitude creating inefficiencies and adding costs. According to Dan, resource limitations also played a significant role. “Before we had a serverless architecture, we had AWS RDS. When we wanted to do computationally expensive analysis or train models, it could take hours, if not days, to get the answers we needed.”

Chintan Patel, Manager of Data Engineering at ThredUp, added, “Our Redshift system was not only limited by its performance bottlenecks but also required frequent fine-tuning and system restarts due to deadlocks. These constant interruptions created a significant maintenance burden for our team.” He further highlighted data silos as a key challenge, stating, “Data silos didn’t just slow our workflows — they stifled innovation. Our ability to experiment with and deploy machine learning models was often delayed because we couldn’t access all the necessary data in one place.”

Beyond the technical constraints, ThredUp also faced limitations in scaling their business impact. “Leadership was asking deeper and more complex questions about our data, but we didn’t have the tools to provide timely answers,” Aniket shared. “It would take days, if not weeks, to deliver insights that now take less than a day.” These challenges underlined the urgent need for a unified, scalable data platform to enable faster innovation and support ThredUp’s expanding use cases.

Unifying data with the Databricks Data Intelligence Platform

ThredUp turned to Databricks in 2017 to address their pressing data challenges and unify their fragmented infrastructure. Initially leveraging Databricks Notebooks, the company found immediate value in the platform’s collaborative environment, which accelerated data analysis and ML model development. “We no longer have to ask, ‘Where was the data? How do we bring it together?’” Dan explained.

Databricks’ lakehouse architecture transformed ThredUp’s data management by consolidating structured and unstructured data into a single, easily accessible platform. This unified approach eliminated the silos that had previously slowed innovation. The integration of Delta Lake between 2018 and 2019 provided robust support for ACID transactions and schema enforcement, ensuring consistent, high-quality data for downstream use. “Delta tables allowed us to create a centralized repository of truth, where data integrity is maintained across all use cases. This reliability has been crucial for operational efficiency,” Chintan said.

ThredUp also enhanced their reporting capabilities to give nontechnical leaders a complete view of business performance. “We combine Databricks visualizations with Looker to create comprehensive dashboards for leadership,” Aniket explained. “This ensures that key decision-makers have a clear view of performance metrics, enabling informed and timely decisions across the organization.”

To further enhance governance and usability, ThredUp implemented Unity Catalog in 2023, which proved a game changer for navigating their vast data ecosystem. “Unity Catalog democratized access to our data,” Dan said. “It used to require a lot of domain knowledge or manual lookups in a wiki to find the source of truth. Now, anyone can start on Databricks on day one, access the data they need and know it’s accurate.” Unity Catalog also strengthened security, providing centralized permission management and auditing capabilities that simplified compliance and governance.

In addition to unifying ThredUp’s data infrastructure, Databricks empowered the company to scale their data operations seamlessly. With the platform’s serverless architecture, ThredUp can dynamically allocate resources for computationally intensive tasks, ensuring peak performance without impacting production systems. “In the past, training a machine learning model could take days due to resource constraints. With Databricks, we can spin up resources on the fly and scale to infinity,” Dan noted. “It’s fully decoupled from our production databases, so heavy queries no longer slow down our website or mobile app performance.”

Databricks’ flexibility extends beyond engineering teams, enabling a broader audience of technologists to leverage advanced AI and ML workflows. Chintan highlighted the importance of this accessibility, saying, “Features like collaborative Databricks Notebooks and Git integration allow us to quickly iterate on models, share insights and make data-driven decisions. The AI Playground has also enabled our team to experiment with LLMs, generative AI and other state-of-the-art models without major technical hurdles.”

Another key advantage is Databricks’ role in operationalizing data for internal experimentation and innovation. “Our teams use Databricks to run internal hackathons, often transforming proofs of concept into production-ready features,” Aniket explained. “This iterative process allows us to stay ahead in a fast-paced market.”

Through their thoughtful, ongoing adoption of Databricks’ capabilities, ThredUp has built a modern, scalable data platform. This foundation enables the company to innovate faster, unlock new use cases and maintain their leadership in the competitive fashion resale market.

Training complex models in days

By unifying their data infrastructure and enhancing accessibility, ThredUp has accelerated their ability to deliver personalized experiences and optimize operations at scale. “The velocity at which we can now leverage data has transformed our business. Tasks that used to take weeks, like training models and putting outputs on staging, can now be done in days. This has allowed us to move faster on behalf of our customers and the business,” Aniket explained. The result is a more agile organization capable of responding quickly to market demands and customer needs.

This agility has also produced significant cost savings. According to Aniket, “We’ve maintained a lean data engineering team while supporting an increasing number of users. The tools are so user-friendly that we’re saving approximately half a million dollars annually by avoiding additional hires.”

ThredUp has significantly reduced the time required to train machine learning models, enabling faster iteration cycles and decision-making. Onboarding for new analysts and data scientists has been cut from two weeks to just four days (a 71% decrease), helping new hires become productive sooner and accelerating project timelines. Serverless architecture has provided the scalability needed to handle ThredUp’s massive data volumes without impacting production systems, eliminating the bottlenecks that previously delayed insights.

Data democratization is now at the heart of ThredUp’s strategy to improve business impact. By making advanced analytics and self-service tools accessible to both customer-facing teams (like product and marketing) and back-office stakeholders (like finance, pricing and operations), ThredUp empowers all teams to leverage data for their unique needs. This democratized approach fosters collaboration and ensures that every team can develop new use cases and drive meaningful outcomes with minimal reliance on engineering resources.

ThredUp’s teams can now experiment with cutting-edge AI and ML technologies, from predictive analytics to GenAI. “Databricks’ AI Playground has enabled us to seamlessly integrate LLMs and advanced models into our workflows,” Dan added. “We’ve already seen improvements in personalization algorithms and expect these models to play a major role in future use cases.” In one notable example, an ML engineer deployed a model for automated garment measurements within their first month, a project previously expected to take up to two quarters. More generally, new users now produce MVPs in as little as two weeks, compared with two months previously.

As ThredUp continues to innovate, the company is exploring new ways to leverage the Databricks Platform. The team plans to explore conversational analytics by experimenting with Databricks’ AI/BI Genie, which enables natural language interactions with data. “Genie will allow us to democratize data even further,” Aniket said. “Our goal is to make advanced analytics accessible to all teams, enabling faster, more informed decisions across the organization.”

ThredUp is also expanding their use of AI-driven insights, with plans to integrate tools like Model Serving into their workflows. These advancements will allow the company to build and deploy models more efficiently, while Unity Catalog ensures comprehensive governance across data and ML models.

By embracing Databricks’ evolving capabilities, ThredUp is well positioned to maintain their leadership in the fashion resale market. “I’m in Databricks regularly, as are multiple C-level executives at ThredUp, because it is that powerful. It’s so easy to collaborate with others,” Dan concluded. “If someone on the team generates a very interesting dashboard or analysis, you could very easily drill down further or have it turn into a living view of a certain part of the business.”

Details

Industry: Marketing, Retail and Consumer Goods
Use Case: Data Science, Data Warehousing, Data Engineering
Cloud: AWS
Product: Delta Lake, Lakeflow Jobs, Lakeflow Spark Declarative Pipelines, Unity Catalog
