Empowering retail businesses to make smarter decisions
RDSolutions uses Databricks to deliver insights for informed decision-making
faster data processing (from 4 days to 3 hours)
of data records stored and processed
RDSolutions provides retail pricing intelligence, collecting billions of data points daily. Their legacy data pipeline faced scalability, maintenance and slow data processing issues, causing delays in time to insights. The Databricks Data Intelligence Platform resolved these challenges with a unified lakehouse and automated scalable data pipelines, improving data processing speed and reducing latency. The use of Databricks Assistant helped to automate time-consuming development tasks, resulting in faster data ingestion, improved accuracy, increased efficiency and the ability for RDSolutions to deliver timely, actionable insights to their customers.
Struggling to scale data within complex legacy pipelines
RDSolutions conducts 65,000+ online audits and thousands of weekly in-store audits to collect and normalize more than 500 billion data points each week. To compile, process and distribute this decision-making data to big-box retail clients, RDSolutions relied on an Azure data warehouse, but the environment failed to scale effectively or cost-efficiently. Running a large query meant data teams had to stop all other work, stalling productivity and delaying deliverables. Adding to the engineering distractions were daily requests from IT teams to scale warehouses, run queries and troubleshoot issues.
Andy Featherstone, Manager of Data Engineering at RDSolutions, said, “My CEO’s goal is to scrape the internet. He wants to collect as much information as possible, but our pipelines were too long and complicated in the old environment to take in more data confidently. We couldn’t say ‘yes’ to things, and leadership had to wait for us to ensure goals were attainable.”
The limitations of the legacy system became apparent when the data team attempted a new use case: a complex aggregation spanning multiple months. The job failed to complete, exposing the system's inability to handle growing data needs and signaling that a new solution was necessary. RDSolutions needed to modernize their data platform to meet leadership's ambitions for scale and deliver timely, actionable insights to internal business users and clients. The ideal solution would be open source, easy to adopt and language-agnostic, with robust data governance controls and advanced analytics tools to support both current and future needs.
Modernizing data management with Databricks
To keep expanding the amount of data they could surface, RDSolutions began evaluating data platforms that could support more use cases without scaling limits. After a quick proof of concept demonstrated the performance they needed, RDSolutions moved full steam ahead with the Databricks Data Intelligence Platform. Initially, they planned to use Databricks only for ETL pipelines but quickly realized the benefits of its advanced open source tooling and integration with Delta Lake.
The shift from proof of concept to a complete migration to Delta Lake was rapid, taking only six months. Now the backbone of RDSolutions' data strategy, Delta Lake enables seamless integration of structured and unstructured data, from grocery items to automotive parts. The platform's scalability allows RDSolutions to keep expanding their data collection efforts without concern about performance degradation, even as data volumes grow exponentially. For governance, RDSolutions uses Unity Catalog to manage access and control data sharing across multiple workspaces, so Featherstone and his team can rely on a single permissions model that simplifies access management across all of their data platforms. To promote self-service among their data analysts, they use Databricks SQL, which gives business data users an intuitive SQL editor for querying data and extracting valuable insights on their own.
Moreover, Databricks Assistant has streamlined the development process and improved efficiency for team members of every skill level. It helps automate tasks such as error troubleshooting and table documentation, allowing teams to focus on higher-value work.
Featherstone said, “One of the best things about Databricks Assistant is how it can automatically document your tables. A pop-up offers assistance with an error, and nine times out of 10, you click ‘yes,’ and the assistant makes everything perfect with the click of that button. So, that alone has made things significantly easier and more productive.” This integration of generative AI (GenAI) capabilities has empowered RDSolutions to be proactive, adapt swiftly and innovate without technical limitations.
Rapid time to insights and limitless scalability for the future
Since migrating to the Databricks Data Intelligence Platform, RDSolutions has been able to meet all data scraping goals put forth by leadership. Featherstone explained, “It’s nice to be in meetings with my CEO and other data teams where I can always answer ‘yes.’ I never have to think about it.” Thanks to Databricks’ scalable platform and the convenient centralized ingestion, storage and processing afforded by Delta Lake, RDSolutions is moving faster and satisfying more use cases than was possible in their previous environment.
A good example of how much faster RDSolutions now moves is their data validation job, which lands the data, validates it, cleans bad records and adds new ones. In the legacy data warehouse, the job took four full days. Immediately after switching to Databricks and rebuilding it as a workflow, the runtime dropped to seven hours, and after further pipeline optimization, it now runs in just three hours. Going from 96 hours to 3 hours is a runtime reduction of roughly 97%, which directly speeds up how quickly RDSolutions delivers accurate and reliable customer data.
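The validation job described above follows a common land, validate, clean and upsert pattern. The Python sketch below illustrates that pattern in memory under stated assumptions: the `PriceRecord` schema, the validation rules and the dictionary standing in for the target table are all hypothetical, and on Databricks the upsert step would more typically be a Delta Lake MERGE than a Python loop. It is an illustration of the general pattern, not RDSolutions' actual job.

```python
from dataclasses import dataclass


# Hypothetical record shape for a scraped price observation; the field names
# are illustrative assumptions, not RDSolutions' actual schema.
@dataclass(frozen=True)
class PriceRecord:
    retailer: str
    sku: str
    price: float


def is_valid(rec: PriceRecord) -> bool:
    """Assumed validation rules: non-empty keys and a strictly positive price."""
    return bool(rec.retailer) and bool(rec.sku) and rec.price > 0


def run_validation_job(landed, table):
    """Validate landed records, drop bad ones, then upsert the rest.

    `table` maps (retailer, sku) -> PriceRecord and stands in for the target
    table; the upsert mirrors the update-or-insert semantics of a MERGE.
    If a key appears twice in one batch, the later row wins.
    Returns (inserted, updated, rejected) counts.
    """
    inserted = updated = rejected = 0
    for rec in landed:
        if not is_valid(rec):
            rejected += 1  # "clean bad records": skip rows that fail validation
            continue
        key = (rec.retailer, rec.sku)
        if key in table:
            updated += 1  # existing key: overwrite with the newly landed row
        else:
            inserted += 1  # new key: "add new ones"
        table[key] = rec
    return inserted, updated, rejected


if __name__ == "__main__":
    table = {("shopA", "123"): PriceRecord("shopA", "123", 4.99)}
    landed = [
        PriceRecord("shopA", "123", 5.49),   # existing key -> update
        PriceRecord("shopB", "456", 12.00),  # new key -> insert
        PriceRecord("shopB", "", 3.00),      # rejected: empty SKU
        PriceRecord("shopC", "789", -1.0),   # rejected: negative price
    ]
    print(run_validation_job(landed, table))  # prints (1, 1, 2)
```

Keying the upsert on a business key rather than appending every landed row is what keeps reruns idempotent: loading the same batch twice updates rows in place instead of duplicating them.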
Equipped with much faster processing speeds, RDSolutions looks forward to expanding into new GenAI use cases that can help teams make smarter decisions. Featherstone is confident about the future of RDSolutions’ data analytics and AI capabilities, thanks to Databricks. “Sometimes it sounds too good to be true, but Databricks is as fast as they say it is. It’s as secure. You’ll be able to handle it just like any other database out there, but you’ll get all these benefits. We just accumulate data, and I don’t have to touch a thing in Delta Lake. It’s been a game changer and will continue to be pivotal in our growth.”