Every day, more than 20 million Reckitt products are bought globally. Reckitt seeks to always put consumers and people first, seek out new opportunities, strive for excellence in all that they do, and build shared success with all partners.
With billions of customers to serve across thousands of retailers, real-time insights are critical to accurately understanding and predicting market trends, consumer behavior and inventory needs across diverse industries.
“By leveraging data and analytics, we can address challenges related to changes happening in various markets as they take place,” said Sergiy Tkachuk, Head of Data Science at Reckitt. “It’s the only way to deliver best-in-class experiences at scale, helping our partners to retain more customers and positively impact revenue streams.”
Although Reckitt’s initial Hadoop data warehouse environment gave the company a good understanding of tabular data formats and relationships, data silos and on-demand scaling challenges limited its ability to build unified views of customers and the market. They also hampered critical data needs, such as surfacing repeat-customer information or order details to assess behaviors and trends over time.
In the legacy Hadoop environment, business analysts were limited to function-specific data systems and had to work in spreadsheets, which were labor-intensive, time-consuming and unable to scale. The data science team pulled data from multiple warehouses, using various tools to process information and produce results on their local machines. Analysts relied on different tools for querying, dragging down efficiency and introducing data locality issues.
“Things got even more complex when the data started rolling in more rapidly,” said Tkachuk. “Following the surge in e-commerce adoption, the need for more collaborative data and engineering environments emerged. It was challenging to track customer purchase behavior or loyalty patterns, because with data residing in silos, even simple descriptive analytics took a long time. Time to market for analytics products suffered from increasing complexity and more users onboarding very quickly.”
Reckitt’s business analysts and data scientists urgently needed a collaborative environment where they could work on joint projects and easily share large data sets, with greater flexibility across tools and the ability to scale as data volumes and platform usage grew.
Many considerations went into selecting the foundation for their data vision; a critical one was the ability to put analytics tools and knowledge in the hands of business users so they could build their own data-driven solutions. This shared approach to data was a core reason Reckitt turned to the Databricks Lakehouse Platform, but it certainly wasn’t the only one. Reckitt’s data scientists, analysts and engineers needed a standardized data model encompassing all cross-functional data, in a normalized form that any function could pick up and that could scale as volumes rose. Delta Lake gives Reckitt’s data team a common data layer for building scalable, reliable data pipelines for both analytics and machine learning workloads. With all their data ready for analytics, the analytics team can now power high-performing BI and analytics with faster time-to-value.
Today, the data science team has implemented several ETL pipelines for 50+ structured and unstructured data sources, and groups throughout the business — many outside the core data lake development team — use Databricks Lakehouse to derive deeper market insights, better understand customer patterns and build KPIs around consumer purchase behavior. Data scientists use the new platform for richer feature engineering, integrating additional data dimensions for model building, and operationalizing those models with MLflow.
These new capabilities powered another major initiative that Reckitt was able to undertake: building an internal data platform to serve as the home of all consumer data. With Databricks Lakehouse as the underlying technology, teams across the business are now able to unlock the value of consumer data and draw insights without an assist from data science or engineering. Even nontechnical users can activate segments and do advanced modeling within the platform.
“The goal of this initiative was to enable teams worldwide to outperform their goals, and deliver best-in-class customer experiences from the ground up,” said Tkachuk. “With the Databricks Lakehouse Platform, our stakeholders and developers can work collaboratively, delivering global recommendation systems, brand sales performance tracking, and other key enhancements for health, hygiene and nutrition businesses — in 73 countries for more than 180 million customers.”
Thanks to the streamlined processes and flexible automation capabilities made possible by Databricks, Reckitt saves on administrative efforts and costs, allowing them to spend more time on value-adding activities.
“We’re democratizing data consumption across the entire organization,” said Tkachuk. “We’re enabling not only engineers and data scientists to make better use of data, but also business analysts and stakeholders with nontechnical backgrounds — that’s approximately 7,000 users in total.”
Since implementing Databricks Lakehouse, Reckitt has optimized its global data strategy and continues to save significant computation hours, reducing total computing spend by about one-fourth.
“The biggest impact we achieved with this is that we now have a single source of truth for all data,” said Tkachuk. “Discoveries are being leveraged and used at the organizational level, giving consistent insights across teams. The unified platform has helped our stakeholders, data engineers, data scientists, business analysts and reporting experts get more actionable, accurate insights by connecting the different dots of data points, helping us make smarter business decisions.”
Reckitt continues to innovate and evolve its Lakehouse foundation, planning to adopt new Databricks features and integrations to advance its architecture and add further scalability and flexibility for users. The roadmap includes extending adoption of other powerful components of the Lakehouse Platform — Delta Sharing for an even more effortless data sharing experience across the organization, Unity Catalog for stronger data and AI governance, and integrations that leverage the power of graph-based algorithms. Finally, Reckitt plans to use Koalas (now the pandas API on Spark within PySpark) for quicker onboarding of Python users, increasing coverage and further ensuring data consistency and visibility.
“With data centralized and available on demand, we now have an unmatched ability to unlock value-adding BI and ML use cases,” said Tkachuk. “From data scientist to executive, Databricks Lakehouse has been game-changing for Reckitt and all of our partners.”
Learn more about Reckitt and its global brands.