Skip to main content
CUSTOMER STORY

Powering amazing vacations with data and ML

60%

Reduction in processing times due to faster ETL pipelines

50%

Reduction in IT operational costs

Faster

Time-to-insight led to  a significant growth in business

getyourguide hh header image color
CLOUD: AWS

"Databricks has given us the ability to move seamlessly between experimentation and the productionization of workloads and data pipelines in a unified platform.”

— David Mariassy, Data Engineer, GetYourGuide

GetYourGuide operates a leading online platform and marketplace for people to discover and book sightseeing tours, tickets for attractions and other experiences around the world. With Databricks, they are now able to ingest massive volumes of data for downstream machine learning that powers their personalized online marketplace.

Slow data pipelines struggle with big data

GetYourGuide’s primary mission is to leverage data and AI to power a personalized shopping experience for their customers. However, their struggles with data engineering impacted their ability to efficiently ingest massive volumes of data per day for downstream machine learning.

  • Processing over 600GB of data per day and over 50,000 activities for customers to choose from.
  • Slow ETL pipelines blocked data science progress. Their primary ETL pipeline would run for more than five hours which impacted their ability to innovate the customer experience.
  • Single node processing caused scalability issues required to support an explosion of data.

Unlocking data science innovation with scalable data pipelines

Databricks provides GetYourGuide with a unified data analytics platform that has fostered a scalable and collaborative environment across data science and engineering, allowing data engineers and scientists to seamlessly combine exploratory workloads with production pipelines in the same environment without adding infrastructure complexity.

  • Fully managed platform on AWS enabled them to utilize native tools such as S3 as their file system.
  • Automated cluster management simplifies the infrastructure and operations at any scale.
  • Collaborative notebook environment with support for multiple languages (SQL, Scala, Python, R) enables a diverse team of users to work together in their preferred language.

Improved operational efficiencies boost business impact

With the move to Databricks, GetYourGuide was able to supercharge their data processing while being able to handle larger workloads. This newfound flexibility opened the doors to machine learning innovations to power their recommendation and relevance engine.

  • Faster ETL pipelines: They were able to accelerate data processing of their largest ETL pipelines — reducing processing times by 60%.
  • Improved cost efficiencies: Automated cluster management reduced their operational costs by more than 50%.
  • Better cross-team collaboration: Unified analytics environment for both data scientists and engineers, dramatically reduced the number of components needed to go into production with easy setup and management.
  • Faster time-to-insight: These operational efficiency gains and ability to deliver new features to market faster enabled them to experience a significant growth in business.