Accurate sales forecasts enhance customer satisfaction across all industries, including the travel and retail sectors. Pilot Company operates more than 900 locations, with business operations that depend on fast and reliable predictions to serve our guests. Legacy bottlenecks — such as underutilized compute and lengthy model cycles — could lead to outdated forecasts, which may negatively impact our guest experience.
Pilot Company needed an AI pipeline that empowered frequent, granular forecasting — without escalating infrastructure spend. Traditional batch or manual retraining methods could not keep up.
To address these challenges, Pilot Company turned to Ray for task parallelism, running alongside Spark's data parallelism on the Databricks platform.
Layering in Ray for task parallelism on top of Spark’s data parallelism enables Pilot Company to streamline end-to-end model building. Models now retrain as soon as new data becomes available, translating infrastructure efficiency into tangible business benefits.
Building a high-accuracy forecasting model for retail sales involves several steps:
In Spark, data is divided into partitions, and each executor processes its partitions in parallel using higher-order and lambda functions (e.g., map, filter) across RDDs or DataFrames. This allows seamless scaling for ETL, batch processing and feature engineering, slashing time for data prep and transformation.
Ray enables thousands of model training or hyperparameter tuning jobs to run simultaneously, assigning optimal CPU or GPU fractions per task — maximizing cluster utilization. For example, Ray can fine-tune more than 1,400 time series models at once across a 52-core cluster, matching compute supply to job demand.
Integration with Databricks enables in-memory transfer via Apache Arrow, shifting from Spark’s data environment into Ray-driven ML tasks with zero I/O bottlenecks.
Task parallelism is crucial in model training and tuning, particularly when optimizing thousands of individual models — such as forecasting demand for every item at every store across an entire retail network. Ray excels at this layer, orchestrating workloads so that each model training or hyperparameter optimization task runs independently and simultaneously.
What sets Ray apart is its resource efficiency: not only can it manage thousands of concurrent tasks, but it dynamically allocates the right amount of compute resources — assigning specific numbers or even fractional shares of CPU and GPU cores to each task based on complexity. For example, Ray can assign as little as 0.25, 0.5, or 1 full CPU core (or combinations like 1 CPU and 0.5 GPU) to different jobs, maximizing overall cluster utilization.
In our benchmark, this fine-grained parallelism enabled us to train and tune more than 1,400 time series models concurrently across a 52-core cluster, resulting in a reduction of total processing time from nearly 3 hours (using only Spark) to under 30 minutes with Ray on Databricks. This not only means engineers can do more, faster, but it also ensures that every ounce of available hardware is fully leveraged for business value.
Key benefit: Ray’s integration with Databricks enables in-memory data transfer with Apache Arrow, allowing ML workloads to switch from Spark’s data prep environment directly into Ray-powered experiments — without cumbersome file I/O.
Benchmark results (≈1,400 time series models, 52-core cluster):

| Scenario | Total training + tuning time |
| --- | --- |
| Baseline: Spark Only (Cluster 1) | ~3 hours |
| Accelerated: Ray-on-Spark (Cluster 2) | <30 minutes |

Business Impact:
This parallelized approach not only accelerated forecasting for our retail operations, but it will also transform the way Pilot Company’s supply chain, merchandising and marketing teams use insights to support our purpose of showing people they matter at every turn. By reducing the time required for training and tuning forecasting models from hours to minutes, models can be retrained much more frequently. This increased frequency enables predictions to reflect real-time sales trends, seasonality and changing consumer preferences — capturing “ground reality” far better than slower, batch-based approaches.
As a result, inventory planning will be more accurate, allowing stores to stock precisely what they need. For category managers, the speed to actionable insight will enable tighter alignment between demand forecasts and procurement decisions.
Most importantly, the marketing and merchandising teams will be empowered to respond rapidly with data-driven promotional campaigns, launching offers at the most opportune moments, showing people they matter at every turn. This closed feedback loop — where models are continually improved based on the latest store-level data — positions the business to remain agile and guest-obsessed in a rapidly changing retail environment.
A simple code example:
The combination of data parallelism with Spark and task parallelism with Ray, particularly when run on a unified Databricks stack, allows AI/ML teams to break through legacy bottlenecks. Execution times plummet, compute utilization soars, and enterprise deployments become much more cost-effective.
Adding Ray to your Databricks Spark clusters can deliver a dramatic reduction in model build time for large-scale ML tasks, enabling organizations to forecast, plan and compete with new speed and accuracy.
Redefine what’s possible with the Databricks Data Intelligence Platform. Learn more today.
