How AT&T Data Science Team Solved an Insurmountable Big Data Challenge on Databricks with Two Different Approaches using Photon and RAPIDS Accelerator for Apache Spark
- Industry and Business Use Cases
- Moscone South | Upper Mezzanine | 159
- 35 min
Data-driven personalization can seem an insurmountable challenge for AT&T’s data science team because of the size of the datasets and the complexity of the data engineering involved. Data preparation tasks often take several hours or even days to complete, and some fail to complete at all, which hurts productivity. In this session, the AT&T Data Science team will talk about how the RAPIDS Accelerator for Apache Spark and the Photon runtime on Databricks can be used to process these extremely large datasets, improving content recommendation, classification, and similar workloads while reducing infrastructure costs. The team will discuss the design of experiments on different Azure Databricks runtimes, first with the RAPIDS Accelerator on NVIDIA T4 GPU instances and then with Databricks’ Photon runtime, and will compare speedups and costs against the regular Databricks runtime for Apache Spark. The tested datasets range from 2 TB to 50 TB and consist of data collected over periods of 1 to 31 days. The talk will showcase the results from both the RAPIDS Accelerator for Apache Spark and the Databricks Photon runtime.
The AT&T Data Science team is working on accelerating the following ETL use cases on the Mobility Subscriber Browsing Dataset (MSP):
Use case 1: Predicting Sport Games Viewership
Use case 2: Correlations Between TV Genres and MSP categories
Use case 3: Minors vs Adults
Use case 4: Demographic Model Enhancement using MSP Variables
Use case 5: Advertisement BRD (Audience Builder)
We will show that both the RAPIDS Accelerator for Apache Spark and Photon can speed up the whole job by at least 3.3x and cut the total cost by at least half.
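To make the relationship between the speedup and cost figures concrete, here is a minimal Python sketch. It assumes a simple cost model that is not stated in the abstract: job cost equals wall-clock hours multiplied by the cluster's hourly rate. The function names and the example rates are hypothetical and are not taken from the team's experiments.

```python
def speedup(baseline_hours: float, accelerated_hours: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_hours / accelerated_hours


def cost_ratio(baseline_hours: float, baseline_rate: float,
               accelerated_hours: float, accelerated_rate: float) -> float:
    """Return accelerated job cost divided by baseline job cost.

    Assumed cost model: cost = hours * hourly cluster rate.
    """
    return (accelerated_hours * accelerated_rate) / (baseline_hours * baseline_rate)


def max_rate_ratio(speedup_factor: float, target_cost_ratio: float) -> float:
    """Return the largest hourly-rate multiple that still meets the cost target.

    Under the assumed model, cost_ratio = rate_ratio / speedup, so
    rate_ratio = speedup * cost_ratio.
    """
    return speedup_factor * target_cost_ratio


if __name__ == "__main__":
    # Hypothetical numbers: a 3.3-hour baseline job that finishes in 1 hour.
    s = speedup(baseline_hours=3.3, accelerated_hours=1.0)
    # Hypothetical rates: the accelerated cluster costs 1.5x as much per hour.
    c = cost_ratio(3.3, 10.0, 1.0, 15.0)
    print(f"speedup: {s:.2f}x, cost ratio: {c:.2f}")
    print(f"max rate multiple for half cost: {max_rate_ratio(3.3, 0.5):.2f}")
```

Under this assumed model, a job that runs 3.3x faster costs half as much or less only if the accelerated cluster's hourly rate is no more than about 1.65 times the baseline rate. The abstract reports the speedup and cost figures as separate lower bounds, so they need not describe the same run.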