
Download our guide to Communications, Media & Entertainment at Data + AI Summit to help plan your Summit experience.

 
The time for Data + AI Summit is here! Every year, data leaders, practitioners and visionaries from around the globe and across industries come together to discuss the latest trends in big data. For data teams in Communications, Media & Entertainment, we have organized a stellar lineup of sessions with industry leaders including Adobe, Acxiom, AT&T, Condé Nast, Discovery, LaLiga, WarnerMedia and many more. We are also featuring a series of interactive solution demos to help you get started innovating with AI.

Media & Entertainment Forum

There are few industries that have been disrupted more by the digital age than media & entertainment. With consumers expecting entertainment everywhere, teams are building smarter, more personalized experiences, making data and AI table stakes for success.

Join us on Wednesday, June 29 at 3:30 PM PT for our Media & Entertainment Forum, one of the most popular industry events at Data + AI Summit. During this capstone event, you'll have the opportunity to join sessions with thought leaders from some of the biggest global brands.

Featured Speakers:

  • Steve Sobel, Global Industry Leader, Media & Entertainment, Databricks
  • Duan Peng, SVP, Global Data & AI, WarnerMedia Direct-to-Consumer
  • Martin Ma, Group VP, Engineering, Discovery
  • Rafael Zambrano López, Head of Data Science, LaLiga
  • Bhavna Godhania, Senior Director, Strategic Partnerships, Acxiom
  • Michael Stuart, VP, Marketing Science, Condé Nast
  • Bin Mu, VP, Data and Analytics, Adobe

Communications, Media & Entertainment Breakout Sessions

Here's an overview of some of the most highly anticipated Communications, Media & Entertainment sessions at this year's summit:

Building and Managing a Platform for 13+ PB Delta Lake and Thousands of Users — AT&T Story
Praveen Vemulapalli, AT&T

Every CIO/CDO is going through a digital transformation journey in some shape or form in pursuit of agility, cost savings and competitive advantage. Data is factual: it can lead to a greater understanding of a business, and when translated correctly into information, it can give people and business systems the insights they need to make better decisions.

The Lakehouse paradigm helps realize these benefits through the adoption of key open source technologies such as Delta Lake, Apache Spark and MLflow, which Databricks provides with enterprise features.

In this talk, Praveen walks through AT&T's cloud journey of migrating 13+ PB of Hadoop data along with thousands of user workloads. As the owner of the platform team in AT&T's Chief Data Office, he will share some of the key challenges and architectural decisions made along the way for a successful Databricks deployment.
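
For readers new to this kind of migration, a common pattern is to land existing Parquet data in cloud storage and convert it to Delta in place rather than rewriting it. Here is a minimal PySpark sketch of that pattern; the storage path and column names are hypothetical, not AT&T's actual layout:

```python
# Minimal sketch: converting migrated Parquet data to Delta Lake in place.
# The path and column names are hypothetical, not AT&T's actual layout.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-to-delta").getOrCreate()

# Convert an existing Parquet directory to a Delta table without rewriting
# the data files. Partitioned data also needs a PARTITIONED BY clause.
spark.sql("CONVERT TO DELTA parquet.`s3://bucket/warehouse/events`")

# New workloads can then read and write the same data transactionally.
df = spark.read.format("delta").load("s3://bucket/warehouse/events")
df.groupBy("event_type").count().show()
```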


Ensuring Correct Distributed Writes to Delta Lake in Rust With Formal Verification
QP Hou, Neuralink

Safe Rust guarantees freedom from memory access bugs once a program compiles. However, one can still introduce logic bugs in the implementation.

In this talk, QP will first give a high-level overview of common formal verification methods used in distributed system design and implementation. Then, learn how the team used TLA+ and Stateright to formally model delta-rs' multi-writer S3 backend implementation. By combining Rust and formal verification, the team ended up with an efficient native Delta Lake implementation that is both memory safe and free of logic bugs!
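
To get a feel for what such a model checks, here is a toy Python sketch (ours, not the team's TLA+ or Stateright model) that exhaustively enumerates the interleavings of two writers following a naive "read latest version, then put" commit protocol. The checker finds interleavings where one writer's commit is silently overwritten, which is exactly the class of bug an atomic put-if-absent primitive or an external lock is meant to rule out:

```python
# Toy, hypothetical sketch of exhaustive interleaving checking for
# multi-writer Delta commits. Not the delta-rs model; same style of search.
from itertools import permutations

def run(schedule):
    log = {0: "init"}          # Delta log: version -> committing writer
    observed = {}              # each writer's last-read latest version
    clobbered = False
    for writer, step in schedule:
        if step == "read":
            observed[writer] = max(log)
        else:                  # plain put: NOT an atomic put-if-absent
            version = observed[writer] + 1
            if version in log:
                clobbered = True   # lost update: a commit gets overwritten
            log[version] = writer
    return clobbered

steps = [("A", "read"), ("A", "put"), ("B", "read"), ("B", "put")]
# Explore every interleaving that keeps each writer's own steps in order.
bad = [s for s in set(permutations(steps))
       if s.index(("A", "read")) < s.index(("A", "put"))
       and s.index(("B", "read")) < s.index(("B", "put"))
       and run(s)]
print(f"{len(bad)} valid interleavings lose a commit")  # nonzero: unsafe
```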


How AT&T Data Science Team Solved an Insurmountable Big Data Challenge on Databricks with Two Different Approaches using Photon and RAPIDS Accelerator for Apache Spark
Chris Vo, AT&T | Hao Zhu, NVIDIA

Data-driven personalization is a seemingly insurmountable challenge for AT&T's data science team because of the size of its datasets and the complexity of the data engineering involved. These data preparation tasks not only take hours or days to complete; some fail to complete at all, hurting productivity.

In this session, the AT&T Data Science team will talk about how the RAPIDS Accelerator for Apache Spark and the Photon runtime on Databricks can be leveraged to process these extremely large datasets, resulting in improved content recommendation, classification and more, while reducing infrastructure costs. The team will compare speedups and costs against the standard Databricks Runtime Apache Spark environment. The tested datasets range from 2 TB to 50 TB, consisting of data collected over periods of 1 to 31 days.

The talk will showcase the results from both the RAPIDS Accelerator for Apache Spark and the Databricks Photon runtime.
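
For context on what leveraging an accelerated runtime looks like in practice, here is a minimal sketch of a representative aggregation job. On Databricks, Photon is enabled by selecting a Photon runtime when creating the cluster; the RAPIDS Accelerator is enabled via Spark configuration on a GPU cluster (config keys per the RAPIDS Accelerator documentation). The paths and columns below are hypothetical:

```python
# Minimal sketch: a heavy aggregation that benefits from an accelerated runtime.
# Photon: choose a Photon-enabled Databricks runtime at cluster creation.
# RAPIDS Accelerator (GPU cluster), per its docs, via Spark config:
#   spark.plugins              com.nvidia.spark.SQLPlugin
#   spark.rapids.sql.enabled   true
# Paths and column names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("personalization-prep").getOrCreate()

views = spark.read.format("delta").load("s3://bucket/viewing-events")
features = (
    views
    .groupBy("user_id", "content_genre")
    .agg(F.count("*").alias("view_count"),
         F.avg("watch_seconds").alias("avg_watch_seconds"))
)
features.write.format("delta").mode("overwrite").save("s3://bucket/user-features")
```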


Technical and Tactical Football Analysis Through Data
Rafael Zambrano, LaLiga Tech

This session covers how LaLiga uses and combines event and tracking data to implement novel analytics and metrics, helping analysts better understand the technical and tactical aspects of their clubs. The presentation will explain how these data are processed and then used to create metrics and analytical models.
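
For a flavor of what tracking-data metrics involve, here is a minimal PySpark sketch that derives per-player distance covered and top speed from positional frames. The schema (player_id, match_id, ts as elapsed seconds, x/y in meters) is hypothetical, not LaLiga's actual data model:

```python
# Minimal sketch of a tracking-data metric: per-player distance and top speed.
# Hypothetical schema: player_id, match_id, ts (elapsed seconds, double),
# x/y positions in meters. Not LaLiga's actual data model.
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("tracking-metrics").getOrCreate()
frames = spark.read.format("delta").load("s3://bucket/tracking-frames")

w = Window.partitionBy("player_id", "match_id").orderBy("ts")
steps = (frames
    .withColumn("dx", F.col("x") - F.lag("x").over(w))
    .withColumn("dy", F.col("y") - F.lag("y").over(w))
    .withColumn("dt", F.col("ts") - F.lag("ts").over(w))
    .withColumn("dist", F.sqrt(F.pow(F.col("dx"), 2) + F.pow(F.col("dy"), 2))))

metrics = steps.groupBy("player_id", "match_id").agg(
    F.sum("dist").alias("distance_m"),                 # total distance covered
    F.max(F.col("dist") / F.col("dt")).alias("top_speed_mps"),
)
metrics.show()
```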


Beyond Daily Batch Processing: Operational Trade-Offs of Microbatch, Incremental and Real-Time Processing for Your ETLs (and Your Team's Sanity)
Valerie Burchby, Netflix

Are you considering converting some batch daily pipelines to a real-time system? Perhaps restating multiple days of batch data is becoming unscalable for your pipelines. Maybe a short SLA is music to your stakeholders' ears. If you're Flink-curious or possibly just sick of pondering your late arriving data, this discussion is for you.

On the Streaming Data Science and Engineering team at Netflix, we support business-critical daily batch, hourly batch, incremental and real-time pipelines with a rotating on-call system. In this presentation, Valerie discusses the trade-offs between these systems, with an emphasis on operational support when things go sideways. She will also share some lessons about "goodness of fit" per processing type across various workloads, with an eye toward keeping your data timely and your colleagues sane.
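
To make the trade-off concrete, here is a minimal sketch of the two flavors: a daily batch job that restates the whole day to pick up late arrivals, and an incremental Structured Streaming job whose watermark bounds how long it waits for late events. The paths and the 6-hour lateness bound are hypothetical:

```python
# Minimal sketch contrasting daily batch restatement with incremental
# streaming plus a watermark for late data. Paths and bounds are made up.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("late-data-demo").getOrCreate()

# Batch flavor: restate the whole day on every run to absorb late arrivals.
day = (spark.read.format("delta").load("s3://bucket/plays")
       .where(F.col("event_date") == "2022-06-01"))
(day.groupBy("title_id").count()
    .write.format("delta").mode("overwrite")
    .save("s3://bucket/daily-play-counts"))

# Streaming flavor: incremental hourly counts; the watermark bounds how long
# we wait for late events before a window is finalized and emitted.
plays = spark.readStream.format("delta").load("s3://bucket/plays")
hourly = (plays
    .withWatermark("event_time", "6 hours")
    .groupBy(F.window("event_time", "1 hour"), "title_id")
    .count())
query = (hourly.writeStream.format("delta")
    .option("checkpointLocation", "s3://bucket/chk/hourly-plays")
    .outputMode("append")
    .start("s3://bucket/hourly-play-counts"))
```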


Streaming Data Into Delta Lake With Rust and Kafka
Christian Williams, Scribd

The future of Scribd's data platform is trending towards real time. A notable challenge has been streaming data into Delta Lake in a fast, reliable and efficient manner. To help address this problem, the data team developed two foundational open source projects: delta-rs, to allow Rust to read/write Delta Lake tables, and kafka-delta-ingest, to quickly and cheaply ingest structured data from Kafka.

In this talk, Christian reviews the architecture of kafka-delta-ingest and how it fits into a larger real-time data ecosystem at Scribd.
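
kafka-delta-ingest itself is written in Rust, but the core loop it implements (consume, buffer, flush to Delta, commit offsets) can be sketched in Python using the confluent-kafka and deltalake packages, the latter being delta-rs' own Python bindings. The topic, brokers and table path below are made up:

```python
# Hypothetical Python sketch of the consume-buffer-flush loop that
# kafka-delta-ingest implements in Rust, via the deltalake (delta-rs)
# Python bindings. Topic, brokers and table path are made up.
import json
import pyarrow as pa
from confluent_kafka import Consumer
from deltalake import write_deltalake

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "delta-ingest",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["page-views"])

buffer = []
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    buffer.append(json.loads(msg.value()))
    if len(buffer) >= 10_000:            # flush on size; real tools also flush on time
        table = pa.Table.from_pylist(buffer)
        write_deltalake("/tmp/delta/page-views", table, mode="append")
        consumer.commit()                # commit offsets only after a durable write
        buffer.clear()
```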


Building Telecommunication Data Lakehouse for AI and BI at Scale
Mo Namazi, Vodafone

Vodafone AU aims to build best practices for machine learning on cloud platforms that adapt to many different industry needs.

This session will walk through the journey of building the Lakehouse, analytics pipelines, data products and ML systems for internal and external purposes. It will also focus on how Vodafone AU practices machine learning development and operations at scale, minimises deployment and maintenance costs, and rolls out rapid changes with adequate, secure governance. More specifically, it defines a common framework across different functional teams (such as data scientists, ML engineers and DevOps engineers) for collaborating efficiently on predictive results with managed services, reducing technical overhead within an ML system. With tools and features like Spark, MLflow and Databricks, it becomes viable to apply machine learning to use cases such as customer profiling, call centre analytics and network analytics.
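
As a small illustration of the track-and-register workflow such a framework standardizes across teams, here is a minimal MLflow sketch; the model, features and registered name are hypothetical:

```python
# Minimal MLflow sketch of the track-and-register pattern a shared ML
# framework standardizes. Model, data and names are hypothetical.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="churn-baseline"):
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", acc)
    # Registering the model gives every team one governed deployment path.
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="customer_churn")
```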


Building Recommendation Systems Using Graph Neural Networks
Swamy Sriharsha, Condé Nast

RECKON (RECommendation systems using KnOwledge Networks) is a machine learning project centered on improving entity intelligence.

RECKON uses a GNN-based encoder-decoder architecture to learn representations for important entities in the data, leveraging both their individual features and the interactions between them through repeated graph convolutions.

Personalized recommendations play an important role in improving users' experience and retaining them. Swamy will walk through some of the techniques incorporated in RECKON and the end-to-end build of this product on Databricks, along with a demo.
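
At its core, a graph convolution updates each entity's representation by mixing in its neighbors' features, layer after layer. Here is a minimal, framework-free sketch of that idea on toy data; it is not RECKON's actual architecture:

```python
# Minimal sketch of repeated graph convolutions: each layer mixes every
# node's features with its neighbors' via a normalized adjacency matrix.
# Toy random data only; this is not RECKON's actual architecture.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                              # 5 entities, 8-dim features
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                   # undirected interactions
A_hat = A + np.eye(n)                    # add self-loops
D_inv_sqrt = np.diag(1 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt # symmetric normalization

H = rng.standard_normal((n, d))          # initial entity features
for W in [rng.standard_normal((d, d)) for _ in range(2)]:
    H = np.maximum(A_norm @ H @ W, 0)    # H' = ReLU(A_norm @ H @ W)

print(H.shape)                           # one representation row per entity
```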


Tools for Assisted Spark Version Migrations, From 2.1 to 3.2+
Holden Karau, Netflix

This talk will look at the current state of tools to automate library and language upgrades in Python and Scala, and apply them to upgrading to the new version of Apache Spark. A very informal survey suggests that many users are stuck on no-longer-supported versions of Spark, so this talk will expand on the first attempt at automating upgrades (2.4 -> 3.0) to explore the problem all the way back to 2.1.
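
To illustrate the kind of rewrite such tooling automates, here is a toy Python codemod that renames two long-deprecated DataFrame methods; real migration tools apply many more rules and also cover Scala:

```python
# Toy codemod sketch: rename DataFrame methods deprecated since Spark 2.0
# (unionAll -> union, registerTempTable -> createOrReplaceTempView).
# Naive: renames any matching attribute regardless of the receiver's type;
# real tools do far more analysis and handle many more rules.
import ast

RENAMES = {"unionAll": "union", "registerTempTable": "createOrReplaceTempView"}

class SparkUpgrader(ast.NodeTransformer):
    def visit_Attribute(self, node: ast.Attribute) -> ast.Attribute:
        self.generic_visit(node)
        if node.attr in RENAMES:
            node.attr = RENAMES[node.attr]
        return node

source = "df3 = df1.unionAll(df2)\ndf3.registerTempTable('plays')\n"
tree = SparkUpgrader().visit(ast.parse(source))
print(ast.unparse(tree))  # df3 = df1.union(df2); createOrReplaceTempView(...)
```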


Real-Time Cost Reduction Monitoring and Alerting
Ofer Ohana, Huuuge Games | David Sellam, Huuuge Games

Huuuge Games is building a state-of-the-art data and AI platform that serves as a unified data hub for all company needs and for all data and AI business insights.

They built a real-time cost monitoring infrastructure that closely tracks cost boundaries across various dimensions, such as the technical area of the data system, the engineering team, the individual, the process and more. The cost monitoring infrastructure is supported by intuitive tools for defining cost monitoring criteria and real-time alerts.

In this session, Ofer and David will present several use cases in which their cost monitoring infrastructure detected problematic code, architecture and individual usage of the platform. They will also demonstrate how, thanks to this infrastructure, they've been able to save money, facilitate use of the Databricks platform, increase user satisfaction and gain comprehensive visibility into the data ecosystem.
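
As a rough sketch of the threshold-based alerting pattern described here, consider this minimal Python example; the usage schema, DBU budgets and alert hook are hypothetical:

```python
# Minimal, hypothetical sketch of threshold-based cost alerting: aggregate
# usage per team and flag teams that cross their daily budget. The schema,
# budgets and alert hook are made up for illustration.
import pandas as pd

usage = pd.DataFrame({
    "team": ["ml", "ml", "etl", "bi"],
    "dbus": [120.0, 95.5, 30.2, 310.0],
})
budgets = {"ml": 150.0, "etl": 100.0, "bi": 250.0}  # daily DBU budgets

spend = usage.groupby("team")["dbus"].sum()
for team, dbus in spend.items():
    if dbus > budgets.get(team, float("inf")):
        # In production this check runs continuously and pages a channel.
        print(f"ALERT: {team} at {dbus:.1f} DBUs, budget {budgets[team]:.1f}")
```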

