
Looking for the best Media and Entertainment (M&E) events and sessions at Data + AI Summit Europe 2020 (Nov 17-19)? Below are some highlights. You can also find all M&E-related sessions, including customer case studies and extensive how-tos, on the event homepage by selecting “Media and Entertainment” from the “Industry” dropdown menu. You can still register for this free, virtual event here.

Learn more about the Media and Entertainment talks, training and events featured at the Data + AI 2020 Europe Virtual Summit.

For Business Leaders

Winning the Battle for Consumer Attention with Data + AI
Media, broadcasting and gaming companies are in a fierce battle for audience attention for their direct-to-consumer businesses, and the advertising ecosystem is under more pressure than ever to drive performance-based outcomes. Personalizing the consumer experience is paramount to keeping audiences engaged and to driving effective ad targeting. Predictive analytics and real-time data science use cases can help media and entertainment companies increase engagement, reduce churn and maximize customer lifetime value. Join us as we discuss best practices and real-world machine learning use cases in publishing, streaming video and gaming, as industry leaders move aggressively to personalize, monetize and drive agility around the consumer and advertiser experience.


  • Steve Sobel, M&E GTM Lead, Databricks
  • Steve Layland, Director, Engineering, Tubi
  • Arthur Gola de Paula, Manager, Data Science, Wildlife Studios
  • Krish Kuruppath, SVP, Global Head of AI Platform, Publicis Media-COSMOS

(Kaizen Gaming) Personalization Journey: From single node to Cloud Streaming

In the online gaming industry we receive a vast number of transactions that need to be handled in real time. Our customers get to choose from hundreds or even thousands of options, and providing a seamless experience is crucial. Recommendation systems can be the answer in such cases, but they require handling large volumes of data and a great deal of processing power. Towards this goal, over the last two years we have gone down the road of machine learning and AI in order to transform our customers' daily experience and upgrade our internal services.

On this long journey we have used Databricks on Azure to distribute our workloads and get the flexible processing power we need, along with the stack that empowered us to move forward. With MLflow we are able to track experiments and model deployment. With Spark Streaming and Kafka we moved from batch processing to streaming. Finally, with Delta Lake we were able to bring reliability to our data lake and assure data quality. In our talk we will share our transformation steps, the significant challenges we faced and the insights gained from this process.
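To make the batch-to-streaming shift concrete, here is a minimal sketch in plain Python. It is not Kaizen Gaming's pipeline and does not use the Spark or Kafka APIs; the event shape `(user, item)` and the names `batch_top_items` and `StreamingTopItems` are assumptions made for illustration. It only contrasts recomputing an aggregate from the full history with updating it incrementally as small batches of events arrive.

```python
from collections import Counter


def batch_top_items(events, k):
    """Batch style: recompute item counts from the full event history."""
    return Counter(item for _, item in events).most_common(k)


class StreamingTopItems:
    """Streaming style: keep running counts and update them per micro-batch."""

    def __init__(self):
        self.counts = Counter()

    def process(self, micro_batch):
        # Only the newly arrived events are touched; history is not re-read.
        self.counts.update(item for _, item in micro_batch)

    def top(self, k):
        return self.counts.most_common(k)


# Hypothetical (user, item) events, for illustration only.
events = [
    ("u1", "slots"), ("u2", "poker"), ("u1", "slots"),
    ("u3", "roulette"), ("u2", "slots"), ("u3", "poker"),
]

stream = StreamingTopItems()
stream.process(events[:3])  # first micro-batch
stream.process(events[3:])  # second micro-batch
```

After both micro-batches, `stream.top(2)` matches `batch_top_items(events, 2)`, but the streaming version can answer at any point without waiting for a full recomputation.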
Click here to see all M&E-related customer stories.

(Wildlife Studios) Using Machine Learning at Scale: A Gaming Industry Experience!

Games earn more money than movies and music combined. That means a lot of data is generated as well. One of the development considerations for an ML pipeline is that it must be easy to use, maintain, and integrate. However, it doesn’t necessarily have to be developed from scratch. By using well-known libraries and frameworks, and choosing efficient tools whenever possible, we can avoid “reinventing the wheel” while keeping the pipeline flexible and extensible.

Moreover, a fully automated ML pipeline must be reproducible at any point in time for any model, which allows for faster development and easier ways to debug and test each step of the model. This session walks through how to develop a fully automated and scalable machine learning pipeline, using the example of an innovative gaming company whose games are played by millions of people every day. That means data growth on the order of terabytes, which can be used to build great products and generate insights for improving them.

Wildlife leverages data to drive the product development lifecycle and deploys data science to drive core product decisions and features, which helps the company stay ahead of the market. We will also cover one of the use cases: improving user acquisition through improved LTV models and the use of Apache Spark. Spark’s distributed computing enables data scientists to run more models in parallel and to innovate faster by onboarding more machine learning use cases. For example, using Spark allowed the company to have around 30 models for different kinds of tasks in production.

For Practitioners

Understanding advertising effectiveness with advanced sales forecasting & attribution, followed by AMA.

How do you connect your ad spend to the sales it drives? Introducing the Sales Forecasting and Advertising Attribution Solution Accelerator. Whether you’re an ad agency or an in-house marketing analytics team, this solution accelerator allows you to easily incorporate campaign data from a variety of historical and current sources -- whether streaming digital or batch TV, OOH, print, and direct mail -- to see how these drive sales at a local level and to forecast future performance. Attribution can normally be a fairly expensive process, particularly when it runs against constantly updating datasets. This session will demonstrate how Databricks facilitates the multi-stage Delta Lake transformation, machine learning, and visualization of campaign data to provide actionable insights on a daily basis.

Afterwards, M&E specialist SA Layla Yang will be available to answer questions about this solution or any other media, ad tech, or marketing analytics questions you may have.

(MIQ Digital India Pvt Ltd.) Building Identity Graph at scale for Programmatic Media Buying using Spark and Delta Lake

The proliferation of digital channels has made it essential for marketers to understand an individual across multiple touchpoints. To improve marketing effectiveness, marketers need a good sense of each consumer's identity so that they can reach that consumer on a mobile device, a desktop or a big TV screen in the living room. Examples of identity tokens include cookies, app IDs, etc. A consumer can use multiple devices at the same time, so the same consumer should not be treated as different people in the advertising space. Identity resolution addresses this, with the goal of an omnichannel view of a consumer.

Identity Spine is MIQ's proprietary identity graph. It uses identity signals across our ecosystem to create a unified source of reference that product, business analysis and solutions teams can consume for insights and activation. We have been able to build a strong data pipeline using Spark and Delta Lake, thereby strengthening our connected media product offerings for cross-channel insights and activation.

This talk highlights:

  • The journey of building a scalable data pipeline that handles 10TB+ of data daily
  • How we were able to cut our processing cost by 50%
  • Optimization strategies implemented to onboard new datasets to enrich the graph
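
The idea behind identity resolution can be sketched in a few lines of Python. This is not MIQ's Identity Spine and does not use Spark or Delta Lake. It is a small single-machine illustration, assuming that pairs of identity tokens observed together (the hypothetical `links` below) are treated as belonging to the same consumer, and that the groups are found with a union-find (disjoint-set) structure. The token names and the `resolve_identities` helper are made up for the example.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen tokens as their own singleton set.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Walk up to the root, then point every node on the path at it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def resolve_identities(links):
    """Group identity tokens that are connected through any chain of links."""
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    groups = defaultdict(set)
    for token in list(uf.parent):
        groups[uf.find(token)].add(token)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical token pairs seen together, for illustration only.
links = [
    ("cookie:abc", "app:123"),
    ("app:123", "ctv:living-room"),
    ("cookie:xyz", "app:999"),
]
consumers = resolve_identities(links)
```

Here `consumers` contains two groups: the cookie, app ID and connected TV that are linked through `app:123`, and the separate pair `cookie:xyz` / `app:999`. At the data volumes described above, the same grouping would need a distributed connected-components computation rather than an in-memory structure.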

(Roularta) Building an ML Tool to predict Article Quality Scores using Delta & MLflow

For Roularta, a news & media publishing company, it is of great importance to understand reader behavior and what content attracts, engages and converts readers. At Roularta, we have built an AI-driven article quality scoring solution using Spark for parallelized compute, Delta for efficient data lake use, BERT for NLP and MLflow for model management. The solution is an NLP-based ML model that gives every published article a calculated and forecasted quality score based on 3 dimensions (conversion, traffic and engagement).

The score helps editorial and data teams make data-driven decisions about an article, such as launching another social post, placing the article behind the paywall and/or top-listing it on the homepage.
The article quality score gives editorial a quantitative basis for writing more impactful articles and running a better news desk. In this talk, we will cover how this article quality score tool works, including:

  • The role of Delta to accelerate the data ingestion and feature engineering pipelines
  • The use of the NLP BERT language model (Dutch-based) for extracting features from the article text in a Spark environment
  • The use of MLflow for experiment tracking and model management
  • The use of MLflow to serve the model as a REST endpoint within Databricks in order to score newly published articles

Looking forward to seeing you at the Data + AI Summit 2020.