Databricks and Summit Gold Sponsor AWS Present on a wide variety of topics at this year’s premier data and AI event.

Amazon Web Services (AWS) is sponsoring Data + AI Summit Europe 2020. Our work with AWS continues to make Databricks better integrated with other AWS services, making it easier for our customers to drive huge analytics outcomes.

As part of Data + AI Summit, we want to highlight some of the top sessions of interest for AWS customers. The sessions below are relevant to customers interested in or using Databricks on the AWS cloud platform, demonstrating key service integrations. If you have questions about your AWS platform or service integrations, visit the AWS booth at Data + AI Summit.

Building a Cloud Data Lake with Databricks and AWS

How are customers building enterprise data lakes on AWS with Databricks? Learn how
Databricks complements the AWS data lake strategy and how Databricks integrates with
numerous AWS Data Analytics services such as Amazon Athena and AWS Glue.

Moving to Databricks & Delta

Carsten Herbe's company builds analytical B2B data products that rely heavily on Spark and AWS technologies for data processing and analytics. In this session, Carsten Herbe will explain why the company moved from AWS EMR to Databricks and Delta, and share its experiences from different angles such as architecture, application logic and user experience. The session will cover how security, cluster configuration, resource consumption and workflow changed with Databricks clusters, as well as how Delta tables simplified application logic and data operations.
From a data scientist's or engineer's perspective, Carsten will show how daily analytical and development work has improved. Many of these points also apply when moving from another Spark platform, such as Hadoop, to Databricks.

Speaker: Carsten Herbe, GmbH

Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS SageMaker for Enterprise AI Scenarios

Transformer-based pre-trained language models such as BERT, XLNet, RoBERTa and ALBERT significantly advance the state of the art in NLP and open doors for solving practical business problems with high-performance transfer learning. However, operationalizing these models with production-quality continuous integration/delivery (CI/CD) end-to-end pipelines that cover the full machine learning life cycle stages of train, test, deploy and serve, while managing associated data and code repositories, is still a challenging task. In this presentation, the Outreach team will demonstrate how they use MLflow and AWS SageMaker to productionize deep transformer-based NLP models for guided sales engagement scenarios at Outreach, the leading sales engagement platform.

Outreach will share their experiences and lessons learned in the following areas:

  • A publishing/consuming framework to effectively manage and coordinate data, models and artifacts (e.g., vocabulary file) at different machine learning stages
  • A new MLflow model flavor that supports deep transformer models for logging and loading the models at different stages
  • A design pattern to decouple model logic from deployment configurations and model customizations for a production scenario using MLProject entry points: train, test, wrap, deploy.
  • A CI/CD pipeline that provides continuous integration and delivery of models into a SageMaker endpoint to serve production usage

This session will be of great interest to anyone in the broader business community who is actively working on enterprise AI scenarios and digital transformation.

Speakers: Yong Liu, and Andrew Brooks,

From Hadoop to Delta Lake and Glue for Streaming and Batch

The modern data customer wants data now. Batch workloads are not going anywhere, but at Scribd the future of our data platform requires more and more streaming data sets. As such, our new data platform built around AWS, Delta Lake, and Databricks must simultaneously support hundreds of batch workloads in addition to dozens of new data streams, stream processing, and stream/ad-hoc workloads.
In this session we will share the progress of our transition into a streaming cloud-based data platform, and how some key technology decisions like adopting Delta Lake have unlocked previously unknown capabilities our internal customers enjoy. In the process, we’ll share some of the pitfalls and caveats from what we have learned along the way, which will help your organization adopt more data streams in the future.

Speaker: R Tyler Croy, Scribd

Join Us!

We look forward to connecting with you at Data + AI Summit Europe 2020! If you have questions about Databricks running on AWS, please visit the AWS virtual booth at Data + AI Summit.

For more information about Databricks on AWS including customer case studies and integration details, go to