Data + AI Summit 2023
JUNE 26-29, 2023
Multimodal Deep Learning Applied to E-commerce Big Data


  • Session
  • Hybrid
  • Data Science, Machine Learning and MLOps
  • Intermediate
  • Moscone South | Upper Mezzanine | 155
  • 35 min

At Mirakl, we empower marketplaces with artificial intelligence solutions. Catalog data is an extremely rich source of information about the products of e-commerce sellers and marketplaces, including images, descriptions, brands, prices and attributes (for example, size, gender, material or color). Such large volumes of data are well suited to training multimodal deep learning models, and they present several technical machine learning and MLOps challenges.
We will dive deep into two key use cases: deduplication and categorization of products. For categorization, the creation of high-quality multimodal embeddings plays a crucial role and is achieved by experimenting with transfer learning techniques on state-of-the-art models. Finding very similar or almost identical products among millions of items is a difficult problem, and this is where our deduplication algorithm provides a fast and computationally efficient solution.
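To make the deduplication idea concrete, here is a minimal sketch of one common way to find near-duplicates without comparing every pair, assuming each product is already represented by an embedding vector. Random-hyperplane locality-sensitive hashing groups likely matches into buckets, and cosine similarity then confirms each candidate pair. This is an illustration only, not the deduplication algorithm presented in the session; the function names, the threshold and the number of bits and tables are all hypothetical choices.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def make_hyperplanes(dim, n_bits, seed):
    """Draw n_bits random hyperplanes (hypothetical helper for this sketch)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(a * b for a, b in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def find_near_duplicates(embeddings, threshold=0.95, n_bits=8, n_tables=4, seed=0):
    """Return sorted (id, id) pairs whose cosine similarity is >= threshold.

    `embeddings` maps a product id to its embedding vector. Only pairs that
    share a bucket in at least one hash table are compared, so some true
    duplicates can be missed; more tables reduce that risk.
    """
    if not embeddings:
        return []
    dim = len(next(iter(embeddings.values())))
    candidates = set()
    for table in range(n_tables):
        planes = make_hyperplanes(dim, n_bits, seed + table)
        buckets = defaultdict(list)
        for product_id, vec in embeddings.items():
            buckets[signature(vec, planes)].append(product_id)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return sorted(
        (a, b)
        for a, b in candidates
        if cosine(embeddings[a], embeddings[b]) >= threshold
    )
```

Calling `find_near_duplicates` on a dictionary of product ids and vectors returns the confirmed pairs. At real catalog scale, the bucketing step would typically be distributed or replaced by a dedicated approximate nearest neighbor index.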
Furthermore, we will show how we handle large volumes of products using robust and efficient pipelines: Spark for distributed and parallel computing, TFRecords to stream and ingest data efficiently across multiple machines while avoiding memory issues, and MLflow for tracking the experiments and metrics of our models.
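The memory benefit of a record-oriented format such as TFRecord comes from reading one record at a time instead of loading a whole dataset. The sketch below illustrates that principle with a simplified length-prefixed record stream built on the Python standard library only. It is not the TFRecord format (real TFRecord files also carry CRC checksums and are normally written with `tf.io.TFRecordWriter` and read with `tf.data.TFRecordDataset`), and it is not the pipeline described in the session; the helper names and the JSON payload are assumptions made for illustration.

```python
import io
import json
import struct


def write_records(stream, records):
    """Write each record as an 8-byte little-endian length followed by JSON bytes."""
    for record in records:
        payload = json.dumps(record).encode("utf-8")
        stream.write(struct.pack("<Q", len(payload)))
        stream.write(payload)


def read_records(stream):
    """Yield records one at a time, so only one record is held in memory."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        yield json.loads(payload.decode("utf-8"))


def batched(iterable, size):
    """Group a stream of items into lists of at most `size` items."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Usage: write five hypothetical product records, then stream them in batches of two.
buffer = io.BytesIO()
products = [{"id": i, "title": f"product {i}"} for i in range(5)]
write_records(buffer, products)
buffer.seek(0)
batches = list(batched(read_records(buffer), 2))
```

Because `read_records` is a generator, a training loop can consume batches from a file of any size, and separate files (shards) can be assigned to separate machines.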

Session Speakers

Arthur Delaitre

Data Scientist


Sang-hoon YOON

Data Scientist

