Data + AI Summit 2022

Multimodal Deep Learning Applied to E-commerce Big Data

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Science, Machine Learning, MLOps

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 155

Duration

  • 35 min

Overview

At Mirakl, we empower marketplaces with Artificial Intelligence solutions. Catalog data is an extremely rich source of information about the products of e-commerce sellers and marketplaces, including images, descriptions, brands, prices and attributes (for example, size, gender, material or color). Such large volumes of data are well suited to training multimodal deep learning models, and they present several technical Machine Learning and MLOps challenges.
We will dive deep into two key use cases: deduplication and categorization of products. For categorization, creating high-quality multimodal embeddings plays a crucial role; we achieve this by experimenting with transfer learning techniques on state-of-the-art models. Finding very similar or almost identical products among many millions is a difficult problem, and this is where our deduplication algorithm provides a fast and computationally efficient solution.
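To make the deduplication idea concrete, here is a minimal sketch. It is not Mirakl's algorithm, which the abstract does not detail. It assumes every product already has an image embedding and a text embedding, fuses them by normalized concatenation (one simple late-fusion choice), and uses random-hyperplane locality-sensitive hashing so that only products landing in the same bucket are compared. All function names, the similarity threshold and the toy vectors are hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(image_emb, text_emb):
    """Assumed late fusion: normalize each modality, concatenate, renormalize."""
    return l2_normalize(l2_normalize(image_emb) + l2_normalize(text_emb))


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(sum(h * x for h, x in zip(plane, vec)) >= 0 for plane in hyperplanes)


def find_duplicates(embeddings, n_tables=8, n_bits=6, threshold=0.95, seed=0):
    """Return sorted (id, id) pairs whose cosine similarity is >= threshold.

    Candidate pairs come only from shared LSH buckets, so most of the
    quadratic number of comparisons is skipped. Near-duplicates that never
    share a bucket are missed; more tables reduce that risk.
    """
    if not embeddings:
        return []
    rng = random.Random(seed)
    dim = len(next(iter(embeddings.values())))
    candidates = set()
    for _ in range(n_tables):
        planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for pid, vec in embeddings.items():
            buckets[lsh_signature(vec, planes)].append(pid)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return sorted(
        pair for pair in candidates
        if cosine(embeddings[pair[0]], embeddings[pair[1]]) >= threshold
    )


# Toy catalog with made-up embeddings: (image vector, text vector) per product.
catalog = {
    "sku-1": fuse([0.90, 0.10, 0.00], [0.80, 0.20]),
    "sku-2": fuse([0.88, 0.12, 0.01], [0.79, 0.21]),
    "sku-3": fuse([0.00, 0.20, 0.90], [0.10, 0.90]),
}
print(find_duplicates(catalog))
```

The final cosine check keeps the result exact for every pair that is actually compared; the hashing step only decides which pairs get compared at all.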
Furthermore, we will show how we handle large volumes of products using robust and efficient pipelines: Spark for distributed and parallel computing, TFRecords to stream and ingest data efficiently across multiple machines while avoiding memory issues, and MLflow for tracking the experiments and metrics of our models.
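The idea of streaming sharded record files can be sketched without any framework. The example below is a stand-in, not the speakers' pipeline: it uses JSON-lines files instead of the binary TFRecord format, and every name in it (`write_shards`, `stream_shards`, the file naming scheme) is hypothetical. It shows the two properties the abstract mentions: each worker reads only its own subset of shards, and records are read lazily in batches rather than loaded into memory at once. In a TensorFlow pipeline, `tf.data.TFRecordDataset` would typically play the role of the reader.

```python
import json
import tempfile
from pathlib import Path


def write_shards(records, out_dir, records_per_shard):
    """Write an iterable of dict records into numbered JSON-lines shard files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths, handle = [], None
    for i, record in enumerate(records):
        if i % records_per_shard == 0:
            # Start a new shard once the current one is full.
            if handle:
                handle.close()
            path = out_dir / f"products-{len(paths):05d}.jsonl"
            handle = path.open("w", encoding="utf-8")
            paths.append(path)
        handle.write(json.dumps(record) + "\n")
    if handle:
        handle.close()
    return paths


def stream_shards(paths, worker_index, num_workers, batch_size):
    """Yield batches from the shards assigned to one worker.

    Shards are assigned round-robin, and files are read line by line, so
    memory use is bounded by the batch size rather than the dataset size.
    """
    batch = []
    for path in sorted(paths)[worker_index::num_workers]:
        with open(path, encoding="utf-8") as f:
            for line in f:
                batch.append(json.loads(line))
                if len(batch) == batch_size:
                    yield batch
                    batch = []
    if batch:
        yield batch  # final partial batch


# Usage with made-up product records and two simulated workers.
with tempfile.TemporaryDirectory() as tmp:
    products = ({"id": i, "title": f"product {i}"} for i in range(10))
    shards = write_shards(products, tmp, records_per_shard=4)
    for worker in range(2):
        for batch in stream_shards(shards, worker, num_workers=2, batch_size=3):
            print(worker, [record["id"] for record in batch])
```

Assigning whole shards to workers keeps coordination trivial: no two workers ever open the same file, at the cost of uneven load when shard counts do not divide evenly.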

Session Speakers

Arthur Delaitre

Data Scientist

Mirakl

Sang-hoon YOON

Data Scientist

Mirakl
