Session

Unlocking Video Data at Scale: VLM Batch Inference with Ray on Databricks

Overview

Experience	In Person
Track	Artificial Intelligence & Agents
Industry	Enterprise Technology, Communications, Media & Entertainment, Transportation
Technologies	Unity Catalog
Skill Level	Advanced

Organisations sit on vast video archives (surveillance, manufacturing, inspections), yet lack scalable methods to extract insights. This session presents a production-ready architecture for distributed video analytics using Vision Language Models (VLMs) on Databricks.

We'll walk through a three-stage accelerator: (1) video ingestion into Unity Catalog Volumes, (2) VLM registration with MLflow for reproducibility, and (3) distributed batch inference using Ray and vLLM with Qwen2.5-VL-32B. You'll see how Ray orchestrates GPU-accelerated inference across video datasets and how to extract structured entities from VLM outputs using Databricks AI Functions.

Attendees will leave with:

- A reusable pattern for multi-modal video intelligence at scale
- Working code integrating Ray, vLLM, and Unity Catalog
- Prompt engineering techniques for video inputs
- Cost and performance considerations for VLM workloads

Whether your use case is retail, manufacturing, or public safety, this pattern applies.
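
As a rough illustration of how the ingestion and batch-inference stages connect, the sketch below turns a listing of video files in a Unity Catalog Volume into fixed-size batches of inference requests. It is not the session's code: the volume path, the prompt text, and the names `InferenceRequest`, `build_requests`, and `batched` are all hypothetical, and the step that would hand each batch to Ray workers running vLLM is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterator, List

# Hypothetical location. Unity Catalog Volumes are addressed as
# /Volumes/<catalog>/<schema>/<volume>/...
VOLUME_ROOT = "/Volumes/main/video_demo/raw_videos"

# Hypothetical prompt, for illustration only.
PROMPT = "Describe the key events in this video and list any safety issues."


@dataclass(frozen=True)
class InferenceRequest:
    """One unit of work for the VLM: a video file plus its prompt."""
    video_path: str
    prompt: str


def build_requests(file_names: List[str], prompt: str = PROMPT) -> List[InferenceRequest]:
    """Keep only .mp4 files and pair each with the prompt, in sorted order."""
    return [
        InferenceRequest(f"{VOLUME_ROOT}/{name}", prompt)
        for name in sorted(file_names)
        if name.lower().endswith(".mp4")
    ]


def batched(requests: List[InferenceRequest], batch_size: int) -> Iterator[List[InferenceRequest]]:
    """Yield consecutive batches; the last batch may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(requests), batch_size):
        yield requests[start:start + batch_size]


if __name__ == "__main__":
    requests = build_requests(["b.mp4", "a.MP4", "notes.txt", "c.mp4"])
    for i, batch in enumerate(batched(requests, batch_size=2)):
        print(i, [r.video_path for r in batch])
```

Batch size is the main knob in a sketch like this: larger batches keep each GPU busier, while smaller ones limit how much work is lost if a worker fails.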

Session Speakers

Samantha Wise

Senior Specialist Solutions Engineer
Databricks