Session

Unlocking Video Data at Scale: VLM Batch Inference with Ray on Databricks

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Enterprise Technology, Communications, Media & Entertainment, Transportation
Technologies: Unity Catalog
Skill Level: Advanced
Organisations sit on vast video archives (surveillance, manufacturing, inspections), yet lack scalable methods to extract insights. This session presents a production-ready architecture for distributed video analytics using Vision Language Models (VLMs) on Databricks.

We'll walk through a three-stage accelerator: (1) video ingestion into Unity Catalog Volumes, (2) VLM registration with MLflow for reproducibility, and (3) distributed batch inference using Ray and vLLM with Qwen2.5-VL-32B. You'll see how Ray orchestrates GPU-accelerated inference across video datasets and how to extract structured entities from VLM outputs using Databricks AI Functions.

Attendees will leave with:

- A reusable pattern for multi-modal video intelligence at scale
- Working code integrating Ray, vLLM, and Unity Catalog
- Prompt engineering techniques for video inputs
- Cost and performance considerations for VLM workloads

Whether your use case is retail, manufacturing, or public safety, this pattern applies.

Session Speakers

Samantha Wise

Senior Specialist Solutions Engineer
Databricks