Neil Conway

Co-Founder and CTO, Determined AI

Neil Conway is co-founder and CTO of Determined AI, a startup that builds software to dramatically accelerate deep learning model development. Neil was previously a technical lead at Mesosphere and a major developer of both Apache Mesos and PostgreSQL. Neil holds a PhD in Computer Science from UC Berkeley, where he did research on large-scale data management, distributed systems, and programming languages.

Past sessions

Summit 2020: Deep Learning at Scale with Apache Spark and Determined

June 25, 2020 05:00 PM PT

Despite its enormous potential to enable new applications, deep learning remains prohibitively expensive, difficult, and time-consuming for the vast majority of companies. Training DL models at scale is particularly challenging: training a single model can take days or weeks, and DL engineers are often forced to spend much of their time doing DevOps or writing boilerplate code to handle routine tasks like data loading, distributed training, or fault tolerance.

In this talk, we introduce Determined, an open source platform that enables deep learning teams to train models more quickly, easily share GPU resources, and effectively collaborate. The talk will cover the problems that Determined aims to solve and the high-level architecture of the system, and will show how Determined and Spark can be used together effectively. We’ll also dive deep into some key technical features, such as:

  • Distributed training without changing your model code
  • Intelligent hyperparameter search
  • Flexible GPU scheduling, including automatic management of cloud GPU instances
  • Automatic fault tolerance and checkpoint management
  • Seamless integration into the Spark ecosystem, e.g., for performing ETL or model inference