HomepageData + AI Summit 2023 Logo
  • Sessions
  • Overview
  • Why Attend
  • Training
Register Now

Large Language Models (LLM)

Register now

Deep dive into all things LLM - from in-person day long trainings, to community sessions focused on LLM tooling and Responsible use, to dedicated sessions on Databricks open-source model Dolly, LLMOps and more.


Building and Deploying Large Language Models on Databricks | Paid Training

Large Language Models like ChatGPT and Dolly have taken the world by storm. Everybody from researchers and engineers to business leaders is rushing to understand how to utilize the latest developments in large language models (LLMs). This course is designed to teach individuals how to leverage LLMs for real-world use cases, including popular topics such as Transformers, BERT, and GPT. Through interactive lectures and exercises, you will learn how to develop, implement, evaluate, and deploy LLMs on Databricks.

View this session in the catalog

Testing Generative AI Models: What You Need to Know

Generative AI shows incredible promise for enterprise applications. The explosion of generative AI can be attributed to the convergence of several factors. Most significant is that the barrier to entry has dropped for AI application developers through customizable prompts (few-shot learning), enabling laypeople to generate high-quality content. The flexibility of models like ChatGPT and DALLE-2 have sparked curiosity and creativity about new applications that they can support. The number of tools will continue to grow in a manner similar to how AWS fueled app development. But excitement must be tampered by concerns about new risks imposed to business and society. Increased capability and adoption also increase risk exposure. As organizations explore creative boundaries of generative models, measures to reduce risk must be put in place. However, the enormous size of the input space and inherent complexity make this task more challenging than traditional ML models.

In this session, we summarize the new risks introduced by the new class of generative foundation models through several examples, and compare how these risks relate to the risks of mainstream discriminative models. Steps can be taken to reduce the operational risk, bias and fairness issues, and privacy and security of systems that leverage LLM for automation. We’ll explore model hallucinations, output evaluation, output bias, prompt injection, data leakage, stochasticity, and more. We’ll discuss some of the larger issues common to LLMs and show how to test for them. A comprehensive, test-based approach to generative AI development will help instill model integrity by proactively mitigating failure and the associated business risk.

Yaron Singer, CEO & co-Founder, Robust Intelligence

View this session in the catalog

Rapidly Scaling Applied AI/ML with Foundational Models and Applying Them to Modern AI/ML Use Cases

Today many of us are familiar with foundational models such as LLM/ChatGPTl. However, there are many more enterprise foundational models that can be rapidly deployed, trained and applied to enterprise use cases. This approach dramatically increases the performance of AI/ML models in production, but also gives AI teams rapid roadmaps for efficiency and delivering value to the business. Databricks provides the ideal toolset to enable this approach. In this session, we will provide a logically overview of foundational models available today, demonstrate a real-world use case, and provide a business framework for data scientists and business leaders to collaborate to rapidly deploy these use cases.

Nick King, Founder & CEO, Data Kinetic

View this session in the catalog

How You Can Audit A Language Model

Language models like ChatGPT are incredible research breakthroughs but require auditing & risk management before productization. These systems raise concerns related to toxicity, transparency & reproducibility, intellectual property licensing & ownership, dis- & misinformation, supply chains & significant carbon footprints. How can your org. leverage these new tools without taking on undue or unknown risks?

Recent public reference work from In-Q-Tel Labs & BNH.AI details an audit of a named entity recognition (NER) application based on the pre-trained language model RoBERTa. If you have a language model use case in mind & want to understand your risks, this presentation will cover:

  • Studying past incidents using the AI Incident Database and using this information to guide debugging.
  • Finding & fixing common data quality issues.
  • Applying general public tools & benchmarks as appropriate (e.g., checklist, SuperGLUE, HELM).
  • Binarizing specific tasks & debugging them using traditional model assessment and bias testing.
  • Constructing adversarial attacks based on a model's most significant risks and analyzing the results in terms of performance, sentiment & toxicity.
  • Testing performance, sentiment & toxicity across different & less common languages.
  • Conducting random attacks: random sequences of attacks, prompts or other tests that may evoke unexpected responses.
  • Don't forget about security: auditing code for backdoors & training data for poisoning, ensuring endpoints are protected with authentication & throttling and analyzing third-party dependencies.
  • Engaging stakeholders to help find problems system designers & developers cannot see.

It's now time to figure out how to live with AI, and that means audits, risk management & regulation.

Patrick Hall, Principal Scientist, BNH.AI

View this session in the catalog

Explainable Data Drift for NLP

Detecting data drift, although far from solved for Tabular data, has become a common practice as a way to monitor ML models in production. For Natural Language Processing on the other hand the question remains mostly open. In this talk, we will present and compare two approaches. First, we will demonstrate how by extracting a wide range of explainable properties per document such as topics, language, sentiment, named entities, keywords and more we are able to explore potential sources of drift. We will show how these properties can be consistently tracked over time, how they can be used to detect meaningful Data Drift as soon as it occurs and how they can be used to explain and fix the root cause. The second approach we’ll present is to detect drift by using the embeddings of common foundation models (such as OpenAI’s GPT3 model family) and use them to identify areas in the embedding space in which significant drift has occurred. These areas in embedding space should then be characterized in a human-readable way to enable root cause analysis of the detected drift. We’ll then compare the performance and explainability of these two methods, and explore the pros and cons of using each approach.

Noam Bressler, ML Team Lead, Deepchecks

View this session in the catalog

Building AI-powered Products with Foundation Models

Foundation models make for fantastic demos, but in practice, they can be challenging to put into production. These models work well over datasets that match common training distributions (e.g. generating web text or internet images), but may fail on domain-specific tasks or long-tail edge cases—the settings that matter most to organizations building differentiated products! We propose a data-centric development approach that organizations can use to adapt foundation models to their own private/proprietary datasets. We'll describe several techniques, including supervision "warmstarts" and interactive prompting (spoiler alert: no code needed!). To make these techniques come to life, we'll walk through real case studies describing how we've seen data-centric development drive AI-powered products, from "AI assist" use cases (e.g. copywriting assistants) to "fully automated" solutions (e.g. loan processing engines).

Vincent Chen, Director of Product / Founding Engineer, Snorkel AI

View this session in the catalog

Colossal-AI: Scaling AI Models in Big Model Era

The proliferation of large models based on Transformer has outpaced advances in hardware, resulting in an urgent need for the ability to distribute enormous models across multiple GPUs. Despite this growing demand, best practices for choosing an optimal strategy are still lacking due to the breadth of knowledge required across HPC, DL, and distributed systems. These difficulties have stimulated both AI and HPC developers to explore the key questions: 

  • How can training and inference efficiency of large models be improved to reduce costs?
  • How can larger AI models be accommodated even with limited resources? 
  • What can be done to enable more community members to easily access large models and large-scale applications? 

In this presentation, we investigated efforts to solve the questions mentioned above. Firstly, diverse parallelization is an important tool to improve the efficiency of large model training and inference. Heterogeneous memory management can help enhance the model accommodation capacity of processors like GPUs. Furthermore, user-friendly DL systems for large models significantly reduce the specialized background knowledge users need, allowing more community members to get started with larger models more efficiently. We will provide participants with a system-level open-source solution, Colossal-AI. More information can be found at https://github.com/hpcaitech/ColossalAI.

James Demmel, Dr. Richard Carl Dehmel Distinguished Professor, University of California, Berkeley; Yang You, Presidential Young Professor, National University of Singapore

View this session in the catalog

Generative AI at scale using GAN and Stable Diffusion

Generative AI is under the spotlight and it has diverse applications but there are also many considerations when deploying a generative model at scale. This presentation will make a deep dive into multiple architectures and talk about optimization hacks for the sophisticated data pipelines that generative AI requires. 

The presentation will cover:

  • How to create and prepare a dataset for training at scale in single GPU and multi GPU environments
  • How to optimize your data pipeline for training and inference in production considering the complex deep learning models that need to be run.
  • Tradeoff between higher quality outputs versus training time and resources and processing times.

Paula Martinez, CEO & Cofounder, Marvik; Rodrigo Beceiro, CTO & Cofounder, Marvik

View this session in the catalog