Best Practices for Data Prep for GenAI Development
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Lightning Talk |
TRACK | Generative AI |
TECHNOLOGIES | Databricks Experience (DBX), Apache Spark, ETL, GenAI/LLMs |
SKILL LEVEL | Intermediate |
DURATION | 20 min |
This session covers best practices for preparing data for generative AI development. Data preparation is a critical step because the quality and relevance of the training data directly affect a model's performance and accuracy. We will discuss the roles of data quality, data diversity, and data labeling in generative AI development, and cover preprocessing techniques such as data cleaning, normalization, and transformation, including how to optimize them for generative AI models. We will also provide practical tips and guidelines for applying these practices in real-world projects. Whether you are a data scientist, machine learning engineer, or AI researcher, you will leave with practical guidance for optimizing data preparation for generative AI development.
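As a rough illustration of the cleaning, normalization, and transformation steps named in the abstract, here is a minimal sketch in plain Python. It is not taken from the session: the function names `clean_text` and `prepare_corpus`, the `min_chars` threshold, and the choice of exact-match deduplication are all assumptions made for the example, and a production pipeline on Apache Spark would look different.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Clean one raw document (hypothetical helper, not from the session)."""
    # Normalization: fold compatibility characters (ligatures, non-breaking
    # spaces, full-width forms) into their canonical equivalents.
    text = unicodedata.normalize("NFKC", raw)
    # Cleaning: replace anything that looks like an HTML/XML tag with a space.
    text = re.sub(r"<[^>]+>", " ", text)
    # Transformation: collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def prepare_corpus(docs, min_chars=20):
    """Clean documents, drop very short ones, and remove exact duplicates.

    `min_chars` is an assumed quality threshold chosen for illustration.
    """
    seen = set()
    prepared = []
    for raw in docs:
        text = clean_text(raw)
        if len(text) < min_chars:
            continue  # too short to be useful as training text
        key = text.casefold()  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        prepared.append(text)
    return prepared
```

Calling `prepare_corpus` on a list of raw strings returns the cleaned, deduplicated documents in their original order. Exact-match deduplication is the simplest option. Near-duplicate detection would need a different technique.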
SESSION SPEAKERS
Brian Kihoon Lee
Senior Software Engineer
Databricks