When training artificial intelligence (AI) and machine learning (ML) models for a specific purpose, data scientists and engineers have found it easier and less expensive to modify existing pretrained foundation large language models (LLMs) than it is to train new models from scratch. A foundation large language model is a powerful, general-purpose AI trained on vast datasets to understand and generate human-like text across a broad range of topics and tasks.
Building on what existing models have already learned reduces the compute power and curated data needed to tailor a model to a specific use case.
Fine-tuning is the process of adapting or supplementing pretrained models by training them on smaller, task-specific datasets. It has become an essential part of the LLM development cycle, allowing the raw linguistic capabilities of base foundation models to be adapted for a variety of use cases.
How fine-tuning LLMs works
Pretrained large language models are trained on enormous amounts of data to make them good at understanding natural language and generating a human-like response to the input, making them a natural place to start for a base model.
Fine-tuning these models improves their ability to perform specific tasks, such as sentiment analysis, question answering or document summarization, with higher accuracy. Third-party LLMs are available, but fine-tuning models with an organization’s own data offers domain-specific results.
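As a rough illustration of the idea, the following toy sketch treats a two-weight linear model as the "pretrained" model and continues gradient descent on a small task dataset. The starting weights, the dataset and the function names (`predict`, `mse`, `fine_tune`) are all hypothetical stand-ins chosen for this example. A real LLM has billions of parameters and a different loss, but the pattern is the same: start from existing parameters and keep training on task-specific data.

```python
# Toy illustration only: a tiny linear model stands in for a pretrained LLM.

def predict(weights, bias, features):
    return sum(w * x for w, x in zip(weights, features)) + bias

def mse(weights, bias, dataset):
    """Mean squared error of the model over (features, target) pairs."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in dataset) / len(dataset)

def fine_tune(weights, bias, dataset, lr=0.1, epochs=1000):
    """Continue training from the given (pretrained) parameters."""
    weights = list(weights)  # copy so the pretrained values stay untouched
    n = len(dataset)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in dataset:
            err = predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

# Hypothetical "pretrained" parameters (assumed, not from any real model).
pretrained_w, pretrained_b = [1.0, 1.0], 0.0

# Small task-specific dataset that follows y = 2*x1 + 0.5*x2 + 1.
task_data = [
    ([1.0, 0.0], 3.0),
    ([0.0, 1.0], 1.5),
    ([1.0, 1.0], 3.5),
    ([2.0, 1.0], 5.5),
]

loss_before = mse(pretrained_w, pretrained_b, task_data)
tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, task_data)
loss_after = mse(tuned_w, tuned_b, task_data)
print(f"task loss before: {loss_before:.4f}, after: {loss_after:.6f}")
```

The pretrained parameters are copied rather than overwritten, which mirrors the common practice of keeping the base model available so it can be fine-tuned again for other tasks.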
The importance and benefits of fine-tuning
Fine-tuning connects the intelligence in general-purpose LLMs to enterprise data, enabling organizations to adapt generative AI (GenAI) models to their unique business needs with higher degrees of specificity and relevance. Even small companies can build customized models suited to their needs and budgets.
Fine-tuning significantly reduces the need to invest in costly infrastructure for training models from scratch. By fine-tuning pretrained models, organizations can achieve faster time to market and, where a smaller fine-tuned model can stand in for a larger general-purpose one, lower inference latency.
Efficient fine-tuning techniques reduce memory usage and speed up training when adapting foundation models with specialized, domain-specific knowledge, saving labor and resources.
When you fine-tune a language model on your proprietary data on Databricks, your unique datasets are not exposed to third-party risks associated with general model training environments.
Types of fine-tuning
Fine-tuning can improve the accuracy and relevance of a model’s outputs, making the model more effective in specialized applications than a broadly trained foundation model. The model is trained on a dataset composed of text from the target domain so that it better understands and generates the language of that domain or industry and performs better on domain-specific tasks. Full fine-tuning can be very resource-intensive, but newer techniques make it much more efficient. The following are some of the ways organizations fine-tune their LLMs:
Parameter-efficient fine-tuning
Parameter-efficient fine-tuning (PEFT) is a suite of techniques designed to adapt large pretrained models to specific tasks while minimizing computational resources and storage requirements. This approach is beneficial for applications with limited resources or those requiring multiple fine-tuning tasks. PEFT methods, such as low-rank adaptation (LoRA) and adapter-based fine-tuning, work by introducing a small number of trainable parameters instead of updating the entire model. Adapter layers, a key component of PEFT, are lightweight, trainable modules inserted into the layers of a pretrained model.
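A minimal sketch of an adapter layer, assuming a residual bottleneck design: the hidden vector is projected down to a small size, passed through a nonlinearity, projected back up and added to the original hidden vector. The class name `BottleneckAdapter`, the sizes and the zero initialization of the up-projection are assumptions made for this example, not details from any particular library.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def relu(vector):
    return [max(0.0, v) for v in vector]

class BottleneckAdapter:
    """Small trainable module applied to the output of a frozen layer."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down-projection: hidden_size -> bottleneck_size (trainable).
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                     for _ in range(bottleneck_size)]
        # Up-projection starts at zero (an assumed choice), so the adapter
        # initially passes the hidden vector through unchanged.
        self.up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def __call__(self, hidden):
        update = matvec(self.up, relu(matvec(self.down, hidden)))
        return [h + u for h, u in zip(hidden, update)]  # residual connection

    def num_parameters(self):
        return sum(len(row) for row in self.down) + sum(len(row) for row in self.up)

hidden_size, bottleneck_size = 8, 2
rng = random.Random(1)

# Frozen "pretrained" layer: its weights are never updated during fine-tuning.
frozen_weight = [[rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]
                 for _ in range(hidden_size)]

def frozen_layer(x):
    return matvec(frozen_weight, x)

adapter = BottleneckAdapter(hidden_size, bottleneck_size)
x = [rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]
hidden = frozen_layer(x)
adapted = adapter(hidden)

print("frozen layer weights:", hidden_size * hidden_size)
print("adapter weights:", adapter.num_parameters())
```

Only the adapter's `down` and `up` matrices would receive gradient updates during fine-tuning. The gap between the adapter size and the frozen layer size widens quickly as the hidden size grows, because the adapter scales with hidden size times bottleneck size rather than hidden size squared.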
These adapters, which come in variants like Sequential, Residual and Parallel, adjust the model’s output while leaving the original weights untouched, allowing task-specific adjustments without degrading the pretrained model. For instance, LoRA can efficiently fine-tune large language models for tasks such as generating product descriptions. Meanwhile, quantized low-rank adaptation (QLoRA) focuses on reducing memory and computational load by using quantization. QLoRA stores the frozen base model weights in a low-precision quantized format and trains low-rank adapter matrices on top of them, which makes it highly efficient for tasks where hardware resources are limited.
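The two ideas can be sketched together in a few lines, under stated assumptions. The LoRA part adds a trainable low-rank product `B @ A` to a frozen weight matrix, scaled by `alpha / rank`. The quantization part uses a simple symmetric absmax scheme as a stand-in: the QLoRA method itself uses a 4-bit NormalFloat data type with block-wise quantization, which is not reproduced here. The class and function names are hypothetical.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def quantize_absmax(matrix, bits=4):
    """Simplified symmetric quantization: small integers plus one float scale."""
    levels = 2 ** (bits - 1) - 1  # 7 for 4-bit
    absmax = max(abs(v) for row in matrix for v in row) or 1.0
    scale = absmax / levels
    return [[round(v / scale) for v in row] for row in matrix], scale

class LoRALinear:
    """Frozen, quantized base weight plus a trainable low-rank update B @ A."""

    def __init__(self, base_weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(base_weight), len(base_weight[0])
        # Base weights are stored quantized and never updated.
        self.q_weight, self.scale = quantize_absmax(base_weight)
        # A is small random and B is zero, so the update starts at zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scaling = alpha / rank

    def base_forward(self, x):
        dequantized = [[q * self.scale for q in row] for row in self.q_weight]
        return matvec(dequantized, x)

    def forward(self, x):
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(self.base_forward(x), delta)]

    def trainable_parameters(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

rng = random.Random(42)
dim, rank = 16, 2
base_weight = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
layer = LoRALinear(base_weight, rank=rank)
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

full_parameters = dim * dim                      # weights if the whole layer were trained
lora_parameters = layer.trainable_parameters()   # weights in A and B only
print(f"full: {full_parameters}, LoRA: {lora_parameters}")
```

With a 16-by-16 layer and rank 2, the trainable matrices hold 64 values against 256 in the full layer. For the much larger matrices in real LLMs the ratio is far smaller, which is where the memory savings come from.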
Fine-tuning gives the model a more focused dataset such as industry-specific terminology or task-focused interactions. This helps the model generate more relevant responses for the use case, which could be anything from customizing to supplementing the model’s core knowledge to extending the model to entirely new tasks and domains.
LLMs can also be fine-tuned to address specific industry applications. In healthcare, fine-tuning on proprietary medical data can support more accurate diagnoses and treatment recommendations. Likewise, in finance, fine-tuned models can be taught to detect fraud by analyzing transaction data and customer behavior.
LLMs are machine learning models that perform language-related tasks such as translation, answering questions, chat, content summarization and content and code generation. LLMs distill value from huge datasets and make that “learning” accessible out of the box. In “transfer learning,” pretrained models compute features for use in other downstream models, which significantly reduces the time required to train and tune a new model. See Featurization for Transfer Learning for more information and an example.
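The featurization pattern can be sketched as follows: a frozen model turns raw text into feature vectors, and only a small downstream classifier is trained on those features. In this sketch `frozen_featurizer` is a deliberately crude, hypothetical stand-in (a word-list counter) for a real pretrained encoder, and the texts and labels are made up for illustration.

```python
import math

# Hypothetical stand-in for a frozen pretrained model. A real pipeline
# would call an actual pretrained encoder here instead of counting words.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "poor", "hate"}

def frozen_featurizer(text):
    words = text.lower().split()
    return [sum(w in POSITIVE for w in words), sum(w in NEGATIVE for w in words)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(features, labels, lr=0.5, epochs=200):
    """Train a small logistic-regression head on precomputed features."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(text, w, b):
    x = frozen_featurizer(text)
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Made-up labeled examples: 1 = positive sentiment, 0 = negative.
texts = ["great product love it", "bad quality poor fit", "good value", "hate the color"]
labels = [1, 0, 1, 0]

# Features are computed once by the frozen model; only the head is trained.
features = [frozen_featurizer(t) for t in texts]
head_w, head_b = train_head(features, labels)

print(predict("love the good design", head_w, head_b))
print(predict("poor and bad", head_w, head_b))
```

Because the featurizer is never updated, its outputs can be computed once and cached, and training touches only the head's three parameters.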
When not to fine-tune
To avoid overfitting, refrain from fine-tuning on tasks that are too similar to those the pretrained model already handles, as the model could lose its ability to generalize beyond the original datasets. Expanding the training dataset can improve the model’s accuracy.
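One common way to spot overfitting during fine-tuning, offered here as a general practice rather than something the text above prescribes, is to track loss on a held-out validation set and stop once it no longer improves. The sketch below assumes per-epoch validation losses are already available. The function name, the `patience` parameter and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) based on validation loss.

    Training stops once the validation loss has gone `patience`
    epochs without improving on its best value so far.
    """
    best_epoch, best_val = 0, float("inf")
    for epoch, val in enumerate(val_losses):
        if val < best_val:
            best_epoch, best_val = epoch, val
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Hypothetical losses: training loss keeps falling while validation loss
# turns upward after epoch 2, the usual signature of overfitting.
train_losses = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18]
val_losses = [1.10, 0.80, 0.65, 0.70, 0.78, 0.90]

best_epoch, stop_epoch = early_stopping_epoch(val_losses)
print(f"keep checkpoint from epoch {best_epoch}, stop at epoch {stop_epoch}")
```

In this made-up run the checkpoint from epoch 2 would be kept, since later epochs only improve the fit to the training data.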
Work continues to democratize generative AI by reducing the reliance on large compute resources and making it easier to reliably customize LLM deployments. Fine-tuning LLMs at scale requires more automated, intelligent tools to further reduce that reliance.
Advancements like LoRA streamline the process, paving the way for more intelligent tools that can consult external sources in real time to cross-check model output and improve their own performance.
Further integration may produce LLMs that can generate their own training datasets by creating questions and fine-tuning based on the curated answers. This makes it easier to integrate fine-tuned LLMs into an enterprise workflow and enhance business operations.
In many use cases, AI models today perform at or near human-level accuracy, but concerns continue around ethical AI and bias in the development of LLMs, meaning providers must remain dedicated to ensuring responsible and fair AI practices.
When you train LLMs for specific tasks, industries or datasets, you broaden the capabilities of these generalized models. A unified service for training, deploying, governing, querying and monitoring models lets you manage all models in one place and query them with a single API, delivering cost-effective efficiency, accuracy and sustainability.
Looking forward, advances in multimodal fine-tuning are pushing the boundaries of what AI models can do, enabling them to integrate multiple data types — such as images, text and speech — into a single, fine-tuned solution. As fine-tuned AI models become more precise, efficient and scalable, expect them to become more integral in business operations and drive further adoption across all sectors.
