Writing a memory allocator - v1

In v0, we made the simplifying assumption that all the data we work with are fixed-size blocks. Now, we’re going to relax the assumption: data can now be requested in any size.

Our simplifying assumptions meant that we didn’t have to track how much free space is available; we could just iterate over the blocks until we found one that was marked as empty.

The problem with variable sizes is that now, we need to do some bookkeeping. When we request memory, our memory allocator needs to know:

where the available blocks of memory are, and
if those available blocks have enough memory to support the user’s request.

But where do we keep the bookkeeping information? With variable sizes, the bookkeeping information will be variable also - which means that we can’t put it on the stack. We have to embed the bookkeeping information into the heap itself.

So that’s allocation. What about deallocation? It would be useful to know how much space was allocated, so that when the time for deallocation comes - we can just do deallocate the next x bytes.

Hence we arrive at the following structs:

struct FreeBlockHeader {
    size_t size; // size of free space
    FreeBlockHeader* next; // location of next free block
};

struct AllocBlockHeader {
    size_t size; // size of allocated space
};

For a free block of memory, we embed the FreeBlockHeader immediately before the actual free bytes.

+------+------+-------------------------------------------+
| size | next | free space                                |
+------+------+-------------------------------------------+

For an allocated block of memory, we follow a similar structure:

+------+-------------------------------------------+
| size | allocated space                           |
+------+-------------------------------------------+
       ^
    pointer returned from malloc()

We can then imagine the total memory area as fragmented into (free-header + free-space) and (alloc-header + alloc-space). The job of the memory allocator is then (a) to be the bookkeeper of free/allocated space, and (b) to find the most suitable free space for the requested memory.

What makes a free block “suitable”? Let’s first consider the simpler inverse: what makes a free block unsuitable? There’s a dumb answer: a free block that doesn’t have enough space for the requested number of bytes is unsuitable.

The question of suitability outright is harder to define. Is the free block that has just enough space to fit the requested number of bytes more or less suitable than the free block that has the most amount of space? To make things concrete: say you request 16 bytes, and there are two free blocks:

a block with 32 bytes of free space
a block with 4096 bytes of free space

Would you rather pick block (1) or block (2)? It very much seems like an arbitrary choice.

v1: the second-most crude version possible

For v1, I chose to go with the first-fit allocation algorithm. First-fit allocation returns the first free block that can support the requested memory amount. Notice that we need to account for the size of AllocBlockHeader because this metadata is directly embedded into the free memory space.

void* malloc(Arena& a, size_t size) {
    if (size <= 0) return nullptr;
    
    FreeBlockHeader* current = a.free_list;
    do {
	    // need to consider size of AllocBlockHeader
        if (current->size > size + sizeof(AllocBlockHeader)) {
            current->size = current->size - (size + sizeof(AllocBlockHeader));
            AllocBlockHeader* new_alloc_block_ptr = reinterpret_cast<AllocBlockHeader*>(reinterpret_cast<std::byte*>(current) + sizeof(FreeBlockHeader) + current->size);
            new_alloc_block_ptr->size = size;
            
            return reinterpret_cast<void*>(reinterpret_cast<std::byte*>(new_alloc_block_ptr) + sizeof(AllocBlockHeader));
        }
        
        current = current->next;
    } while (current != nullptr);
    
    throw std::runtime_error("out of memory"); // I don't handle resizing
}

De-allocation is just simple bookkeeping… sort of. On the surface, all we do is convert the AllocBlockHeader to FreeBlockHeader. But what is that coalesce() that we do at the end?

void free(Arena& a, void* ptr) {
    AllocBlockHeader* block_ptr = reinterpret_cast<AllocBlockHeader*>(static_cast<std::byte*>(ptr) - sizeof(AllocBlockHeader));
    size_t new_free_size = (block_ptr->size + sizeof(AllocBlockHeader)) - sizeof(FreeBlockHeader); // need to leave space for `next`
    
    FreeBlockHeader* new_free_ptr = reinterpret_cast<FreeBlockHeader*>(block_ptr);
    new_free_ptr->size = new_free_size;
    new_free_ptr->next = a.free_list;
    
    a.free_list = new_free_ptr;
    coalesce(a); // what is this? 
}

As the programmer allocates and de-allocates memory, the total memory block gets increasingly “fragmented” without additional cleanup work. Consider the following sequence:

allocate 1 byte at position 0.
allocate 1 byte at position 1.
allocate 1 byte at position 2.
allocate 1 byte at position 3.
de-allocate the 1 byte at position 1.
de-allocate the 1 byte at position 3.
de-allocate the 1 byte at position 2.

If we free without coalesce, then we will view blocks 1, 2, and 3 as three separate free blocks (each of size one) - when in reality, there’s just one free block of size three. Hence we have to do some cleanup work to coalesce and merge back together the seemingly fragmented free blocks that neighbor each other:

void coalesce(Arena& a) {
    while (true) {
        FreeBlockHeader* prev = nullptr;
        FreeBlockHeader* current = a.free_list;
        
        bool dirty = false;
        while (current != nullptr) {
            std::byte* target_location = reinterpret_cast<std::byte*>(a.free_list);
            std::byte* current_location = reinterpret_cast<std::byte*>(current);
            
            if (current == reinterpret_cast<FreeBlockHeader*>(target_location + sizeof(FreeBlockHeader) + a.free_list->size)) {
                a.free_list->size += sizeof(FreeBlockHeader) + current->size;
                prev->next = current->next;
                
                dirty = true;
                break;
            } else if (a.free_list == reinterpret_cast<FreeBlockHeader*>(current_location + sizeof(FreeBlockHeader) + current->size)) {
                current->size += sizeof(FreeBlockHeader) + a.free_list->size;
                a.free_list = a.free_list->next;
                
                dirty = true;
                break;
            }
            
            prev = current;
            current = current->next;
        }
        
        if (!dirty) {
            return;
        }
    }
}

Let’s return to the two questions to judge our v1 allocator.

How do we allocate/de-allocate memory? By popping and pushing onto a linked list that keeps track of free blocks. Conceptually similar to the v0 allocator, but very different in implementation.
What makes a good allocation algorithm? What makes a bad one? Using our used memory / allocated memory metric , our v1 allocator can’t be expected to be perform at 100% utilization because we don’t have the fixed-size guarantees from the v0 allocator problem.

Next week, we’ll make improvements to our v1 allocator (in particular, exploring how to make freeing more optimized), and study how glibc’s allocator works.

v1: the second-most crude version possible#

v1: the second-most crude version possible