For the next couple of weeks, we’re going to write a memory allocator. I’m dumb, so I need to start from dead-simple problem constraints and work my way up to increasingly complex ones. As such, we’ll write this allocator in stages.
What is the memory allocation problem?
Every process gets memory allocated to it, but some memory demands are more dynamic than others: sizes and lifetimes that aren’t known until runtime. For those, the programmer must take responsibility for allocating and releasing memory.
The pertinent questions are:
- How do we allocate/de-allocate memory?
- What makes a good allocation algorithm? What makes a bad one?
We’ll strive to answer these questions in each iteration of our allocator.
v0: the crudest version possible
The first memory model I read about in textbooks and articles is summarized by this diagram:
+----------------------+
|         Code         |
+----------------------+
|         Data         |
+----------------------+
|         Heap         |
|                      |
+----------------------+
|  (available memory)  |
+----------------------+
|        Stack         |
+----------------------+
The code is stored at the lowest memory address, followed by initialized and uninitialized data, the heap, a chunk of available memory for use by the heap or the stack, and finally, the stack. The heap grows toward higher addresses and the stack grows toward lower ones, so they consume the free region from opposite ends. The problem, then, is to utilize the free space between the heap and the stack as effectively as possible.
Concretely, what does “effective” mean? For the scope of v0, it means minimizing the amount of unused memory, or equivalently, maximizing the ratio of used memory to total allocated memory. How we go about optimizing this metric depends on the shape of the problem the allocator will be used for.
Let’s make the assumption that we only ever work with data that is 16 bytes long. Then we can treat the heap as a collection of fixed-size elements, each holding 16 bytes of data plus one pointer of metadata to link free blocks together:
#include <cstddef>    // std::byte, offsetof
#include <stdexcept>  // std::runtime_error
#include <sys/mman.h> // mmap, munmap
#include <unistd.h>   // sysconf

struct Block {
    Block* next;        // next free block (metadata)
    std::byte data[16]; // fixed-size payload
};
struct Arena {
    Block* blocks = nullptr;   // base of the mapping, kept for debugging
    Block* freehead = nullptr; // head of the free list
    size_t count = 0;
    ~Arena() { if (count) munmap(blocks, count * sizeof(Block)); }
};
Arena make_arena(size_t n) {
    if (n == 0) throw std::runtime_error("arena must hold at least one block");
    size_t bytes = n * sizeof(Block);
    size_t page_size = sysconf(_SC_PAGESIZE);
    // Round up to a whole number of pages (page_size is a power of two).
    bytes = (bytes + page_size - 1) & ~(page_size - 1);
    // Use mmap to ask for a chunk of memory. We could've also used
    // sbrk() or brk(), which are more "pure" ways of asking for memory
    // from the free space between the stack and the heap.
    void* raw = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) throw std::runtime_error("mmap failed");
    Block* blocks = static_cast<Block*>(raw);
    // Thread every block onto the free list.
    for (size_t i = 0; i + 1 < n; ++i) {
        blocks[i].next = &blocks[i + 1];
    }
    blocks[n - 1].next = nullptr;
    return Arena{ blocks, blocks, n };
}
We maintain a linked list (called freehead) to keep track of the free blocks inside of the arena. Why a linked list over an array? No real reason. We could’ve easily represented Block and Arena like below. But the linked list version will be more adaptable to future versions of the allocator, where we’ll relax the fixed-size block assumption.
struct Block {
    bool free; // am I free?
    std::byte data[16];
};
struct Arena {
    Block* blocks;
    size_t count = 0;
    // ...
};
To allocate memory, all we need to do is to pop off the head of the linked list, and return the pointer to the data field. To deallocate memory, we just push the Block pointer onto the free list if it doesn’t already exist in the free list.
void* malloc(Arena& a) {
    if (!a.freehead) throw std::runtime_error("OOM");
    Block* b = a.freehead;
    a.freehead = b->next;
    return static_cast<void*>(b->data);
}
void free(Arena& a, void* p) {
    if (!p) return;
    // Recover the enclosing Block from the data pointer the caller holds.
    Block* b = reinterpret_cast<Block*>(
        reinterpret_cast<std::byte*>(p) - offsetof(Block, data));
    // If b is in the free list already, there's nothing to deallocate.
    for (Block* cursor = a.freehead; cursor; cursor = cursor->next) {
        if (cursor == b) return;
    }
    b->next = a.freehead;
    a.freehead = b;
}
Let’s return to the two questions to judge our v0 allocator.
- How do we allocate/de-allocate memory? By popping and pushing onto a linked list that keeps track of free blocks.
- What makes a good allocation algorithm? What makes a bad one? We defined a metric: used memory / allocated memory. Because of our simplifying assumptions, every block can be handed out, so the allocator reaches 100% utilization of the arena’s data space (setting aside the fixed per-block next-pointer overhead).
In the next v1 allocator, we’ll relax the fixed-size data assumption and explore the problems that arise, e.g. fragmentation, variable block sizes, and resizing blocks.