Cache Friendly Linked Data Structures: Pool Allocators and Free Lists

As Bob Dylan once said “The times they are a-changing”, and changing they are. Modern computer architectures are making heavier use of more cores and ever increasing levels of cache. The order in which instructions are executed can no longer be reasonably speculated about due to things like instruction pipelining, multiple cores, and branch prediction. Multi level caches makes reasoning on memory access patterns much more complicated. We can’t view algorithm design through the same lens of the single -core flat memory computational model that was used from the earliest days of computing right up until sometime in the 1990s, and which most of the foundational research on data structures and algorithms was based on.

In the realm of efficient data structures, pointer based data structures, such as binary search trees and linked lists have never been known to be cache friendly data structures. The very nature of dynamic memory allocation on the heap can often lead to a high degree of memory fragmentation. That doesn’t mean we can’t do anything to help them along, and there does exist a few techniques for doing just that.

In this post I’m going to cover the concept and use of memory pools and free lists for memory allocation and management, how they can be applied to building linked lists, and then show how the same idea can be used for more complex data structures such as self balancing binary search trees. The ideas shown here were inspired by the work of Alexander A. Stepanov (of STL fame) and Justin Meiners, which I have linked at the bottom.

Memory Pools and Cache Efficiency

Part of what makes traditional linked data structures slow is the way operator new, and more generally malloc(), operate. Memory is allocated as needed, from where on the heap it is available at the time it is called for. This can lead to highly fragmented memory, with nodes of your structure spread all over the heap. Needless to say, this is a nightmare for cache efficiency, with a potential cache miss every time a node is accessed.

So what can we do to improve cache efficiency? We are going to liberate our data structures from operator new by handling allocation and freeing of memory for nodes without using malloc() by implementing our own allocator.

Secondly, we are going to utilize a buffer which will keep our nodes grouped together in memory – a memory pool. This allows us to allocate new nodes as needed – but still from an area close to the others, and then returning them to a free list for future use when we are done with them. By keeping the nodes grouped together in a buffer, they should have sequential memory addresses, or at least memory addresses close enough to not require a long jump across the address space. This increases the probability that during traversals the next node to be visited is in the same cache line as the node currently being visited, greatly speeding up execution time.

And what is a free list you might be asking? a free list is exactly what the name implies: a list made up of free nodes. By recycling objects once they are created, instead of destroying them when done with them and then allocating new memory when needed again, we reuse old nodes that have already been allocated but aren’t currently being used. We can even pre initialize the free list to keep a certain amount of nodes immediately available for use.

A Pool Allocator for Linked Lists

Anyway, enough of theory, let’s get on with the implementation. The first thing we need is the definition of the node. As I said, we aren’t going to be allocating nodes on the heap with new, this means implementing the link field is going to be tricky right? Not at all. Instead of allocating nodes on the heap, we will be creating instances to be held in a vector, this means that our “pointer” is simply an index into the vector, which is serving as our node pool. A little bit of typedef helps to solidify that abstraction.

 
#include <vector>
using namespace std;

template <typename valueType, typename indexType = std::size_t>
class ListPool {
    public:
        typedef indexType nodePtr;
        typedef valueType value_type;
    private:
        struct node_t {
            value_type value;
            nodePtr next;
            bool operator==(const node_t& o) const {
                return value == o.value && next == o.next;
            }
            bool operator!=(const node_t& o) {
                return !(*this==o);
            }
        };
        std::vector<node_t> pool;
        nodePtr freeList;

With our node type defined and a storage strategy worked out, lets build up the interface for interacting with a node, once it has been created. We are going to return everything by reference, which will allow us to use our functions on either side of the assignment operator.

        node_t& node(nodePtr x) {
            return pool[x - 1];
        }
const node_t& node(nodePtr x) const {
            return pool[x - 1];
        }
value_type& value(nodePtr x) {
            return node(x).value;
        }
        nodePtr& next(nodePtr x) {
            return node(x).next;
        }

We are also going to need a substitute for nullptr, seeing as we have now abstracted pointers – at least in this sense.

       nodePtr nil() {
return nodePtr(0);
}

We are using the first entry of the pool as a sentinel node for nil, with this we can do all the same tricks with linked structures and null pointers that you would expect, like using a null pointer for an empty list:

     bool isNil(nodePtr x) {
return x == nil();
}
bool isEmpty(nodePtr x) {
return isNil(x);
}

The freelist is initialized to a null pointer in the ListPool constructor:

 
ListPool() {
            freeList = empty();
        }

And last but certainly not least, we need an alloc() and free() method for our nodes!

        nodePtr new_node() {
            pool.push_back(node_t());
            return nodePtr(pool.size());
        }
nodePtr alloc(const value_type& val, nodePtr tail) {
            nodePtr new_list = free_list;
            if (is_empty(free_list)) {
                new_list = new_node();
            } else {
                free_list = next(free_list);
            }
            value(new_list) = val;
            next(new_list) = tail;
            return new_list;
        }
nodePtr free(nodePtr x) {
            nodePtr _next = next(x);
            next(x) = free_list;
            free_list = x;
            return _next;
        }
        bool operator==(const ListPool& o) noexcept {
            return pool == o.pool;
        }
        bool operator!=(const ListPool& o) noexcept {
            return !(*this==o);
        }
};

Because we CAN use the same pool to create multiple lists from, it makes sense to keep the list clearing method outside of the class.

template <typename value_type, typename index_type>
void free_list_x(ListPool<value_type,index_type>& pool, typename ListPool<value_type,index_type>::nodePtr x) {
    while (!pool.isEmpty(x))
        x = pool.free(x);
}

Using ListPool to Implement List<T>

Now that we have our allocation system in place, we can proceed to use it for implementing any data structure the makes use of the singly linked node type we defined. I’ll be the first to admit, the syntax is bit awkward at first, but you quickly get used to it. Wrapping the calls to the ListPool in helper methods greatly reduced this awkwardness.

template <class T>
class List {
    private:
        ListPool<T, int> pool;
        typedef typename list_pool<T, int>::nodePtr nodePtr;
        typedef typename list_pool<T, int>::value_type value_type;
        nodePtr head;
        nodePtr tail;
        int n;
        nodePtr end() {
            return pool.end();
        }
        T& value(nodePtr x) {
            return pool.value(x);
        }
        nodePtr next(nodePtr x) {
            return pool.next(x);
        }
        bool isNil(nodePtr x) {
            return pool.is_end(x);
        }

With those helpers implemented, the rest is as you would expect for implementing a linked list, albeit with a prefix object method accessor pattern.

 
    public:
        List() {
            head = nil();
            n = 0;
        }
        ~List() {
            free_list_x(pool, head);
        }
        int size() const {
            return n;
        }
        bool isEmpty() {
            return isNil(head);
        }
        void push_back(value_type val) {
            nodePtr nl = pool.alloc(val, end());
            if (isEmpty()) {
                head = nl;
                tail = nl;
            } else {
                next(tail) = nl;
                tail = nl;
            }
            n++;
        }
        void push_front(value_type val) {
            head = pool.alloc(val, head);
            if (isNil(next(head)))
                tail = head;
            n++;
        }
        T pop_front() {
            T ret = value(head);
            nodePtr tmp = head;
            head = next(head);
            pool.free(tmp);
            n--;
            return ret;
        }
        T pop_back() {
            T ret = value(tail);
            nodePtr it = head;
            if (n > 1) {
                while (next(next(it)) != end()) it = next(it);
                nodePtr tmp = next(it);
                next(it) = next(next(it));
                n--;
                pool.free(tmp);
            } else {
                nodePtr tmp = head;
                head = next(head);
                n--;
                pool.free(tmp);
            }
            return ret;
        }
        void remove(T info) {
            nodePtr it = head;
            bool found = false;
            if (n > 1) {
                while (next(next(it)) != end()) {
                    if (info == value(next(it))) {
                        found = true;
                        break;
                    }
                    it = next(it);
                }
                if (found) {
                    nodePtr tmp = next(it);
                    next(it) = next(next(it));
                    n--;
                    pool.free(tmp);
                }
            } else {
                nodePtr tmp = head;
                head = next(head);
                n--;
                pool.free(tmp);
            }
        }

We can even implement the iterator pattern for our List<T>:

      class Iterator {
            private:
                nodePtr curr;
                ListPool<T, int> pool;
            public:
                Iterator(nodePtr head, ListPool<T,int>& lp) {
                    curr = head;
                    pool = lp;
                }
                T& operator*() {
                    return pool.value(curr);
                }
                Iterator operator++() {
                    curr = pool.next(curr);
                    return *this;
                }
                Iterator operator++(int) {
                    auto it = *this;
                    ++*this;
                    return it;
                }
                bool operator==(const Iterator& o) noexcept {
                    return pool == o.pool && curr == o.curr;
                }
                bool operator!=(const Iterator& o) noexcept {
                    return !(*this==o);
                }
        };
        Iterator begin() {
            return Iterator(head, pool);
        }
        Iterator end() {
            return Iterator(pool.end(), pool);
        }
};

All right let’s take the List for a test drive, make sure it’s working as we expect…

template <class T>
void print(List<T>& list) {
    for (auto it = list.begin(); it != list.end(); ++it) {
        cout<<*it;
    }
    cout<<endl;
}


int main() {
    string sed = "a list with no malloc";
    List<char> ls;
    for (char c : sed) {
        ls.push_back(c);
    }
    print(ls);
    return 0;
}

max@MaxGorenLaptop:~/DS/Algorithms$ ./listpool
a list with no malloc
max@MaxGorenLaptop:~/DS/Algorithms$

Excellent!

Other List Base Structures

We could implement other List based structures such as a Queue or Stack in a similar fashion to List, or – we can use List<T> its self to implement them directly. Option two makes our lives very easy:

 template <class T>
class Queue {
    private:
        List<T> __list;
    public:
        Queue() {

        }
        ~Queue() {

        }
        void push(T info) {
            __list.push_back(info);
        }
        T& top() {
            return (*__list.begin());
        }
        T pop() {
            return __list.pop_front();
        }
        bool isEmpty() {
            return __list.isEmpty();
        }
        int size() {
            return __list.size();
        }
};

template <class T>
class Stack {
    private:
        List<T> __list;
    public:
        Stack() { }
        ~Stack() { }
        void push(T info) {
            __list.push_front(info);
        }
        T& top() {
            return (*__list.begin());
        }
        T pop() {
            return __list.pop_front();
        }
        bool isEmpty() {
            return __list.isEmpty();
        }
        int size() {
            return __list.size();
        }
};

We’re not restricted by this technique to only list based structures but as I mentioned above, if we want a pool of a different node type, we will have to implement it’s own pool allocator.

A Pool Allocator For Binary Trees

Having warmed up with a singly linked list, It’s time to spice things up a bit. To show how other types of linked structures can be implemented with Pool allocators, we will adapt the same pattern to implement a (Left Leaning) Red Black Tree. We proceed much the same as we did previously, however where previously we only needed accessors for value and next, more complex data structures require more accessor methods.

template <typename K, typename V, typename N = std::size_t>
class TreePool {
    public:
        typedef N nodePtr;
        typedef K key_type;
        typedef V value_type;
        typedef bool color_type;
    private:
        struct node_t {
            key_type key;
            value_type value;
            color_type color;
            nodePtr left;
            nodePtr right;
            bool operator==(const node_t& o) const {
                return key == o.key;
            }
            bool operator!=(const node_t& o) {
                return !(*this==o);
            }
        };
        std::vector<node_t> pool;
        nodePtr free_list;
    public:
        tree_pool() {
            free_list = empty();
        }
        ~tree_pool() {

        }

I’ve also added a method for initializing the free list with nodes. This may seem odd, but remember that nodes don’t get added to the free list until they have been freed. However, grabbing a node of the free list is much faster than instantiating a new node and adding it to the pool when needed.

        void initfreelist(int initsize) {
            for (int i = 0; i < initsize; i++) {
                nodePtr t = new_node();
                free(t);
            }
        }

The first adaptation we are going to make is with regards to maintaining the free list. because our node structure does not have a ‘next’ field, instead of adding an additional field, we will re-use the ‘right’ pointer for that purpose when hooking a node into the free list. This way we can leave the rest of the free list code unchanged.

        nodePtr& next(nodePtr x) {
            return node(x).right;
        }
void free(nodePtr x) {
            next(x) = free_list;
            free_list = x;
        }

As mentioned, more node fields requires more accessors, so in addition to value, we add accessors for key, color, and left and right children:

        value_type& value(nodePtr x) {
            return node(x).value;
        }
        key_type& key(nodePtr x) {
            return node(x).key;
        }
        color_type& color(nodePtr x) {
            return node(x).color;
        }
        nodePtr& left(nodePtr x) {
            return node(x).left;
        }
        nodePtr& right(nodePtr x) {
            return node(x).right;
        }

The additional fields also requires changes to alloc() in order to populate them on creation:

        nodePtr alloc(K& k, V& val, nodePtr nil_node) {
            nodePtr new_tree = free_list;
            if (is_empty(free_list)) {
                new_tree = new_node();
            } else {
                free_list = _next(free_list);
            }
            key(new_tree) = k;
            value(new_tree) = val;
            color(new_tree) = true;
            right(new_tree) = nil_node;
            left(new_tree) = nil_node;
            return new_tree;
        }

The rest of the code remains unchanged, with the exception of the clean up code for destructing the tree:

        bool is_empty(nodePtr x) {
            return x == empty();
        }
        nodePtr empty() {
            return nodePtr(0);
        }
        nodePtr nil() {
            return nodePtr(0);
        }
        bool is_nil(nodePtr x) {
            return x == nil();
        }
        node_t& node(nodePtr x) {
            return pool[x - 1];
        }
        const node_t& node(nodePtr x) const {
            return pool[x-1];
        }
        nodePtr new_node() {
            pool.push_back(node_t());
            return nodePtr(pool.size());
        }
        bool operator==(const tree_pool& o) noexcept {
            return pool == o.pool;
        }
        bool operator!=(const tree_pool& o) noexcept {
            return !(*this==o);
        }
};

template <typename K, typename V, typename N>
void free_tree(tree_pool<K,V,N>& pool, typename tree_pool<K,V,N>::nodePtr x) {
    if (!pool.is_empty(x)) {
        free_tree(pool, pool.left(x));
        free_tree(pool, pool.right(x));
        pool.free(x);
    }
}

Implementing Red Black Tree’s With TreePool

I’m not going to go in depth about the inner workings of a Red Black tree – I have several other posts which already cover different implementations. Instead I want to focus on how the tree is constructed using Pool allocators.

Like we did for Linked Lists, we will once again make use of “proxy” helper methods for interacting with nodes in the node pool. This helps tremendously with keeping the code uncluttered and easy to follow.

template <typename K, typename V>
class BST {
    private:
        typedef tree_pool<K,V,int> treepool;
        typedef typename tree_pool<K,V,int>::key_type key_type;
        typedef typename tree_pool<K,V,int>::value_type value_type;
        typedef typename tree_pool<K,V,int>::nodePtr nodePtr;
        typedef typename tree_pool<K,V,int>::color_type color_type;
        treepool pool;
        nodePtr root;
        nodePtr& Left(nodePtr x) {
            return pool.left(x);
        }
        nodePtr& Right(nodePtr x) {
            return pool.right(x);
        }
        key_type& Key(nodePtr x) {
            return pool.key(x);
        }
        value_type& Value(nodePtr x) {
            return pool.value(x);
        }
        color_type& Color(nodePtr x) {
            return pool.color(x);
        }
        bool isNil(nodePtr x) {
            return pool.is_nil(x);
        }
        bool isRed(nodePtr x) {
            return isNil(x) ? false:Color(x) == true;
        }
        V nillValue;
        int n;

Once could make the case, the using these “prefix” accessors makes the code easier to read then using normal pointer notation, take for example the code for balancing the tree with rotations:

        nodePtr rotL(nodePtr h) {
            nodePtr x = Right(h);
            Right(h) = Left(x);
            Left(x) = h;
            Color(x) = Color(h);
            Color(h) = true;
            return x;
        }
        nodePtr rotR(nodePtr h) {
            nodePtr x = Left(h);
            Left(h) = Right(x);
            Right(x) = h;
            Color(x) = Color(h);
            Color(h) = true;
            return x;
        }
        nodePtr flipColor(nodePtr x) {
            Color(x) = true;
            Color(Left(x)) = false;
            Color(Right(x)) = false;
            return x;
        }
        nodePtr bal23(nodePtr x) {
            if (isRed(Right(x)) && !isRed(Left(x))) x = rotL(x);
            if (isRed(Left(x)) && isRed(Left(Left(x)))) x = rotR(x);
            if (isRed(Right(x)) && isRed(Left(x))) x = flipColor(x);
            return x;
        }

It’s easy to see how this code be changed to use regular pointers and C++’s default allocator: you would just need to change what’s happening inside the proxy methods to manipulating pointer based nodes instead of pool based nodes. Aside from the proxy methods, the only place where the the change from pointer based nodes is obvious to the user is for the base case of put(), where we call the pool allocator instead of operator new.

       nodePtr put(nodePtr h, K key, V value) {
            if (isNil(h)) {
                n++;
                return pool.allocate(key, value, pool.nil());
            }
            if (key < Key(h)) Left(h) = put(Left(h), key, value);
            else Right(h) = put(Right(h), key, value);
            return bal23(h);
        }
        bool contains(nodePtr h, K key) {
            if (isNil(h))
                return false;
            if (key == Key(h))
                return true;
            if (key < Key(h)) return contains(Left(h), key);
            else return contains(Right(h), key);
        }
        V& get(nodePtr h, K key) {
            if (isNil(h))
                return nillValue;
            if (key == Key(h))
                return Value(h);
            if (key < Key(h)) return get(Left(h), key);
            else return get(Right(h), key);
        }
        void traverse(nodePtr h) {
            if (!isNil(h)) {
                cout<<Key(h)<<" ";
                traverse(Left(h));
                traverse(Right(h));
            }
        }
    public:
        BST(int initsize = 150) {
            pool.initfreelist(initsize);
            root = pool.nil();
            n = 0;
        }
        void insert(K key, V value) {
            root = put(root, key, value);
            Color(root) = false;
        }
        int size() {
            return n;
        }
        V& get(K key) {
            return get(root, key);
        }
        bool contains(K key) {
            return contains(root, key);
        }
        void show() {
            traverse(root);
            cout<<endl;
        }
        V& operator[](K key) {
            if (!contains(key))
                insert(key, 0);
            return get(key);
        }
};

And there you have it: Pool Allocators for fun and profit.

Until next time, Happy Hacking!

Further Reading

[1] https://www.jmeiners.com/efficient-programming-with-components/08_lisp.html

[2] https://en.wikipedia.org/wiki/Free_list

[3] https://github.com/maxgoren/DataStructures/tree/main/poolAllocators