Applying the Iterator Pattern to In-Memory B+ Trees

Anyone who has implemented a collection has faced the dilemma of what to return to the user to indicate a search failure. Traditionally, C programmers would return either some kind of enumerated error code, though more often than not, they would return NULL. This is less than optimal, not least of all because it willingly opens the door to a whole range of null value related errors. We could return a node object, but that would break encapsulation, and comes with it’s own host of problems.

So what’s can be done? The answer – it turns out – is iterators! An iterator is an object that abstracts the concept of a “position” in a collection, and as it turns out, they offer a clean way of signifying that an item is not present in the collection: Return an iterator to a position “one past the last item in the collection.” In the C++ standard library, this is denoted by the special iterator returned by the class method end().

#include <iostream>
#include <map>
#include <vector>
using namespace std;

int main() {
    map<string, int> nameAndYear = { {"C++", 1983}, {"Java",1996}, {"Smalltalk", 1980},{"python", 1993}, {"javascript", 1996} };
    vector<string> names = {"C++", "Smalltalk", "perl", "Java", "ruby" };
    for (string lang : names)
        if (nameAndYear.find(lang) != nameAndYear.end())
             cout<<lang<<" was released in "<<nameAndYear.at(lang)<<endl;
        else cout<<"Sorry, no year known for the release of "<<lang<<endl;
    return 0;
}

Aside from providing us with a clean way to handle search misses, proper implementation of iterators also allows us to use our collection with C++’s foreach loop, as shown above looping over the names vector. With these reasons in mind, lets implement an iterator for the B+ Tree discussed in my previous post!

The Iterator Interface

If we want to be able to use our iterators with C++’s foreach loop, then our iterators must conform to the interface that C++ expects them to provide. Unlike languages such as Java, iterators are manipulated almost entirely through operator overloading. Because B+ Tree’s are ordered collections, we want our Iterator to perform an in-order traveral of the elements. In a B+ Tree’s, this means traversing the bottom level of the tree, which is conveniently arranged as a double linked list, allowing us to iterate both forwards and backwards. To take full advantage of this, our Iterator should implement the following interface:

template <class T, class T2>
class BPIterator {
private:
BPNode<T,T2>* node;
int idx;
public:
BPIterator(BPNode<T,T2> ptr, int pos);
std::pair<T,T2> operator*();
BPIterator operator++();
BPIterator operator++(int);
BPIterator operator--();
BPIterator operator--(int);
bool operator==(const BPIterator& it) const;
bool operator!=(const BPIterator& it) const;
};

In past articles I’ve covered implementing iterators for both linked and array based data structures. When I covered implementing iterators for hash tables with separate chaining, I detailed how to manage switching from an array based iterator to a linked iterator and back in order to traverse the hash table and its buckets. Similarly to that scenario, an Iterator to a B+ Tree will requires traversing a linked structure, and an array based structure as it process the leaf level of the B+ tree. While it may initially sound complicated, B+ Tree’s are actually the easiest tree to write an iterator for, looked at from a certain perspective, it is the same process as writing an iterator for an unrolled linked list.

The constructor takes a pointer to the node iteration should start on, as well as an integer for the index of the position in the nodes array to start at. These are used to initialize the private members of the iterator class.

 
class BPIterator {
private:
BPNode<T,T2>* node;
int idx;
public:
BPIterator(BPNode<T,T2> ptr, int pos) {
` node = ptr;
idx = pos;
}

We implement operator*() to return an std::pair of the key/value pair for the current position, Iterators are not kind and will not check if a pointer is null before dereferencing:

 std::pair<T,T2> operator*() {
      return std::make_pair(node->page[aptr].key, node->page[aptr].value);
}

Ok, so iterators aren’t total assholes, they’ll check for null pointers when they have to, such as when we implement pre- increment and decrement:

BPIterator operator++() {
if (node != nullptr) {
           if (aptr < node->size()-1)
              aptr++;
          else {
              node = node->next;
              aptr = 0;
          }
      }
      return *this;
}
BPIterator operator--() {
     if (node != nullptr) {
         if (aptr > 0)
            aptr--;
        else {
            node = node->prev;
            if (node)
              aptr = node->size()-1;
         }
    }
    return *this;
}

Not muich that really bears discussion. Take note of how we check if the next position is valid before take any action, as it may require either changing of a node in addition to a resetting of the array index. Post- increment and decrement operators, as well as tests for equality are all implemented in the normal way:

 
BPIterator operator++(int) {
     BPIterator it = *this;
     ++*this;
     return it;
}
BPIterator operator--(int) {
    BPIterator it = *this;
    --*this;
    return it;
}
bool operator==(const BPIterator& it) const {
     return node == it.node && aptr == it.aptr;
}
bool operator!=(const BPIterator& it) const {
     return !(*this==it);
}

And that rounds out our iterator class, nothing too fancy, certainly simpler than for a binary search tree! Now lets get it working in our B+ tree.

Adding Iterators To our Tree

It’s not just our iterators that have to conform to C++’s expectations, our Collection must posses the appropriate interface as well. In reality, all we need to do is provide an iterator to the first element in the collection, name, imaginatively, begin(), and the equally inspiringly named end(), which points to an imaginary element one past the last. We should also provide const varieties of both.

template <class T, class T2>
class BPTree {
private:
/* other tree related code */
using iterator = BPIterator<T,T2>;
using bpnode = BPNode<T,T2>;
bpnode* min(bpnode* node, int ht);
bpnode* min(bpnode* node, int ht) const;
public:
iterator begin();
iterator end();
iterator begin() const;/ /one day ill figure out
iterator end() const; //what this const correct thing is all about.
/* other tree related code */
};

In order to provide in iterator to the first element in the collection, we need to a way to get to the first element in the collection. B+ Tree’s being ordered search tree’s, means we simply follow the left most branch until we are at the leaf, and is what the min() method is for:

bpnode* min(bpnode* node, int ht) {
if (ht == 0)
return node;
return min(node->at(0).child, ht-1);
}

With that helper in place, we implement begin() to simply return our iterator object, initialized to the left most leaf node, array position zero.
Similarly, we initialize end() with a reference to nullptr, and index -1:

iterator begin() {
return iterator(min(root, height), 0);
}

iterator end() {
return iterator(nullptr, -1);
}

Implementing Search With the Iterator Pattern

We can re turn an iterator to any position in the tree, by simply instantiating an iterator object with a pointer to the node containing the element we want, as well as the index of the array position, This allows to cleanly implement search() as follows:

iterator find(K key) {
return search(root, key, height);
}

iterator search(bpnode* node, K key, int ht) {
if (ht == 0) {
int j = node->search(key);
if (j > -1) return iterator(node, j);
else return end();
}
return search(node->at(node->floor(key)).child, key, ht-1);
}

Now we can use the same familiar idiom of comparing the result of find against end() that I went over in the beginning of the post, while using our B+ tree in place of std::map. Another bonus to using iterators is the we also user opertor++ and operator– to quickly retrieve the elements near in order to the element we originally queried.

That’s All I’ve got for today, Until next time, Happy Hacking!

Further Reading

The two resources primarily referenced in the making of this post were:

[1] Gamma, Helm, Johnson, Vlissides, “Design Patterns: Elements of Reusable Object Oriented Software” – The famous “Gang Of Four” Book. Just read it.

[2] Plauger, Stepanov, “The C++ Standard Template Library” – This book is a (very) detailed look at how the standard template library was implemented, albeit early on.

[3] And for a full implementation of B+ Tree using iterators, check my github at: https://github.com/maxgoren/BPlusTree/