An Iterable Hashtable

mgcadmin 30-08-2023

The iterator pattern as it is described in the GoF book[1] bears very little resemblance to what those of us familiar with the C++ standard library know as iterators. The example described in Design Patterns[1] being closer in spirit to how it is implemented in the JDK, the underlying concept is the same. Unfortunately, such a heady level of abstraction can be difficult for a lot of beginners may who may struggle when first being exposed to the STL way of integrating algorithms with containers via iterators. A lot of their utility and power gets lost in the initial confusion of trying to figure out what they heck they even are (Are they pointers? Well, kind of... but also no... 'handle' is more appropriate, but also equally vague.) and why they are even needed.

Unless you've implemented some container libraries of your own, you may not understand just how graceful the design employed by the STL really is, especially in the way it handles certain difficult-to-resolve-otherwise corner cases. As an example: what value do you return from a failed search in a key/value map? We don't return NULL in modern C++, and relying on user supplied sentinel values has a certain smell to it. Throw an exception? This isn't java, and we don't want to confine anyone to exception based method of error handling. It is the Iterator pattern that offer us a palatable solution in that they can be used to create a uniform sentinel iterator.

I've covered iterators for linked lists, tree's, and arrays in previous posts. Hash tables, while sharing many properties with arrays offer some interesting challenges of their own when it comes to implementing iterators for them. In this post I'll be walking through the implementation of an Iterator for an open addressing hash table as doing so for a table with separate chaining presents additional hurdles.

The Hash Table

I'm not going to go in depth on hash tables in this post, as that's outside the scope of what it is I want to cover, regardless we still need to discuss the properties of our particular implementation in order to devise the best path forward.

The table should not store null values. Instead, an array of map entry objects which support an isEmpty() operation should be available to test whether a table slot is in use or not.

template <typename K, typename V, class hashfn = std::hash<K>>
class hashtable {
    private:
        struct node {
            K key;
            V value;
            bool empty;
            node(K _key, V _value) {
                key = _key; value = _value;
                empty = false;
            }
            node() { empty = true; }
            node(const node& t) {
                key = t.key;
                value = t.value;
            }
            bool isEmpty() const { return empty; }
        };
        node *table;
        int maxn;
        int n;

For now, the only operation we are going to implement (aside from size() and empty()) is put(). We will combine find and get into one method that makes use of the Iterator class we will be putting together.

put() works by adding key value pairs to the table using a simple quadratic probing open addressing scheme. If it encounters the same key already in the table, it changes the associated value: The table holds unique key values, duplicate keys are not allowed.

 public:
        hashtable(int max = 1013) {
            maxn = max;
            n = 0;
            table = new node[maxn];
        }
        int size() const {
            return n;
        }
        bool empty() const {
           return n == 0;
        }
        void put(K key, V value) {
            int idx = hasher()(key) % maxn;
            int m = 1;
            while (!table[idx].isEmpty()) {
                if (table[idx].key == key) {
                    table[idx].value = value;
                    return;
                }
                idx = (idx + (m*m)) % maxn;
                cout<<"Idx: "<<idx<<endl;
                m++;
            }
            table[idx].key = key;
            table[idx].value = value;
            table[idx].empty = false;
            n++;
        }
};

Now that we have a hash table that we can load values into, Let's get started creating an Iterator so that we can inspect and retrieve those values once they've been inserted.

An Iterator for Hashtables

As I mentioned, there are different strategies for implementing the iterator pattern. But before we get into it, lets discuss what exactly an iterator is. Iterators are objects that are used to abstract the items in a collection, providing a way to iterate over the collection and access it's values without knowing the details of what data structure is being used to implement the collection.

Without iterators, we are dependent on the underlying data structure of the collection, meaning any list or array being used has to be traversed as a linked list or array. This means without iterators we are stuck with having to do things like the following to traverse a collection:

for (ListNode* it = LinkedList.head; it != nullptr; it = it->next) {
     //do what you want with list items.
}

Traversing a tree structure is even more involved. As you can see, this is less than ideal for a number of reasons. Not the least of which being that we break encapsulation. The code above requires that we expose not only the list head, but the entire node structure needs to be known to traverse the list. If we decided to use a different structure than a linked list, we need to rewrite the code for iterating over the collection. With iterators we can avoid such inconveniences.

Implementing an Iterator for our Hash table

Internally, the hash table iterator maintains a pointer into the table array, just as you would for a regular array. The difference presents its self in the constructor: we cant just start at the first array index: we must search for the first entry. This is due to the distributed nature of the entries in a hash table.

template <typename K, typename V, class hashfn = std::hash<K>>
class hashtable {
    private: 
    /* same as above */
        node *table;
        int maxn;
        int n; 
    public:
        class Iterator {
            private:
               node* __nodePtr;
            public:
              Iterator(node* __entry) : __nodePtr(__entry) { 
                    while (__nodePtr->isEmpty()) __nodePtr++;
              }

The same goes for operator++: we must skip over the empty slots and find the next occupied table position to set as current. Just like is done in the STL, the rest of the Iterator interface is implemented vis-a-vis operator overloading:

             pair<K,V> operator*() {
                    return make_pair(__nodePtr->key, __nodePtr->value);
             }
             Iterator operator++() {
                    do {
                        __nodePtr++;
                    } while (__nodePtr->isEmpty());
                    return *this;
             }
             Iterator operator++(int) {
                    Iterator tmp = *this;
                    ++*this;
                    return tmp;
             }
             bool operator==(const Iterator& o) const {
                    return __nodePtr == o.__nodePtr;
             }
             bool operator!=(const Iterator& o) const {
                    return !(*this==o);
             }
        };

Most of this is standard fair if you've read my previous posts on iterators, or are already familiar with C++ style iterators. For those readers who aren't, I would like to draw attention to the return type of operator*() - this is how std::map and std::unordered_map handle returning key/value pairs from containers, and is one of the subtle but graceful solutions employed by the Standard Template Library I was discussing in the introduction.

We can now add the begin() and end() methods to our parent class to enable external iteration.

template <typename K, typename V, class hashfn = std::hash<K>>
class hashtable {
    private: 
    /* same as above */
    public: 
   /* same as above */
   Iterator begin() {
       return Iterator(table);
   }
   Iterator end() {
       return Iterator(table+maxn);
   }
};

Take note of how we treat the table positions to initiate begin() and end(). They require using pointer math: you can't initialize an Iterator with table[0] for begin and table[maxn] as end, despite being syntactically the same.

 int main() {
    hashtable<char, int> ht;
    string str = "asentencewithalotofrepeatingletters";
    for (char c : str) {
        ht.put(c, 1);
    }
    for (auto entry : ht) {
        cout<<entry.first<<" ";
    }
    cout<<endl;
    return 0;
} 
max@MaxGorenLaptop:~/DS/Algorithms$ ./a.out
 a c e f g h i l n o p r s t w 
max@MaxGorenLaptop:~/DS/Algorithms$

Allright! we can now iterate over our collection using a for loop. Now let's implement find() so that we can do more than add items to and iterate over the collection.

A Cleaner Container Interface via Iterators

Now that our collection is iterable, we can start flushing out the feature set, using Iterators as return values when one is required. Any hash table must have the the ability to check for set membership, and in the case of a key/value store, return the value if the key being searched for is found.

Generic programming through templates in C++ is incredibly powerful, but like anything in software development, has its own idiosyncrasies that must be accounted for. Templated classes and methods are very particular about return type, as previously mentioned, this can complicate handling operations like find() where the C convention of returning NULL in certain scenarios cannot be used - nor should it be. Thankfully, Iterators offer a graceful solution, seeing as we have a natural sentinel in the form of end().

 Iterator find(K key) {
      int idx = hashfn<K>()(key) % maxn;
      int m = 1;
      while (!table[idx].isEmpty()) {
          if (table[idx].key == key)
                return Iterator(table+idx);
          idx = (idx + (m*m)) % maxn;
          m++;
     }
     return end();
}

As I mentioned above, by utilizing the same iterator idiom as the standard template library, we can now do operations like the following:

 int main() {
    hashtable<char, int> ht;
    string str = "asentencewithalotofrepeatingletters";
    for (char c : str) {
        if (ht.find(c) == ht.end()) {
            ht.put(c, 1);
        } else {
            ht.put(c, (*ht.find(c)).second + 1);
        }
    }
    for (auto entry : ht) {
        cout<<entry.first<<": "<<entry.second<<endl;
    }
    cout<<endl;
    return 0;
}

max@MaxGorenLaptop:~/DS/Algorithms$ ./a.out
 a: 3
 c: 1
 e: 7
 f: 1
 g: 1
 h: 1
 i: 2
 l: 2
 n: 3
 o: 2
 p: 1
 r: 2
 s: 2
 t: 6
 w: 1
 max@MaxGorenLaptop:~/Desktop/DS/Algorithms$

There you have it: an iterable hash table that handles search misses gracefully, all thanks to the iterator pattern. Until next time, Happy hacking!

MaxGCoding.com

For the love of programming

An Iterable Hashtable

The Hash Table

An Iterator for Hashtables

Implementing an Iterator for our Hash table

A Cleaner Container Interface via Iterators

Further Reading

Leave A Comment

Procedural Map Generation with Binary Space Partitioning

Exact Match String Searching: The Knuth-Morris-Pratt Algorithm

The visitor pattern: OOP takes on the Expression Problem

Improving mgcLisp's define syntax

Separating the Men from the Boys, Knuth Style

Reducing rotations during deletion from AVL trees

Parsing Lisp: From Data To Code

Swiz Heaps: A deterministic modification of Randomized Meldable Heaps

Let's talk Eval/Apply

BST Deletion: Removal By Merge