Validating Red/Black Trees

Red/Black Trees are ubiquitous in computer science, and anyone who has taken more than a cursory glance at this site will know that I have spent a fair bit of time studying red/black trees, their various properties, characteristics and different methods of implementing them. This last point also required that I have a way of quickly determining if a given algorithm produces a valid Red/Black tree.

In previous posts I have discussed various tree visualization algorithms, culminating in my TreeDraw project, which I use to generate the various binary search tree graphics in my posts. In this post I want to discuss the algorithms used to ensure a given tree satisfies all of the properties of a Red/Black or Left Leaning Red/Black tree, as well as gather performance statistics of various Trees. When combined with the TreeDraw algorithm, I am able to instantly gather information like the following when comparing two trees:

Red/Black Tree: 
Tree Size: 63
Left rotations: 16
Right rotations: 19
Color Flips: 35
Insertions requiring rotations(%): 30.1587
Average Rotations / Insertion: 1.42105 (Right: 0.789474 / Left: 0.631579)
Max Single Insertion rotations: 2
+ Tree is black balanced.
+ Tree is valid 2-3-4 tree
+ Tree is a valid BST
-------------------------
Left Leaning Red/Black Tree:
Tree Size: 63
Left rotations: 38
Right rotations: 28
Color Flips: 42
Insertions requiring rotations(%): 46.0317
Average Rotations / Insertion: 1.68966 (Right: 0.724138 / Left: 0.965517)
Max Single Insertion rotations: 5
+ Tree is black balanced.
+ Tree is valid 2-3 tree.
+ Tree is a valid BST

This is very useful information, especially when fine-tuning an algorithm, or comparing differing implementations. The statistics regarding the number of rotations and color flips are very straightforward, of real interest is the final three tests performed on both trees. It is these three tests which determine if the provided tree is a valid Red/Black Tree (or Left Leaning Red/Black). But what is a valid Red/Black Tree?

On page 574 of “Algorithms in C++”(3rd. ed) Sedgewick provides the definition of a A Red/Black tree as: “a binary search tree in which each node is marked to be either red or black, with the additional restriction that no two red nodes appear consecutively. A balanced red-black BST is a red-black BST in which all paths from root to leaf have the same number of black nodes.” In the 4th addition, the Left Leaning invariant is introduced with the additional rule that no right child may be a red node(All red nodes “lean left”).

Ok there’s a bit to unpack there, lets start with the simplest: Red/Black Tree’s are binary search tree’s. At the very least, a tree must be a valid binary search tree, or we don’t even to need to bother with any more complicated tests.

Is the tree a BST?

All binary search trees must posses the “binary search tree property”, that is, a binary tree where the value of a given nodes key is greater then the key of its left child, and less then the key of its right child. We can check the validity of a binary search tree by performing this check on each node visited during a pre-order traversal, exiting if we encounter a node that doesn’t satisfy the BST property.

bool isBST(node* x) {
      if (x == nullptr)
          return true;
    if (x->left && x->key < x->left->key)
          return false;
    if (x->right && x->right->key < x->key)
          return false;
      return isBST(x->left) && isBST(x->right);
}

This algorithm neatly exemplifies the paradigm we will be using for the other validation algorithms as well, mainly, if a provided node is null, we return true: an empty tree is always a valid tree. We then perform the test on the invariant we are validating, and then recur on the left and right branches of the current node.

Now that we’ve determined if the tree is a valid binary search tree, lets validate that it conforms to the red/black property laid out above.

Does the BST follow the color rules?

We have two color invariants that we need to validate. First we must validate that the tree maps to a 2-3-4 tree by checking that know two red nodes occur consecutively. We already know how to do this, as we simply need to perform the same checks used to determine when to color flip and/or rotate!

bool is234(node* x) {
      if (x == nullptr)
          return true;
      if (isRed(x->left) && isRed(x->left->left))
          return false;
      if (isRed(x->left) && isRed(x->left->right))
          return false;
      if (isRed(x->right) && isRed(x->right->right))
          return false;
      if (isRed(x->right) && isRed(x->right->left))
          return false;
      return is234(x->left) && is234(x->right);
}

The code for is23() which is used for validating Left Leaning Red/Black trees is exactly the same, with the addition of one more test to ensure that all red nodes lean left. The full code is available on my github and linked at the bottom.

Is the Red/Black Tree black-balanced?

This is actually the trickiest of the three cases to validate. We want to know if every path in the tree has the same number of black nodes. At first glance this is actually much easier than it appears, but it requires a few insights to arrive at the proper solution. Like all things, it helps to take a step back, and try to simplify things. Before we can tell if all paths have the same number of black nodes, we need to know how many black nodes one path has. We know we the path to the min node in a binary search tree is obtained by following the left pointer from the root to a null node, so counting the black nodes we encounter while traversing to the left most node gets us our baseline.

bool blackBalanced() {
int numBlack = 0;
node* x = root;
while (x->left) {
if (!isRed(x)) numBlack += 1;
x = x->left;
}
return blackBal(root, numBlack);
}

Now we can take this baseline and perform.. you guessed it, a traversal of the tree! As we traverse down a path, we subtract one from our black count when we encounter a black node. Any time we arrive at a leaf, numBlack should equal zerom, otherwise the two paths have differing numbers of black nodes and are thus unbalanced. As the stack unwinds and we backtrack up the path the black count is restored before continuing back down another path in the tree.

bool blackBal(node* x, int numblack) {
      if (x == nullptr)
            return numblack == 0;
     if (!isRed(x)) numblack--;
     return blackBal(x->left, numblack) && blackBal(x->right, numblack);
}

Well, Is it Valid?

In order for a tree to be considered a valid red black tree it needs for all three of the above methods to return as true, this is as simple as combining the three tests into one boolean expression:

bool isRedBlackTree() {
return isBST(root) && is234(root) && isBalanced();
}

Using these methods you can know play around with different balancing strategies, fine tune your own red/black algorithms, or just explore the effects of applying diferent rotations, and validate if the result is a red/black tree. Until next time, happy hacking!