top of page
Writer's pictureMichele Iarossi

Binary search trees and AVL trees

Updated: Oct 2, 2022

Recently, I dealt with the implementation of a C++ class for binary search trees and I came across the more fascinating and challenging topic of implementing AVL trees, a self-balancing binary search tree I will describe below.


Binary search tree basics


In the computer sciences, a binary search tree is a well known data structure that is designed for fast lookups, insertions and deletions of nodes. Each node stores some information in one of its data fields, usually denoted as the node key, in addition to data pointers to its left and right subtree (or child). Keys are inserted into the tree in an ordered fashion, so that the following property is always true:


Given a node x, all the keys stored in the left subtree of x are less than the key stored in x, and all the keys stored in the right subtree of x are greater than the key stored in x.


As an example, let's build a binary search tree for storing the following unsigned integers: 7, 4, 8, 3, 6, 2, 0. By applying the property above when inserting each number, your result should look the same as shown in this animation:



In addition to the key, I'm storing in each node also a pair of numbers ( h,bf ) that represent the node height and balance factor.


I define the height of a node x as follows:


It is the maximum number of nodes that needs to be traversed in order to reach the furthest leaf including the node itself as the starting point. If x has no children, then it has height 1.


Referring to the example above, the nodes 8, 6, and 0, which have no children, have heights 1. On the contrary, the height of the root node 7 is 5, because the maximum number of nodes that need to be traversed in order to reach the node 0 (the furthest leaf) including the node 7 itself is given by the following 5 nodes: 7-> 4 -> 3 -> 2 -> 0.


The height of a binary search tree provides an indication of the runtime performance of common operations like searching, insertion or deletion. For example, it is easy to see that the runtime of a tree search is upper bounded by the tree height h itself, or in big-oh notation, the runtime of a tree search is given by O(h): in the worse case, up to h nodes need to be traversed before finding the desired node with the searched key.


I define the balance factor of a node x as follows:


It is the difference between the left and right subtree heights. If x has no children, then its balance factor is 0. The height of any missing subtree shall be considered 0.


From the definition above, if x has a left subtree but no right subtree, then its balance factor is equal to the left subtree height, and if x has a right subtree but no left subtree, then its balance factor is equal to minus the right subtree height. Again, referring to our example, the leaf nodes 8, 6, and 0 have balance factors equal to 0. On the contrary, the balance factor of the root node 7 is equal to 3, and is given by the difference between the heights of node 4 (left child) and 8 (right child), i.e. 4 - 1 = 3. In this case, we have a tree unbalanced to the left, and 3 additional nodes with respect to the right subtree need to be traversed in order to get to the furthest leaf.


Hence, the balance factor gives us an indication of whether the tree from the node perspective is balanced (bf = 0), or unbalanced to the left (bf > 0) or to the right (bf < 0).


The worst-case scenario for a binary search tree corresponds to the insertion of an ordered sequence of keys: if the keys are inserted in ascending order, then the tree will be completely unbalanced to the right; if the order is descending, then the tree will be completely unbalanced to the left.


Continuing the previous example, let's build another binary search tree by inserting the same sequence of unsigned integers as before but this time in ascending order, i.e. 0, 2, 3, 4, 6, 7, 8. Your result should look the same as shown in the following animation:



Since the sequence is ordered in ascending order, you'll keep inserting new keys always to the right, and all the nodes are aligned to the right. In such cases a binary search tree degenerates into a linked list, and the runtime performance of the common operations like searching, insertion or deletion degrades to O(n), where n is the total number of nodes.


Therefore, it is clear that the more unbalanced a binary search tree becomes, the greater the risk of degrading runtime performance. Ideally, after each insertion of a new key, the binary search tree should always be rebalanced, in order to guarantee optimal runtime performance. AVL trees do exactly this.


AVL trees


An AVL tree, named after its inventors Adelson-Velsky and Landis, is a binary search tree which rebalances itself whenever the following property is violated:


For every node x the height difference between its subtrees is either 0, +1 or -1.


When for a given node this property is violated, then the tree needs to be rebalanced either to the left or to the right. This is accomplished by means of tree rotations, which can be simple rotations towards left or right, or composite ones, either first towards right and then left, or first towards left and then right.


A left rotation applied to the current node x promotes the right child of x to the role of new parent of x (the right child moves up) whereas the current node x becomes the left child of its new parent (the current node x moves down). The previous left child (subtree) of the new parent becomes the new right child of the current node x.


This is shown by the following animation, where a left rotation is applied to node 1 when its balance factor becomes less than -1, i.e. the tree is unbalanced to the right. After the rotation, node 2 is promoted up, whereas node 1 moves down to the left, rebalancing the tree.







Similarly, a right rotation applied to the current node x promotes the left child of x to the role of new parent of x (the left child moves up) whereas the current node x becomes the right child of its new parent (the current node x moves down). The previous right child (subtree) of the new parent becomes the new left child of the current node x.


This is shown by the animation here, where a right rotation is applied to node 3 when its balance factor becomes greater than 1, i.e. the tree is unbalanced to the left. After the rotation, node 2 is promoted up, whereas node 3 moves down to the right, rebalancing the tree.







A left-right rotation applied to the current node x applies first a left rotation to the left child of x, and then a right rotation on x.


This is shown by the following animation, where a left-right rotation is applied to node 3 when its balance factor becomes greater than 1, i.e. the tree is unbalanced to the left. First, a left rotation is applied to node 1, so that all the nodes are aligned now to the left, followed by a right rotation applied to node 3. After the rotation, node 2 is promoted up, whereas node 3 moves down to the right, rebalancing the tree.



Finally, a right-left rotation applied to the current node x applies first a right rotation to the right child of x, and then a left rotation on x.


This is shown by the animation here, where a left-right rotation is applied to node 1 when its balance factor becomes less than -1, i.e. the tree is unbalanced to the right. First, a right rotation is applied to node 3, so that all the nodes are aligned now to the right, followed by a left rotation applied to node 1. After the rotation, node 2 is promoted up, whereas node 1 moves down to the left, rebalancing the tree.



Having illustrated the available rotations that we can apply to unbalanced nodes, let's build the AVL tree for storing the same unsigned integers as before in the same order: 7, 4, 8, 3, 6, 2, 0. Your result should look the same as shown here:



When node 2 is inserted, the tree becomes unbalanced to the left, and node 7 needs to be rotated to the right. When compared to the unbalanced tree at the beginning obtained for the same sequence of keys, the AVL tree is perfectly balanced and has a height of 3 instead of 5.


A C++ class implementing AVL trees


Let's move on to the implementation of AVL trees in C++, and focus in particular on the balanced insertion of keys. You can download the full class implementation from GitHub.


When I have to add a new key to the AVL tree, I first perform a normal, unbalanced insert, keeping track in a vector path of the nodes that have been traversed for adding the new key, and then I rebalance the tree by checking if any of the traversed nodes stored in path needs to be rebalanced. The important point here is that during rebalancing I don't need to check every node of the tree, but only those nodes affected by the insertion of the new key, because it is their height that has changed! The code of the templated class method insert() is shown below.


// insert a new key into the tree by keeping the tree balanced
// precondition: none
// postcondition: new node with the given key is inserted
// and the tree is kept balanced, node heights correctly updated
template <class T>
void AVLTree<T>::insert(T key)
{
    std::vector<AVLNode<T>*> path;
    
    // unbalanced insert
    insertnb(key,path);
    
    // rebalance the tree by rebalancing
    // the traversed nodes during insertion
    rebalance(path);
}

The unbalanced insertion method insertnb() is rather straightforward: it implements an iterative tree traversal followed by creation and insertion of the new node into the tree. Notice how the traversed nodes are stored into the vector path passed to the method by reference as an output parameter.


// insert a new key into the tree without balancing and updating
// heights
// precondition: none
// postcondition: new node with the given key is inserted, the
// resulting tree might be unbalanced, the heights are not updated
template <class T>
void AVLTree<T>::insertnb(T key, std::vector<AVLNode<T>*>& path)
{
    if ( is_empty() )
    {
        root = new_node(key);
        return;
    }
    
    AVLNode<T>* parent = root;
    AVLNode<T>* node   = root;
    
    // tree traversal
    while (node)
    {
        parent = node;
        path.push_back(parent);
        if (key > node->key)
            node = node->right;
        else if (key < node->key)
            node = node->left;
        else
        // key already present!
        {
            path.clear();
            return;
        }
    }
    
    // insert node
    if (key > parent->key)
        parent->right = new_node(key);
    else
        parent->left = new_node(key);
}

The rebalance() method takes care of examining each traversed node and of verifying the AVL property. When the property is violated, then the node is rebalanced either to the right if the balance factor is greater than 1, or to the left if the balance factor is less than -1. Another important point here is that after the rotation I am putting back the affected nodes into the path vector, in order to have their balance factors checked again! Due to the transplant of subtrees from the new node to the old node, it could happen that the latter is still unbalanced, and needs again to be rebalanced. As a final step, regardless of the node being rotated or not, I update its height, since it has been affected by the insertion of the new key and/or by the rotation.


// rebalance the tree by rebalancing the traversed nodes during
// insertion
// precondition: none
// postcondition: tree rebalanced and heights updated
template <class T>
void AVLTree<T>::rebalance(std::vector<AVLNode<T>*>& path)
{
    AVLNode<T>* node     = nullptr;
    AVLNode<T>* new_node = nullptr;
    
    while (!path.empty())
    {
        // get a node from the traversed path
        node = path.back();
        path.pop_back();
        
        // get its previous node
        AVLNode<T>* parent = nullptr;
        if (!path.empty())
            parent = path.back();
        
        int balance = compute_node_balance(node);
        
        // check if the node is unbalanced to the left
        if (balance > 1)
        {
            // rebalance to the right
            new_node = rebalance_to_right(node,parent);
            // insert back new node and old node to be checked
            // again after rotation
            path.push_back(new_node);
            path.push_back(node);
        }
        else if (balance < -1)
        {
            // rebalance to the left
            new_node = rebalance_to_left(node,parent);
            // insert back new node and old node to be checked
            // again after rotation
            path.push_back(new_node);
            path.push_back(node);
        }
        else
            new_node = node;
        
        update_height_node(new_node);
    }
}

The implementation of the compute_node_balance() method is also straightforward and follows directly from the definition of the balance factor of a node.


// return the balance factor of a node as the difference
// between the left and right subtree heights
// precondition: valid node pointer is given
// postcondition: balance factor calculated
template <class T>
int AVLTree<T>::compute_node_balance(AVLNode<T>* node) const
{
    // no node?
    if (!node)
        return 0;
    // the node has 2 children
    else if (node->left && node->right)
        node->balance = node->left->height - node->right->height;
    // the node has only 1 child
    else if (node->left)
        node->balance = node->left->height;
    else if (node->right)
        node->balance = -node->right->height;
    // the node has no child
    else
        return node->balance = 0;
    
    return node->balance;
}

The methods rebalance_to_right() and rebalance_to_left() trigger the corresponding rotations, i.e. right or left-right in case of rebalance_to_right(), and left or right-left in case of rebalance_to_left(). When I rebalance to the right, I need to check first if all the nodes are aligned to the left or not. If they are not aligned to the left, meaning that the balance factor of the child of the node to be rotated is negative, then I need to rotate first to the left prior to rotating to the right. Similarly, when I rebalance to the left, I need to check first if all the nodes are aligned to the right or not. If they are not aligned to the right, meaning that the balance factor of the child of the node to be rotated is positive, then I need to rotate first to the right prior to rotating to the left. The next important point here is that before leaving the function I need to update the pointer of the parent node from the old node to the new node! That's why the method requires as input parameters not only the node to be rotated but also its parent node.


// rebalance to right. Perform either a rotate right or a 
// left-right rotation
// precondition: valid node and parent pointers are given
// postcondition: return the new node after the rotation
template <class T>
AVLNode<T>* AVLTree<T>::rebalance_to_right(AVLNode<T>* node, AVLNode<T>* parent)
{
    AVLNode<T>* new_node = nullptr;
    
    int balance_child = compute_node_balance(node->left);
    
    // check first if child node is unbalanced to the right
    // then a left-right rotation is needed
    if (balance_child < 0)
    {
        // perform first left rotation of child node
        node->left = rotate_left(node->left);
    }
    
    new_node = rotate_right(node);
    
    // if this is not the root
    // then the previous node needs an update
    if (parent)
        if (parent->left == node)
            parent->left  = new_node;
        else
            parent->right = new_node;
    else
        root = new_node;
    
    return new_node;
}

// rebalance to left. Perform either a rotate left or a right-left rotation
// precondition: valid node and parent pointers are given
// postcondition: return the new node after the rotation
template <class T>
AVLNode<T>* AVLTree<T>::rebalance_to_left(AVLNode<T>* node, AVLNode<T>* parent)
{
    AVLNode<T>* new_node = nullptr;
    
    int balance_child = compute_node_balance(node->right);
    
    // check first if child node is unbalanced to the left
    // then a right-left rotation is needed
    if (balance_child > 0)
    {
        // perform first right rotation of child node
        node->right = rotate_right(node->right);
    }
    
    new_node = rotate_left(node);
    
    // if this is not the root
    // then the previous node needs an update
    if (parent)
        if (parent->left == node)
            parent->left  = new_node;
        else
            parent->right = new_node;
    else
        root = new_node;
    
    return new_node;
}

The implementation of rotate_left() and rotate_right() follow from the definitions above and are straightforward.


// perform left rotation
// precondition: valid node pointer is given
// postcondition: return the new parent of the subtree where the right node of
// the input node becomes the new parent and the input node becomes its new
// left node
template <class T>
AVLNode<T>* AVLTree<T>::rotate_left(AVLNode<T>* node)
{
    AVLNode<T>* new_left   = node;
    AVLNode<T>* new_parent = node->right;
    
    // new left right takes over new parent left
    new_left->right = new_parent->left;
    
    // update new parent left with new left
    new_parent->left = new_left;
    
    // update new left height
    update_height_node(new_left);
    
    // update new parent height
    update_height_node(new_parent);
    
    return new_parent;
}

// perform right rotation
// precondition: valid node pointer is given
// postcondition: return the new parent of the subtree where the left node of
// the input node becomes the new parent and the input node becomes its new
// right node
template <class T>
AVLNode<T>* AVLTree<T>::rotate_right(AVLNode<T>* node)
{
    AVLNode<T>* new_right  = node;
    AVLNode<T>* new_parent = node->left;
    
    // new right left takes over new parent right
    new_right->left = new_parent->right;
    
    // update new parent right with new right
    new_parent->right = new_right;
    
    // update new right height
    update_height_node(new_right);
        
    // update new parent height
    update_height_node(new_parent);
    
    return new_parent;
}

Last but not least, the method update_height_node() updates the node height after a tree manipulation. It basically adds 1 to the maximum height of its children nodes. As per definition, if the node has no children, then the height is set to 1.


// update the height attribute of an AVL node
// after the node has been affected by a tree manipulation
// precondition: valid node pointer is given
// postcondition: height updated
template <class T>
void AVLTree<T>::update_height_node(AVLNode<T> *node)
{
    // the node has 2 children
    if (node->right && node->left)
        if (node->left->height > node->right->height)
            node->height = node->left->height + 1;
        else
            node->height = node->right->height + 1;
    // the node has only 1 child
    else if (node->left)
        node->height = node->left->height + 1;
    else if (node->right)
        node->height = node->right->height + 1;
    // the node has no child
    else
        node->height = 1;
}

Generating the graph from the tree


You can use the function generate_tree_graph() for generating an image of the tree and have it saved as a PNG file. You need to install the graphviz visualisation software and adapt your build environment for having the header files and libraries included during the build.

Finally, the following reference has been consulted for creating this post:

67 views0 comments

Recent Posts

See All

Comments


bottom of page