More Binary Tree Properties

Height and depth of a tree and determining if it's balanced.

keywords

programming

2017-02-16


I have been reviewing and brushing up on some basic data structures in Computer Science. To go along with these other posts, I am now writing about some more properties of binary trees: Height, depth, and balancing.

Node Height

Height of a tree is the number of edges in the longest path from the root node to a leaf node. A tree with no children would have a height of 0. A tree with one child, or a tree with just one left and one right child node would have height 1 at the root. A tree with no nodes (NULL) will have a height defined as -1.

                    F           <---- height 2 (F)
                  /  \
                 D    H         <---- height 1 (D, H)
               /  \  / \
              B   E G   J       <---- height 0 (B, E, G J)

Finding the height of a tree involves visiting every node and recursively adding 1 to the maximum value in the left and right subtrees. Here is a Python example:

def max_height(node):
    """recursively find the max height of a binary tree"""
    if not node:
        return -1  # height of a NULL node is -1
    return max(max_height(node.left),
               max_height(node.right)) + 1

Node Depth

So, that’s height of a tree. Depth of a tree is the number of edges from the root node to another node. I think of it sort of as the opposite of height. Where height is from the perspective of the top down, depth is the same, but from the bottom up.

                F           <---- depth of F: 0
              /  \
             D    H         <---- depth of D, H: 1
           /  \  / \
          B   E G   J       <---- depth of B, E, G J: 2

Tree Balanced Property

For a tree to be balanced, the left and right subtrees should never differ in height by more than one. A balanced tree will yield optimal performance, but a tree that is not balanced won’t be much better than a linked-list data structure, so that’s why it’s an important property. A tree can be checked to see if it’s balanced or not with the help of the max_height function implemented above. Here’s a Python example:

import math

def is_balanced(node):
    """determine if tree rooted at `node` is balanced or not"""
    if not node:
        return True
    left_height = max_height(node.left)
    right_height = max_height(node.right)
    return (math.abs(left_height - right_height) <= 1 and
        is_balanced(node.left) and is_balanced(node.right))

This works to determine if the tree is balanced, but has a time complexity of O(n^2), which is not very efficient. Nodes of the subtrees are visited multiple times. The algorithm can be optimized to run more efficiently by cutting out some of the calls to max_height. To fix it, I am going to make a new helper function to get the height, called get_height.

import math

def get_height(node):
    """get height of given `node` in tree, duck out early if non-balanced"""
    if not node:
        return -1  # height of NULL node is -1
    left_height = get_height(node.left)
    if left_height == -2:
        return -2
    right_height = get_height(node.right)
    if right_height == -1:
        return -2
    if math.abs(left_height - right_height) <= 1:
        return max(left_height, right_height) + 1
    return -2  # -2 is used to indicate that it's not balanced anymore

def is_balanced(node):
    """check if tree rooted at `node` is balanced"""
    return get_height(node) != -2

This algorithm runs in O(n) time because the height and balance condition are being checked at essentially the same time. get_height is now more of a helper function to is_balanced. It returns -2 to indicate that the balance condition is no longer satisfied for a given subtree, and returns early without computing every value.