What's New in JAVA 8


Finally Java 8 released under oracle leadership with many features and enhancements. Though some of the popular JSR's like JSR 294 (Jigsaw) are deferred again to next release, Java 8 still be the largest ever upgrade to java after it was introduced in 1996.  Lets see one of those improvements.


 Performance improvement of java.util.HashMap under high hash-collision conditions :


Just to give a little background :

HashMap is used to store key-value pairs in no.of buckets technically 'Entry' objects .

Entry is an inner class of HashMap. Please click  here to see the source code in java 7.

Each bucket has a unique number called hash code. When you try to put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. Let's say if hash code of two keys are same then hash-collision occurs.

Till now (prior to Java 8), hash collision is addressed by creating a linked list data structure. Here is how...

Lets see the source code of 'createEntry' method which gets called internally when you try to 'put' values into HashMap.

In Java 7:

     void  createEntry(int hash, K key, V value, int bucketIndex) {
         Entry<K,V> e = table[bucketIndex];
         table[bucketIndex] = new Entry<>(hash, key, value, e);
         size++;
     }

An index will be calculated based on hash-code of key as I said earlier and It will be passed to the above method. Don't confuse with the table field which is nothing but an array of Entry objects. After finding the index, it first check the table array to find whether that position already contains an Entry object and pass the same to create new Entry object, it could be null or existing object.

Here is the Entry inner class:


 static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        Entry(int h, K k, V v, Entry<K,V> n) {
             value = v;
             next = n;
             key = k;
             hash = h;
         }
         .
         .
         .
         .
         .
    }

If it finds an Entry object in the specified index, it will move the existing entry object to next and create a new entry object with new key and value. So when ever there is hash collision, new one will be placed in the first node and the rest will be moved to next node. Like this a linked list starts growing when ever there is a hash-collision on each bucket.This is the story till Java 7.

Problem with this approach:


More the linked list size grow in a single bucket, more compares (equals) will happen while getting one of the value in the linked list. In a worst case scenario,it will be O(n). Perfect way to solve this issue is with a good hash function.

But Java 8 gets us some kind of performance improvement even if you have a bad hash function. Here is how?

Many new fields got introduced in Java 8 HashMap, TREEIFY_THRESHOLD is one of those which needs to pay attention. in Java 8, When a 'put' method is called on a HashMap, it internally calls a method called 'putVal'. It normally works the same way that the previous version of Hashmap worked like creating a linked list when a hash collision occurs.

It checks for the node count on a particular bucket on which it is trying to insert, if the node count is > TREEIFY_THRESHOLD (default is 8), new method got added 'treeifyBin' which transform all the nodes into TreeNodes, each structured similarly to those in java.util.TreeMap. It recreates that single bucket (not whole hashmap) by using the TreeNodes. 'TreeNode' is an another static inner class like 'Entry' but works like a balanced tree (red–black tree). This gives a guaranteed searching in O(log n) time.

Java 8 Source - treeifyBin () :


    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

Java 8 Source - TreeNode :   


    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }

        final TreeNode<K,V> root() {
            for (TreeNode<K,V> r = this, p;;) {
                if ((p = r.parent) == null)
                    return r;
                r = p;
            }
        }
       

Now we know that the bin/bucket transforms into a balanced tree after it reaches a threshold (TREEIFY_THRESHOLD). What happens if you delete some entries from the same bucket and now the node count is less than TREEIFY_THRESHOLD value. Will it still maintain the same balanced tree? Yes, until it reaches one more threshold called 'UNTREEIFY_THRESHOLD' (default 6). If the the node count reaches UNTREEIFY_THRESHOLD value, There is one more method called 'untreeify()' which gets the same old functionality of using linked list.

While investigating all these, I found one difference between Java 7 and Java 8 implementations of HashMap (There may be many which didn't recognize). Till Java 7, When there is a hash collision, new values will be added to beginning of the list by moving the rest to next. In Java 8, new values will be added always to end of linked list as it needs to count the no.of nodes every time to treeifyBin().








Comments

  1. The new function will be more effecient but also more complex. i will need some time to dig this new features

    ReplyDelete
  2. Your topic is very nice and helpful to us … Thank you for the information you wrote.

    Learn Hadoop Training from the Industry Experts we bridge the gap between the need of the industry. Bangalore Training Academy provide the Best Hadoop Training in Bangalore with 100% Placement Assistance. Book a Free Demo Today.
    Big Data Analytics Training in Bangalore
    Tableau Training in Bangalore
    Data Science Training in Bangalore
    Workday Training in Bangalore

    ReplyDelete

Post a Comment

Popular posts from this blog

Singleton Design Pattern

Which one is Better Vector or SynchronizedList