Friday, 9 May 2014

Java Collections : Internal Working

Java Collections:

Java provides many collection classes that can be used to store data. Knowing which collection class to use for best performance and optimum result is the key. In this section we will see how these different collection classes work internally.


ArrayList
ArrayList works on the principle of creating a array and adding elements to it.
ArrayList class has a member variable elementData which is a Object array;
Object[] elementData;
When we do List l = new ArrayList(); the array elementData is initalized with a size of 10

add(E element)
When a new element is added the capacity of the array elementData is checked and if it is completely
 filled that is all element 10 are filled a new array is created with a new capacity by using Arrays.copyOf.
 If the elementData array is not exhausted the new element is added in the array.
So adding a element in a array may take more time as a completely new array needs to be created with
greater capacity and the data in the old array is transferred into the new array.
add(index i, E element)
On adding a element at a particular index in ArrayList, ArrayList checks if a element is already present at
that index. If no than the element passed in add() is added at that index, otherwise a new array is created
with the index kept vacant and the remaining element shifted to right.
For Eg:

List<Integer> l = new ArrayList<Integer>();
l.add(1);
l.add(2);
l.add(1,3);
l.add(4);

for(int i:l){
     System.out.println(i);
}


Output
1
3
2
4
Here above we are trying to add 3 and position 1, since position 1 already has value '2'. A new array is
created with value at index 1 kept vacant and the remaining elements are shifted to right. Than the element 3
 is added at index 1.

get(int index)
The element present at that index in that array is returned. This is very fast.

When to use ArrayList
When the requirment is fetch data frequently and adding data is one time activity.

When not to use ArrayList
When the list is updated frequently

To understand ArrayList in more detail follow link Custom Array List. Here a custom Arraylist is created
with basic add and get operations

Linked List
Linked List is a Doubly Linked List as desribed here Double Link List
With the Node called as Entry class having structure as
 class Entry {
 E element;
 Entry next;
 Entry previous;
 }

LinkedList class also has a instance variable of type 'Entry' called 'header'. This is the 'start' element or node of the list.

add(E element )
Every Time we call add(var);  a new instance of 'Entry' class is created and attached at the end of the list.

add(var,positon)
Inserts the specified element at the specified position in this list.
Shifts the element currently at that position (if any) and any subsequent elements to the right.

get(int index)
It iterates through the list and returns the element. This is very expensive and time consuming as opposed to ArraList.get(int index)

When to use LinkedList
When the elements are getting added and removed frequently.

When not to use LinkedList
When you want to access or fetch a element by index.

HashMap
HashMap works on the principal of hashing. It stores values in the form of key,value pair and to access a value you need to provide the key.
For efficient use of HashMap the 'key' element should implement equals() and hashcode() method. equals() method define that two objects are meaningfully equal. hashcode() helps HashMap to arrange elements separately in a bucket. So elements with same hascode are kept in the same bucket together.
So when we want to fetch a element using get(K key), HashMap first identifies the bucket in which all elements of the same hascode as the hashcode of the 'key' passed are present. Than it uses the equals() method to identify the actual object present in the bucket.

Lets see how HashMap implements this logic internally.

For fast access to a value HashMap places a element (both key and value) in a SinglyLinkedList(Bucket). All the elements that have the same hascode are placed in the same SinglyLinkedList. The number of SinglyLinkedList(buckets) depends upon how many objects are present with different hashcode. To hold these buckets together a array is used. The size of the array is initially defined to 12. And it changes as new elements with different hascodes are added. Lets see the pictorial view.
The structure of the 'Entry' class used above.

class Entry {
   K key;
   V value;
   Entry next;
   int hash;
}

HashMap also has some more variables which define the initial size of the array.

DEFAULT_LOAD_FACTOR = 0.75f;
DEFAULT_INITIAL_CAPACITY = 16;

Entry[] table = new Entry[DEFAULT_INITIAL_CAPACITY];


Hashset
Hashset is a special case
HashSet internally uses HashMap. Yes thats true.
HashSet has a instance variable called 'map' which is a instance of HashMap.

add(E element)
When we add a value in Hashset, Hashset internally adds a value in 'map' by calling put(E,o);
where E that is the key is the element passed in add(E element) of HashSet and 'o' as the value which is a dummy Object creted by doing Object o = new Object; which is common for all key's entered in HashMap 'map'.
HashMap internally checks wether the Key that is 'element' is already present by calling the equals method of 'element'.
This method returns false if the Key is already present in HashMap.

TreeMap
TreeMap is a structure which is designed to work as a Red - Black - Tree. Here each node has only two child nodes and the insertion is a tree happens same as the insertion strategy of Binary Search Tree explained here. So the elements in a TreeMap are always sorted.
The elements added is a TreeMap should implement comparable and provide implementation of compareTo method. On the basis of this TreeMap decides wether the node is smaller or greater than other node. If Comparable is not implemented, a class which is Comparator should be passed in the constructor of TreeMap. If both Comaparable and Comparator are present TreeMap uses Comparator implementation.
TreeMap internally mantains a List of Nodes with each node being a Entry class which is actually a implementation of Map.Entry interface.
The basic structure of this Entry class is 
class Entry{
 K key;
 V value;
 Entry left = null;
 Entry right = null;
 Entry parent;
 boolean color = BLACK;
}
Entry of TreeMap has inetrnally three entries of Entry class : left,right,parent;
When we use put(K,V) method it checks if root is pointing anywhere if no it makes the instance of Entry and point to root;
The constructor of Entry takes key, value and parent. In this case parent is null;
For the next time we enter something using put(K,V) it first identifies the comparison machenism to use. First it check the Comparator class is present or not . This class is passed when creating the instance of TreeMap. If not present it uses the Key's Comparable implementation.
It then traverse through root and compares each node with the node entered and depending upon the comparison places the node either left or right of the parent node.

No comments:

Post a Comment