Choosing right collection

                The performance of code can be affected significantly by choosing the right collection implementation. Regarding the correct and efficient use of the Java Collection classes, a significant performance setback comes from the low utilization of implementation classes other than ArrayList and HashMap. These two implementations are used for almost everything, although different implementations would be more appropriate in some situations.

The following rules should be applied to select the appropriate collection class:

·        Use ArrayList to collect data in order of occurrence (append operation only). Adding or removing elements at the bottom of the list is very fast on ArrayLists and LinkedList, but the LinkedList puts a high load on the garbage collector. This is the most common use case for lists.

·        Use LinkedList to maintain lists whenever elements have to be inserted or removed at positions other than the the bottom of the list. Inserting or removing elements is very slow for ArrayLists because at average half of the elements in the list have to be shifted.

·        Use LinkedList to maintain lists that contain only very few elements and have a lot of instances (for example, 5000 lists with 3 elements each). An ArrayList wastes space for unused slots, a LinkedList needs more bytes per element.

·        Never use LinkedList for index-based access. Use only the implementations of List that also implement the (tag-)interface RandomAccess.

·        Use Set types like HashSet and TreeSet to maintain lists that contain no duplicates (for example, observers). Searching is very quick for sets: O(log n) instead of O(n). The Set implementation LinkedHashSet combines the preservation of the insertion order of the List type with the quick lookup of the HashSet type.

The values in the table below show the memory requirements of ArrayList and LinkedList during an append-only operation. The LinkedList is faster in filling the list but takes much more time to clean up afterwards. This is due to the complex memory-structure of a linked list (many objects, many references) and therefore a lot of work for the garbage collector.

So the real behavior depends largely on the load that is put on the system. When the CPU load is not high and garbage collections are not frequent, the garbage collection overhead of the LinkedList is no problem.

When garbage collection takes place as the CPU is busy the time, the garbage collector needs to work off the memory structure (GC Full Cycle Time in the table) will be added completely to the execution time of the application, because the program execution is interrupted until the memory is acquired by the garbage collector.

 

 

Appending 10.000.000 elements to a list.

List Class

Memory

Garbage

Execution Time[1]

GC Time2

GC Full Cycle Time3

ArrayList

45MB

89MB

620 ms

< 1ms

140 ms

LinkedList

234MB

0MB

460 ms

230 ms

1.550 ms

[1] no garbage collections in the background

2 time required to collect the objects in the list after the reference to the list has been cleared

3 time the garbage collector took to traverse the heap for releasable objects (without finding any)

 

 

Regards,

Vamsi.

 

 

Comments

Popular posts from this blog

What's New in JAVA 8

Singleton Design Pattern

Internals of JSF