HashSet and ArrayList performance

I have implemented a method that loops over a set of CSV files containing data on a number of different modules. It adds each 'moduleName' to a HashSet (code shown below).

I have used a HashSet because it guarantees that no duplicates are inserted. An ArrayList would instead have to call the contains() method, which iterates through the list to check whether the name is already there.
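A minimal sketch of the two approaches described above, assuming Java 9+ (for `List.of`). The class name, method names and module codes are hypothetical. `HashSet.add()` returns `false` and leaves the set unchanged when an equal element is already present, while the `ArrayList` version needs an explicit `contains()` check before each `add()`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DedupComparison {

    // HashSet.add() hashes the element and ignores it if an equal one is
    // already stored, so no explicit duplicate check is needed.
    static Set<String> dedupWithHashSet(List<String> names) {
        Set<String> unique = new HashSet<>();
        for (String name : names) {
            unique.add(name); // returns false for a duplicate; the set is unchanged
        }
        return unique;
    }

    // With an ArrayList the check is manual: contains() walks the list
    // element by element, calling equals() on each one.
    static List<String> dedupWithArrayList(List<String> names) {
        List<String> unique = new ArrayList<>();
        for (String name : names) {
            if (!unique.contains(name)) {
                unique.add(name);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> names = List.of("CS101", "MA201", "CS101", "PH150", "MA201");
        System.out.println(dedupWithHashSet(names).size());               // prints 3
        System.out.println(dedupWithArrayList(names));                    // prints [CS101, MA201, PH150]
        System.out.println(new HashSet<>(List.of("CS101")).add("CS101")); // prints false
    }
}
```

Both end up with the same distinct names; the ArrayList version additionally keeps first-seen order, which the HashSet does not.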

I believe using the HashSet gives better performance than an ArrayList. Am I correct?

Also, can somebody explain to me:

  1. How to work out the performance of each data structure?
  2. What is the complexity in big-O notation?
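On the complexity question: `HashSet.add()` and `HashSet.contains()` run in expected O(1) time, assuming `hashCode()` spreads the keys well, whereas `ArrayList.contains()` is O(n) because it compares against each stored element in turn with `equals()`. Collecting n records is therefore expected O(n) with a `HashSet`, but O(n·u) with a contains-checked `ArrayList`, where u is the number of distinct modules, which is O(n²) when most names are distinct. One way to work this out concretely is to count comparisons; this sketch (hypothetical names) performs the same linear scan as `contains()` by hand so it can count the `equals()` calls.

```java
import java.util.ArrayList;
import java.util.List;

class ScanCost {

    // Mirrors ArrayList.contains(): a linear scan calling equals().
    // Returns how many equals() comparisons deduplicating `names` takes.
    static long comparisonsForListDedup(List<String> names) {
        List<String> unique = new ArrayList<>();
        long comparisons = 0;
        for (String name : names) {
            boolean found = false;
            for (String existing : unique) {
                comparisons++;
                if (existing.equals(name)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                unique.add(name);
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            List<String> names = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                names.add("MOD" + i); // all distinct: the worst case for the scan
            }
            // n distinct names cost 0 + 1 + ... + (n - 1) = n(n - 1) / 2 comparisons
            System.out.println(n + " -> " + comparisonsForListDedup(names));
        }
    }
}
```

This prints `10 -> 45`, `100 -> 4950` and `1000 -> 499500`: ten times more input means roughly a hundred times more work, while a `HashSet` does a roughly constant amount of work per record.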

    HashSet<String> modulesUploaded = new HashSet<String>();

    for (File f : marksheetFiles) {
        try {
            csvFileReader = new CSVFileReader(f);
            csvReader = csvFileReader.readFile();
            csvReader.readHeaders();

            while (csvReader.readRecord()) {
                String moduleName = csvReader.get("Module");

                if (!moduleName.isEmpty()) {
                    modulesUploaded.add(moduleName);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        csvReader.close();
    }
    return modulesUploaded;
    }
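A self-contained sketch of the same loop (not the asker's code). The `CSVFileReader`/`csvReader` classes are not shown, so this version swaps them for a naive comma split. It assumes a header row, a column literally named `Module`, and no quoted fields containing commas. It takes `Reader`s rather than `File`s so it can be tried without files on disk; `Files.newBufferedReader(path)` would be one way to supply real files. try-with-resources closes each reader even when reading fails, which the original's `csvReader.close()` after the `try` block does not guarantee.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ModuleCollector {

    // Collects the distinct, non-empty values of the "Module" column.
    // ASSUMPTION: plain comma-separated text with a header row and no quoted
    // fields; this stands in for the asker's CSVFileReader/csvReader classes.
    static Set<String> collectModules(List<Reader> sources) throws IOException {
        Set<String> modulesUploaded = new HashSet<>();
        for (Reader source : sources) {
            // try-with-resources closes each reader even if reading throws
            try (BufferedReader in = new BufferedReader(source)) {
                String header = in.readLine();
                if (header == null) {
                    continue; // empty input: nothing to read
                }
                int moduleCol = Arrays.asList(header.split(",", -1)).indexOf("Module");
                if (moduleCol < 0) {
                    continue; // this file has no Module column
                }
                String line;
                while ((line = in.readLine()) != null) {
                    String[] fields = line.split(",", -1);
                    if (moduleCol < fields.length && !fields[moduleCol].isEmpty()) {
                        modulesUploaded.add(fields[moduleCol]); // repeats are ignored by the set
                    }
                }
            }
        }
        return modulesUploaded;
    }

    public static void main(String[] args) throws IOException {
        List<Reader> files = List.of(
                new StringReader("Student,Module,Mark\n1,CS101,70\n2,MA201,65\n"),
                new StringReader("Student,Module,Mark\n3,CS101,58\n4,,40\n"));
        // Prints the two distinct modules; HashSet iteration order is unspecified
        System.out.println(collectModules(files));
    }
}
```

Unlike the original, which prints the stack trace and moves on to the next file, this version lets an `IOException` propagate to the caller; catching it inside the loop would restore the skip-and-continue behavior.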

asked Apr 17 '12 at 17:04 by user1339335



1 Answer

My experiment shows that HashSet is faster than ArrayList for collections of 3 or more elements.

Complete results:

    | Boost | Collection size |
    |-------|-----------------|
    |   2x  |      3 elements |
    |   3x  |     10 elements |
    |   6x  |     50 elements |
    |  12x  |    200 elements |  <= 532/12 ≈ 44 vs 10,000/200 = 50
    | 532x  | 10,000 elements |  <= i.e. roughly linear lookup growth for the ArrayList
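The answer does not include its benchmark code, so the following is only a rough sketch of how a similar comparison could be run (hypothetical names, naive `System.nanoTime()` timing with a crude warm-up pass). It times `contains()` on an `ArrayList` and a `HashSet` holding the same strings and prints the ratio of the two times, which is presumably what the "Boost" column reports. Timings like these are noisy and JVM-dependent, so expect numbers that differ from the table; a harness such as JMH gives more trustworthy results.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

class LookupTiming {

    static int sink; // accumulates hit counts so the JIT cannot drop the timed loop

    // Runs `rounds` passes of contains() over every probe; returns elapsed nanoseconds.
    static long timeLookups(Collection<String> c, List<String> probes, int rounds) {
        int hits = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String p : probes) {
                if (c.contains(p)) {
                    hits++;
                }
            }
        }
        long elapsed = System.nanoTime() - start;
        sink += hits;
        return elapsed;
    }

    // Builds `size` distinct strings to stand in for module names.
    static List<String> names(int size) {
        List<String> out = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            out.add("MOD" + i);
        }
        return out;
    }

    public static void main(String[] args) {
        for (int size : new int[] {3, 10, 50, 200, 10_000}) {
            List<String> data = names(size);
            Collection<String> list = new ArrayList<>(data);
            Collection<String> set = new HashSet<>(data);
            // Fewer rounds for big sizes keeps the quadratic ArrayList cost bounded
            int rounds = Math.max(1, 50_000_000 / (size * size));
            timeLookups(list, data, rounds); // crude warm-up
            timeLookups(set, data, rounds);
            long listNs = timeLookups(list, data, rounds);
            long setNs = timeLookups(set, data, rounds);
            System.out.printf("%6d elements: ArrayList/HashSet = %.1fx%n",
                    size, (double) listNs / Math.max(1, setNs));
        }
    }
}
```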
answered Sep 27 '22 at 16:09 by Andrey Chaschev