Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which is more efficient, Sorting and then Binary Search over a Collection or Linear Search in java

Suppose I am having a Collection of object:

List<String> myList = populateMyArrayList();
//Here I am having an ArrayList with 1000 elements

Which is the better approach:

1 : Mergesort then Binary Search

Collections.sort(myList);
int keyIndex = Collections.binarySearch(myList, key);

2 : Sequential Search

for(String s : myList){
   if(s.equals(key)){
      return s;
   }
}

Should there be a difference in searching approach based on the size of the collection to be searched? If YES then how to decide.

EDIT1: Suppose I have to search the list a couple of times, and no new elements will be added in the list.

EDIT2: I could have gone for a HashSet, but I am actually having a List<CustomObject> and I can search the List multiple times based on different attributes of CustomObject. So I can't have a overridden equals method in my CustomObject

like image 930
Zeeshan Avatar asked May 26 '14 11:05

Zeeshan


2 Answers

It depends.

  • If you are searching for only one string the linear search is better because it is in O(n)
  • If you are searching for multiple strings first sorting and then binary searching maybe better. it will be O(logn + n*logn) which is O(n*logn). So if you are checking for ca. n strings, this one is better.
  • If you only want to know if your Collection contains an element (ignoring order), you should consider using HashSet which has O(1).
  • If you need order and a fast contains method, use LinkedHashSet

P.S. premature optimization is the root of all evil.

like image 165
Absurd-Mind Avatar answered Oct 27 '22 00:10

Absurd-Mind


If you do the search just one time:

  • Complexity for sort + binary search will be O(n * log n).
  • Complexity for linear search is O(n).

If you do the search for more than one time, let's say k times:

  • Complexity for sort + binary search will be O((n + k) * log n).
  • Complexity for linear search will be O(k * n).

So if you do the search just one time you should go with linear search. If you do the search for more than one time, most probably you should sort first.

Also, maybe in this case you can consider using a hash table, which has amortized complexity of O(1) for an element search.

like image 44
Andrei Bozantan Avatar answered Oct 27 '22 00:10

Andrei Bozantan