Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SCALA: Which data structures are optimal in which situations when using ".contains()" or ".exists()"?

I would like to know in which situations which data structures are optimal for using "contains" or "exists" checks.

I ask because I come from a Python background and am used to using if x in something: expressions for everything. For example, which expressions are evaluated the quickest:

val m = Map(1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4)
                                          //> m  : scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 2, 3 -> 3, 4
                                          //|  -> 4)
val l = List(1,2,3,4)                     //> l  : List[Int] = List(1, 2, 3, 4)
val v = Vector(1,2,3,4)                   //> v  : scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4)

m.exists(_._1 == 3)                       //> res0: Boolean = true
m.contains(3)                             //> res1: Boolean = true
l.exists(_ == 3)                          //> res2: Boolean = true
l.contains(3)                             //> res3: Boolean = true
v.exists(_ == 3)                          //> res4: Boolean = true
v.contains(3)                             //> res5: Boolean = true

Intuitively, I would assume that vectors should be the quickest for random checks, and lists would be quickest if one knows that the value checked is in the beginning of the list and there is a lot of data. However, a confirmation or correction would be most welcome. Also, please feel free to expand to other data structures.

Note: Please let me know if you feel this question is too vague as I'm not sure I am phrasing it correctly.

like image 214
TimY Avatar asked May 08 '13 14:05

TimY


People also ask

How do you check if an element is present in an array in Scala?

The contains() method is utilized to check whether a certain element is present in the list or not. Return Type: It returns true if the element present in the contains method as argument is also present in the stated list else it returns false.

Which of the following method make use of Scala collections?

Scala Collections - Array with Range Use of range() method to generate an array containing a sequence of increasing integers in a given range.


2 Answers

Set and Map (with a default hash table implementation) are by far the fastest at contains since they compute the hash value and jump to the right location immediately. For example, if you want to find an arbitrary string out of a list of a thousand, contains on a set is about 100x faster than contains on List or Vector or Array.

With exists, you really just care about how fast the collection is to traverse--you have to traverse everything anyway. There, List is usually the champ (unless you want to traverse an array by hand), but only Set and so on are usually particularly bad (e.g. exists on List is ~8x faster than on a Set when each have 1000 elements). The others are within about 2.5x of List (usually 1.5x, but Vector has an underlying tree structure which is not all that fast to traverse).

like image 167
Rex Kerr Avatar answered Nov 05 '22 19:11

Rex Kerr


If you want to use contains extensively, you should use a Set (or a Map).

AFAIK there is no datastructure that implements an efficient (i.e. faster than O(n)) exists since the closure you pass in may not even be related to the elements inside.

like image 31
gzm0 Avatar answered Nov 05 '22 18:11

gzm0