I would like to know in which situations which data structures are optimal for using "contains" or "exists" checks.
I ask because I come from a Python background and am used to using if x in something:
expressions for everything. For example, which expressions are evaluated the quickest:
val m = Map(1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4)
//> m : scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 2, 3 -> 3, 4
//| -> 4)
val l = List(1,2,3,4) //> l : List[Int] = List(1, 2, 3, 4)
val v = Vector(1,2,3,4) //> v : scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4)
m.exists(_._1 == 3) //> res0: Boolean = true
m.contains(3) //> res1: Boolean = true
l.exists(_ == 3) //> res2: Boolean = true
l.contains(3) //> res3: Boolean = true
v.exists(_ == 3) //> res4: Boolean = true
v.contains(3) //> res5: Boolean = true
Intuitively, I would assume that vectors should be the quickest for random checks, and lists would be quickest if one knows that the value checked is in the beginning of the list and there is a lot of data. However, a confirmation or correction would be most welcome. Also, please feel free to expand to other data structures.
Note: Please let me know if you feel this question is too vague as I'm not sure I am phrasing it correctly.
The contains() method is utilized to check whether a certain element is present in the list or not. Return Type: It returns true if the element present in the contains method as argument is also present in the stated list else it returns false.
Scala Collections - Array with Range Use of range() method to generate an array containing a sequence of increasing integers in a given range.
Set
and Map
(with a default hash table implementation) are by far the fastest at contains
since they compute the hash value and jump to the right location immediately. For example, if you want to find an arbitrary string out of a list of a thousand, contains
on a set is about 100x faster than contains
on List
or Vector
or Array
.
With exists
, you really just care about how fast the collection is to traverse--you have to traverse everything anyway. There, List
is usually the champ (unless you want to traverse an array by hand), but only Set
and so on are usually particularly bad (e.g. exists
on List
is ~8x faster than on a Set
when each have 1000 elements). The others are within about 2.5x of List
(usually 1.5x, but Vector
has an underlying tree structure which is not all that fast to traverse).
If you want to use contains
extensively, you should use a Set
(or a Map
).
AFAIK there is no datastructure that implements an efficient (i.e. faster than O(n)) exists
since the closure you pass in may not even be related to the elements inside.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With