Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to check whether a value exists more often than X in a list

I have a long list (300 000 elements) and I want to check that each element in that list exists more than 5 times. So the simplest code is

[x for x in x_list if x_list.count(x) > 5]

However, I do not need to count how often x appears in the list, I can stop the counting after reaching at least 5 elements? I also do not need to go through all elements in x_list, since there is a chance that I checked value x already earlier when going through the list. Any idea how to get an optimal version for this code? My output should be a list, with the same order if possible...

like image 966
carl Avatar asked Mar 18 '17 02:03

carl


People also ask

How do you find out how many times a value appears in a list?

Use the COUNTIF function to count how many times a particular value appears in a range of cells.

How do you check if something appears more than once in a list Python?

We used the list. count() method to check if a value exists multiple times in a list. The list. count() method takes a value and returns the number of times the provided value appears in the list.

How do you count how many times a value appears in a list Python?

The easiest way to count the number of occurrences in a Python list of a given item is to use the Python . count() method. The method is applied to a given list and takes a single argument. The argument passed into the method is counted and the number of occurrences of that item in the list is returned.

How do you check if a value exists in a list?

We can use the in-built python List method, count(), to check if the passed element exists in the List. If the passed element exists in the List, the count() method will show the number of times it occurs in the entire list. If it is a non-zero positive number, it means an element exists in the List.


2 Answers

Here is the Counter-based solution:

from collections import Counter

items = [2,3,4,1,2,3,4,1,2,1,3,4,4,1,2,4,3,1,4,3,4,1,2,1]
counts = Counter(items)
print(all(c >= 5 for c in counts.values())) #prints True

If I use

items = [random.randint(1,1000) for i in range(300000)]

The counter-based solution is still a fraction of a second.

like image 189
John Coleman Avatar answered Oct 10 '22 16:10

John Coleman


Believe it or not, just doing a regular loop is much more efficient:

Data is generated via:

import random
N = 300000
arr = [random.random() for i in range(N)]
#and random ints are generated: arr = [random.randint(1,1000) for i in range(N)]

A regular loop computes in 0.22 seconds and if I use ints then it is .12 (very comparable to that of collections) (on a 2.4 Ghz processor).

di = {}
for item in arr:
    if item in di:
        di[item] += 1
    else:
        di[item] = 1
print (min(di.values()) > 5)

Your version greater than 30 seconds with or without integers.

[x for x in arr if arr.count(x) > 5]

And using collections takes about .33 seconds and .11 if I use integers.

from collections import Counter

counts = Counter(arr)
print(all(c >= 5 for c in counts.values()))

Finally, this takes greater than 30 seconds with or without integers:

count = [0]*(max(x_list)+1)
for x in x_list:
    count[x]+=1;
return [index for index, value in enumerate(count) if value >= 5]
like image 9
Neil Avatar answered Oct 10 '22 16:10

Neil