I have a long list (300 000 elements) and I want to check that each element in that list exists more than 5 times. So the simplest code is
[x for x in x_list if x_list.count(x) > 5]
However, I do not need to count how often x appears in the list, I can stop the counting after reaching at least 5 elements? I also do not need to go through all elements in x_list, since there is a chance that I checked value x already earlier when going through the list. Any idea how to get an optimal version for this code? My output should be a list, with the same order if possible...
Use the COUNTIF function to count how many times a particular value appears in a range of cells.
We used the list. count() method to check if a value exists multiple times in a list. The list. count() method takes a value and returns the number of times the provided value appears in the list.
The easiest way to count the number of occurrences in a Python list of a given item is to use the Python . count() method. The method is applied to a given list and takes a single argument. The argument passed into the method is counted and the number of occurrences of that item in the list is returned.
We can use the in-built python List method, count(), to check if the passed element exists in the List. If the passed element exists in the List, the count() method will show the number of times it occurs in the entire list. If it is a non-zero positive number, it means an element exists in the List.
Here is the Counter
-based solution:
from collections import Counter
items = [2,3,4,1,2,3,4,1,2,1,3,4,4,1,2,4,3,1,4,3,4,1,2,1]
counts = Counter(items)
print(all(c >= 5 for c in counts.values())) #prints True
If I use
items = [random.randint(1,1000) for i in range(300000)]
The counter-based solution is still a fraction of a second.
Believe it or not, just doing a regular loop is much more efficient:
Data is generated via:
import random
N = 300000
arr = [random.random() for i in range(N)]
#and random ints are generated: arr = [random.randint(1,1000) for i in range(N)]
A regular loop computes in 0.22 seconds and if I use ints then it is .12 (very comparable to that of collections) (on a 2.4 Ghz processor).
di = {}
for item in arr:
if item in di:
di[item] += 1
else:
di[item] = 1
print (min(di.values()) > 5)
Your version greater than 30 seconds with or without integers.
[x for x in arr if arr.count(x) > 5]
And using collections takes about .33 seconds and .11 if I use integers.
from collections import Counter
counts = Counter(arr)
print(all(c >= 5 for c in counts.values()))
Finally, this takes greater than 30 seconds with or without integers:
count = [0]*(max(x_list)+1)
for x in x_list:
count[x]+=1;
return [index for index, value in enumerate(count) if value >= 5]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With