Fastest way to check whether a value exists more often than X in a list

Tags: python, arrays, count

I have a long list (300,000 elements) and I want to keep only the elements that occur more than 5 times in it. The simplest code is

[x for x in x_list if x_list.count(x) > 5]

However, I do not need the exact count of x in the list: I can stop counting as soon as more than 5 occurrences have been found. I also do not need to recount for every element in x_list, since I may already have checked the same value x earlier in the list. Any idea how to get an optimal version of this code? My output should be a list, in the same order if possible.


asked Mar 18 '17 02:03

carl

2 Answers

Here is the Counter-based solution:

from collections import Counter

items = [2,3,4,1,2,3,4,1,2,1,3,4,4,1,2,4,3,1,4,3,4,1,2,1]
counts = Counter(items)
# True here: every value occurs at least 5 times (use c > 5 for "more than 5")
print(all(c >= 5 for c in counts.values()))

With a much larger input, for example

import random
items = [random.randint(1,1000) for i in range(300000)]

the Counter-based solution still runs in a fraction of a second.
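If you want the filtered list the question asks for rather than a single True/False, the same Counter can drive the filter. This is only a sketch: the function names `frequent_items` and `frequent_unique` and the `threshold` parameter are my own, and it assumes the elements are hashable. The first function keeps every qualifying element in its original position; the second returns each qualifying value once, in first-seen order.

```python
from collections import Counter

def frequent_items(x_list, threshold=5):
    """Keep every element that occurs more than `threshold` times, in order."""
    counts = Counter(x_list)  # one pass over the list
    return [x for x in x_list if counts[x] > threshold]

def frequent_unique(x_list, threshold=5):
    """Return each qualifying value once, in order of first appearance."""
    counts = Counter(x_list)
    # Counter is a dict subclass, so it keeps first-insertion order (Python 3.7+)
    return [x for x, c in counts.items() if c > threshold]

items = [2,3,4,1,2,3,4,1,2,1,3,4,4,1,2,4,3,1,4,3,4,1,2,1]
print(frequent_unique(items))  # prints [4, 1]
```

Counting once up front makes the whole thing linear in the length of the list, so stopping the count early is not needed to make it fast.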


answered Oct 10 '22 16:10

John Coleman

Believe it or not, a regular loop over a dictionary is far more efficient than calling count() for every element, and it is comparable to collections.Counter:

Data is generated via:

import random
N = 300000
arr = [random.random() for i in range(N)]
# for the integer runs instead: arr = [random.randint(1,1000) for i in range(N)]

A regular loop takes 0.22 seconds with floats and 0.12 seconds with ints, which is very comparable to collections (on a 2.4 GHz processor).

di = {}
for item in arr:
    if item in di:
        di[item] += 1
    else:
        di[item] = 1
print(min(di.values()) > 5)

Your version takes more than 30 seconds, with or without integers:

[x for x in arr if arr.count(x) > 5]

Using collections takes about 0.33 seconds with floats and 0.11 seconds with integers:

from collections import Counter

counts = Counter(arr)
print(all(c >= 5 for c in counts.values()))

Finally, this takes more than 30 seconds, with or without integers:

# bucket counting: indexing count[x] only works for non-negative integers
count = [0]*(max(x_list)+1)
for x in x_list:
    count[x] += 1
result = [index for index, value in enumerate(count) if value >= 5]
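To reproduce the dict-loop and Counter timings on your own machine, something along these lines could work. It is a rough sketch, not the exact script behind the numbers above: the wrapper names `count_with_dict` and `count_with_counter` are mine, and using the best of three `timeit` runs is an assumption about how to measure. Absolute numbers will differ by hardware and Python version.

```python
import random
import timeit
from collections import Counter

def count_with_dict(arr):
    """Plain loop: tally occurrences in a regular dict."""
    di = {}
    for item in arr:
        if item in di:
            di[item] += 1
        else:
            di[item] = 1
    return di

def count_with_counter(arr):
    """Same tally using collections.Counter."""
    return Counter(arr)

N = 300000
arr = [random.randint(1, 1000) for _ in range(N)]

for fn in (count_with_dict, count_with_counter):
    # best of three single runs, in seconds
    best = min(timeit.repeat(lambda: fn(arr), number=1, repeat=3))
    print(f"{fn.__name__}: {best:.3f} s")
```

Swap the `arr` line for `[random.random() for _ in range(N)]` to time the float case.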

answered Oct 10 '22 16:10

Neil
