This is almost the same question than here, except that I am asking about the most efficient solution for a sorted result.
I have a list (about 10 integers randomly between 0 and 12), for example:
the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]
I want to create a function that returns a list of tuples (item, counts) ordered by the first element, for example
output = [(4, 3), (5, 4), (6, 1), (7, 2)]
So far I have used:
def dupli(the_list):
return [(item, the_list.count(item)) for item in sorted(set(the_list))]
But I call this function almost a millon time and I need to make it as fast as I (python) can. Therefore my question: How to make this function less time comsuming? (what about memory?)
I have played around a bit, but nothing obvious came up:
from timeit import Timer as T
number=10000
setup = "the_list=[5, 7, 6, 5, 5, 4, 4, 7, 5, 4]"
stmt = "[(item, the_list.count(item)) for item in sorted(set(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)
Out[230]: 0.058799982070922852
stmt = "L = []; \nfor item in sorted(set(the_list)): \n L.append((item, the_list.count(item)))"
T(stmt=stmt, setup=setup).timeit(number=number)
Out[233]: 0.065041065216064453
stmt = "[(item, the_list.count(item)) for item in set(sorted(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)
Out[236]: 0.098351955413818359
Thanks
Christophe
We can use the len( ) function to return the number of elements present in the list.
Make use of Python Counter which returns count of each element in the list. Thus, we simply find the most common element by using most_common() method.
Using the count() Function The "standard" way (no external libraries) to get the count of word occurrences in a list is by using the list object's count() function. The count() method is a built-in function that takes an element as its only argument and returns the number of times that element appears in the list.
If you want to count multiple items in a list, you can call count() in a loop. This approach, however, requires a separate pass over the list for every count() call; which can be catastrophic for performance. Use couter() method from class collections , instead.
Change where you sort for a savings of about 20%.
Change this:
def dupli(the_list):
return [(item, the_list.count(item)) for item in sorted(set(the_list))]
To this:
def dupli(the_list):
count = the_list.count # this optimization added courtesy of Sven's comment
result = [(item, count(item)) for item in set(the_list)]
result.sort()
return result
The reason this is faster is that the sorted
iterator must create a temporary list, whereas sorting the result sorts in place.
edit: Here's another approach that is 35% faster than your original:
def dupli(the_list):
counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
for n in the_list:
counts[n] += 1
return [(i, counts[i]) for i in (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) if counts[i]]
Note: You may want to randomize the values for the_list
. My final version of dupli
tests even faster with other random data sets (import random; the_list=[random.randint(0,12) for i in xrange(10)]
)
I would try:
from collections import defaultdict
output = defaultdict(lambda: 0)
for item in the_list: output[item] += 1
return sorted(output.items())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With