I have a list of tuples, as shown below. I need to find which strings (the first item of each tuple) appear more than once. The code I have written so far is very slow, even though there are only around 10K tuples. In the example below, the string example appears twice, so those are the kind of strings I need to get. My question is: what is the best way to get the counts of these strings by iterating over the generator?
List:
b_data=[('example',123),('example-one',456),('example',987),.....]
My code so far:
blockslst=[]
for line in b_data:
    blockslst.append(line[0])
blocklstgtone=[]
for item in blockslst:
    if(blockslst.count(item)>1):
        blocklstgtone.append(item)
Python tuple count() method: count() returns the number of times a given element occurs in the tuple. It takes exactly one argument, the value to be counted.
Example: for the tuple numbers = (1, 3, 4, 1, 6, 1), numbers.count(1) returns 3 because the tuple contains three 1's, and numbers.count(7) returns 0 because 7 does not appear in it.
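The counts described above can be reproduced with a short snippet. This is only a minimal sketch; the variable name numbers is taken from the description, and nothing else is assumed:

```python
# The tuple from the description: three 1's and no 7
numbers = (1, 3, 4, 1, 6, 1)

print(numbers.count(1))  # 3
print(numbers.count(7))  # 0
```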
We can use the Counter class from the collections module to count the frequency of elements in a list. Counter takes an iterable as its argument and returns a Counter object, which stores the frequency of each element as key-value pairs.
You've got the right idea extracting the first item from each tuple. You can make your code more concise using a list/generator comprehension, as I show you below.
From that point on, the most idiomatic manner to find frequency counts of elements is using a collections.Counter object.
Build the Counter from the first elements, then query the count of example:
from collections import Counter
counts = Counter(x[0] for x in b_data)
print(counts['example'])
Sure, you can use list.count if it's only one item you want to find frequency counts for, but in the general case, a Counter is the way to go.
The advantage of a Counter is that it performs frequency counts of all elements (not just example) in linear (O(N)) time. Say you also wanted to query the count of another element, say foo. That would be done with:
print(counts['foo'])
If 'foo' doesn't exist in the list, 0 is returned.
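As a quick check of that behaviour, here is a minimal sketch (the sample list is made up for illustration). It also shows that looking up a missing key does not add it to the Counter:

```python
from collections import Counter

# Illustrative data only; any iterable of hashable items works
counts = Counter(['example', 'example-one', 'example'])

print(counts['example'])  # 2
print(counts['foo'])      # 0, and no KeyError is raised
print('foo' in counts)    # False: the lookup did not insert the key
```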
If you want to find the most common elements, call counts.most_common:
print(counts.most_common(n))
Where n is the number of elements you want to display. If you want to see everything, don't pass n.
To retrieve the elements that occur more than once, one efficient way is to query most_common and then take elements only while their count is over 1, using itertools.takewhile.
from itertools import takewhile
l = [1, 1, 2, 2, 3, 3, 1, 1, 5, 4, 6, 7, 7, 8, 3, 3, 2, 1]
c = Counter(l)
list(takewhile(lambda x: x[-1] > 1, c.most_common()))
[(1, 5), (3, 4), (2, 3), (7, 2)]
(OP edit) Alternatively, use a list comprehension to get a list of items having count > 1 -
[item[0] for item in counts.most_common() if item[-1] > 1]
Keep in mind that this isn't as efficient as the itertools.takewhile solution. For example, if you have one item with count > 1 and a million items with count equal to 1, the comprehension checks all million and one entries when it doesn't have to (because most_common returns frequency counts in descending order). With takewhile that isn't the case, because iteration stops as soon as the condition count > 1 becomes false.
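Putting the pieces together for data shaped like the question's b_data, a minimal sketch might look like this. The three sample tuples are the ones visible in the question; the names counts and repeated are just illustrative:

```python
from collections import Counter
from itertools import takewhile

# Sample data in the shape used in the question
b_data = [('example', 123), ('example-one', 456), ('example', 987)]

# Count the first element of each tuple in a single pass
counts = Counter(name for name, _ in b_data)

# most_common() yields (item, count) pairs in descending count order,
# so takewhile can stop at the first pair whose count is not above 1
repeated = [name for name, count in
            takewhile(lambda pair: pair[1] > 1, counts.most_common())]

print(repeated)  # ['example']
```

If you need the counts as well, keep the (name, count) pairs instead of only the names.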
First method:
What about doing it without an explicit loop?
print(list(map(lambda x:x[0],b_data)).count('example'))
output:
2
Second method:
You can do the counting with a plain dict, without importing any module and without making it too complex:
b_data = [('example', 123), ('example-one', 456), ('example', 987)]
dict_1={}
# Tally how often each first element occurs
for i in b_data:
    if i[0] not in dict_1:
        dict_1[i[0]]=1
    else:
        dict_1[i[0]]+=1
print(dict_1)
# Keep only the (string, count) pairs whose count is greater than 1
print([(key, count) for key, count in dict_1.items() if count > 1])
output:
[('example', 2)]
Test_case :
b_data = [('example', 123), ('example-one', 456), ('example', 987),('example-one', 456),('example-one', 456),('example-two', 456),('example-two', 456),('example-two', 456),('example-two', 456)]
output:
[('example-two', 4), ('example-one', 3), ('example', 2)]
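The test-case output above is listed in descending order of count. A plain dict iterates in insertion order on Python 3.7 and later, so the dict-based code would list 'example' first there. If you want the descending order regardless of Python version, one option is to sort explicitly. A minimal sketch, where the helper name repeated_counts is hypothetical:

```python
def repeated_counts(pairs):
    """Return (string, count) pairs for strings seen more than once, highest count first."""
    tally = {}
    for name, _ in pairs:
        # dict.get supplies 0 the first time a string is seen
        tally[name] = tally.get(name, 0) + 1
    repeated = [(name, count) for name, count in tally.items() if count > 1]
    # Sort by count, largest first
    return sorted(repeated, key=lambda pair: pair[1], reverse=True)


b_data = [('example', 123), ('example-one', 456), ('example', 987),
          ('example-one', 456), ('example-one', 456), ('example-two', 456),
          ('example-two', 456), ('example-two', 456), ('example-two', 456)]

print(repeated_counts(b_data))
# [('example-two', 4), ('example-one', 3), ('example', 2)]
```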