I have a list of tuples as shown below, and I need to find the strings (the first element of each tuple) that appear more than once. For example, the string 'example' appears twice in the list below, so I want to get strings like that one. The code I have written so far is very slow, even with only around 10K tuples. What is the best way to get the count of each string by iterating over the generator?
List:
b_data=[('example',123),('example-one',456),('example',987),.....]
My code so far:
blockslst = []
for line in b_data:
    blockslst.append(line[0])
blocklstgtone = []
for item in blockslst:
    if blockslst.count(item) > 1:
        blocklstgtone.append(item)
Python Tuple count() Method. The count() method counts the occurrences of an element in a tuple and returns that number. It requires one parameter: the element to be counted.
Example 1: Python Tuple count(). Calling count() on the tuple (1, 3, 4, 1, 6, 1) for the elements 1 and 7 returns 3 and 0 respectively, because the tuple contains three 1's and does not contain the number 7.
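A short sketch of the example described above, using the tuple quoted in the text (the variable names are taken from or modelled on that description):

```python
# Tuple quoted in the description above
numbers = (1, 3, 4, 1, 6, 1)

# tuple.count(value) returns how many times value occurs in the tuple
ones = numbers.count(1)
sevens = numbers.count(7)

print(ones)    # 3
print(sevens)  # 0
```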
We can use the Counter class from the collections module to count the frequency of elements in a list. Counter takes an iterable as its argument and returns a Counter object, which stores the frequency of every element as key-value pairs.
You've got the right idea extracting the first item from each tuple. You can make your code more concise using a list comprehension or a generator expression, as I show you below.
From that point on, the most idiomatic manner to find frequency counts of elements is using a collections.Counter object.
Pass the first elements to Counter, then query the count of 'example':
from collections import Counter
counts = Counter(x[0] for x in b_data)
print(counts['example'])
Sure, you can use list.count if it’s only one item you want to find frequency counts for, but in the general case, a Counter is the way to go.
The advantage of a Counter is it performs frequency counts of all elements (not just example) in linear (O(N)) time. Say you also wanted to query the count of another element, say foo. That would be done with -
print(counts['foo'])
If 'foo' doesn’t exist in the list, 0 is returned.
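To make the cost difference concrete, here is a rough sketch that puts the question's approach next to the Counter approach. The function names are made up for illustration, and the sample data is the three-tuple list from the question.

```python
from collections import Counter

# Sample data in the shape used in the question
b_data = [('example', 123), ('example-one', 456), ('example', 987)]

def repeated_quadratic(pairs):
    """Question's approach: list.count rescans the whole list for every item, O(N^2)."""
    names = [name for name, _ in pairs]
    return [name for name in names if names.count(name) > 1]

def repeated_linear(pairs):
    """Counter approach: one pass to count, one pass over the counts to filter, O(N)."""
    counts = Counter(name for name, _ in pairs)
    return [name for name, n in counts.items() if n > 1]

print(repeated_quadratic(b_data))  # ['example', 'example']
print(repeated_linear(b_data))     # ['example']
```

Note that the quadratic version also reports a repeated name once per occurrence, while the Counter version reports each repeated name once.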
If you want to find the most common elements, call counts.most_common -
print(counts.most_common(n))
Where n is the number of elements you want to display. If you want to see everything, don't pass n.
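As a small illustration of both calls, here is a sketch using the three-tuple sample list from the question (the variable names top_one and everything are only for this example):

```python
from collections import Counter

b_data = [('example', 123), ('example-one', 456), ('example', 987)]
counts = Counter(name for name, _ in b_data)

top_one = counts.most_common(1)     # only the single most frequent entry
everything = counts.most_common()   # all entries, highest count first

print(top_one)     # [('example', 2)]
print(everything)  # [('example', 2), ('example-one', 1)]
```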
To retrieve only the elements that occur more than once, one efficient way is to query most_common and then keep taking entries while their count is over 1, using itertools.takewhile.
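from collections import Counter
from itertools import takewhile
l = [1, 1, 2, 2, 3, 3, 1, 1, 5, 4, 6, 7, 7, 8, 3, 3, 2, 1]
c = Counter(l)
# most_common() is sorted by count, so stop at the first count that is not > 1
print(list(takewhile(lambda x: x[-1] > 1, c.most_common())))
# [(1, 5), (3, 4), (2, 3), (7, 2)]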
(OP edit) Alternatively, use a list comprehension to get a list of items having count > 1 -
[item[0] for item in counts.most_common() if item[-1] > 1]
Keep in mind that this isn’t as efficient as the itertools.takewhile solution. For example, if you have one item with count > 1, and a million items with count equal to 1, the list comprehension checks all million and one entries when it doesn’t have to (because most_common returns frequency counts in descending order). With takewhile that isn’t the case, because you stop iterating as soon as the condition of count > 1 becomes false.
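A rough way to see the difference is to count how many entries each approach examines. This is only a sketch: the data is made up (one repeated name plus a thousand unique ones), and the `inspected` dict and `over_one` helper exist purely to record the number of checks.

```python
from collections import Counter
from itertools import takewhile

# Hypothetical data: one repeated name followed by many unique ones
names = ['dup', 'dup'] + [f'unique-{i}' for i in range(1000)]
counts = Counter(names)
ranked = counts.most_common()  # sorted by count, descending

inspected = {'takewhile': 0, 'comprehension': 0}

def over_one(entry, label):
    """Predicate that also records how many entries were examined."""
    inspected[label] += 1
    return entry[1] > 1

early = list(takewhile(lambda e: over_one(e, 'takewhile'), ranked))
full = [e for e in ranked if over_one(e, 'comprehension')]

print(early == full)  # True
print(inspected)      # {'takewhile': 2, 'comprehension': 1001}
```

One caveat: most_common() with no argument still sorts every entry before either approach runs, so the saving applies to the filtering step only.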
First method:
How about doing it without an explicit loop?
print(list(map(lambda x: x[0], b_data)).count('example'))
output:
2
Second method:
You can do the counting with a plain dict, without importing any module and without making it complicated:
b_data = [('example', 123), ('example-one', 456), ('example', 987)]
dict_1 = {}
for i in b_data:
    # first time this name is seen: start its count at 1, otherwise increment
    if i[0] not in dict_1:
        dict_1[i[0]] = 1
    else:
        dict_1[i[0]] += 1
print(dict_1)
# keep only the (name, count) pairs whose count is greater than 1
print([(name, count) for name, count in dict_1.items() if count > 1])
output:
{'example': 2, 'example-one': 1}
[('example', 2)]
Test case:
b_data = [('example', 123), ('example-one', 456), ('example', 987),('example-one', 456),('example-one', 456),('example-two', 456),('example-two', 456),('example-two', 456),('example-two', 456)]
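output (filtered list only; the order depends on dict ordering, and on Python 3.7+ the pairs appear in first-seen order instead):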
[('example-two', 4), ('example-one', 3), ('example', 2)]
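A more compact variant of the same dict idea, as a sketch rather than a replacement: dict.get with a default of 0 stands in for the if/else, and an explicit sort makes the descending order independent of how the dict happens to be ordered. The names tally and repeated are only for this example.

```python
b_data = [('example', 123), ('example-one', 456), ('example', 987),
          ('example-one', 456), ('example-one', 456), ('example-two', 456),
          ('example-two', 456), ('example-two', 456), ('example-two', 456)]

tally = {}
for name, _ in b_data:
    # get() returns 0 for a name not seen yet, so no if/else is needed
    tally[name] = tally.get(name, 0) + 1

# keep names seen more than once, highest count first
repeated = sorted(
    ((name, n) for name, n in tally.items() if n > 1),
    key=lambda pair: pair[1],
    reverse=True,
)
print(repeated)  # [('example-two', 4), ('example-one', 3), ('example', 2)]
```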