
Python: How to get the length of itertools _grouper

I'm working with Python itertools and using groupby to group a bunch of pairs by their last element. I've gotten the grouping to work and I can iterate through the groups just fine, but I would really love to be able to get the length of each group without having to iterate through each one, incrementing a counter.

The project is to cluster some data points. I'm working with pairs of (numpy.array, int), where the numpy array is a data point and the integer is a cluster label.

Here's my relevant code:

data = sorted(data, key=lambda (point, cluster):cluster)
for cluster,clusterList in itertools.groupby(data, key=lambda (point, cluster):cluster):
    if len(clusterList) < minLen:

On the last line, if len(clusterList) < minLen:, I get this error:

object of type 'itertools._grouper' has no len()

I've looked up the operations available for _groupers, but can't find anything that seems to provide the length of a group.

user1466679 asked Dec 14 '12 00:12



1 Answer

Just because you call it clusterList doesn't make it a list! It's basically a lazy iterator, returning each item as it's needed. You can convert it to a list like this, though:

clusterList = list(clusterList) 

Or do that and get its length in one step:

length = len(list(clusterList)) 

If you don't want to take up the memory of making it a list, you can do this instead:

length = sum(1 for x in clusterList) 

Be aware that the original iterator will be consumed entirely by either converting it to a list or using the sum() formulation.
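To make the consumption point concrete, here is a minimal sketch written with Python 3 syntax (the tuple-unpacking lambda from the question only works in Python 2). The sample data, the cluster_of helper, and the min_len value are made up for illustration, and plain tuples stand in for the numpy arrays. It materializes each group once, so both the length and the members remain available, and it shows that the group iterator is empty afterwards:

```python
import itertools

# Hypothetical sample data: (point, cluster) pairs.
# Plain tuples stand in for the numpy arrays from the question.
data = [
    ((0.0, 0.1), 1),
    ((5.0, 5.2), 2),
    ((0.2, 0.0), 1),
    ((9.0, 9.1), 3),
    ((5.1, 4.9), 2),
    ((0.1, 0.3), 1),
]
min_len = 2  # hypothetical threshold, playing the role of minLen


def cluster_of(pair):
    """Return the cluster label, i.e. the last element of the pair."""
    return pair[1]


# groupby only merges adjacent items, so sort by the same key first.
data = sorted(data, key=cluster_of)

sizes = {}            # cluster label -> number of members
small_clusters = []   # labels of clusters below the threshold
leftovers = []        # items still left in each group after list()

for cluster, group in itertools.groupby(data, key=cluster_of):
    members = list(group)          # consumes the lazy group iterator once
    sizes[cluster] = len(members)  # the list has a length and can be reused

    # The group iterator is now exhausted, so counting it again yields 0.
    leftovers.append(sum(1 for _ in group))

    if len(members) < min_len:
        small_clusters.append(cluster)

print(sizes)
print(small_clusters)
print(leftovers)
```

Keeping the list is the simpler choice when the members are needed again after the length check. The sum() form only makes sense when the count alone is enough, since the items are gone once they have been counted.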

kindall answered Sep 21 '22 23:09