
How to get the index and occurrence of each item using itertools.groupby()

Here's the story: I have two lists:

list_one=[1,2,9,9,9,3,4,9,9,9,9,2]
list_two=["A","B","C","D","A","E","F","G","H","Word1","Word2"]

I want to find the indices of consecutive 9's in list_one so that I can get the corresponding strings from list_two. I've tried:

group_list_one= [(k, sum(1 for i in g),pdn.index(k)) for k,g in groupby(list_one)]

I was hoping to get the index of the first 9 in each tuple and go from there, but that did not work.

What can I do here? P.S.: I've looked at the itertools documentation, but it seems very vague to me. Thanks in advance.

EDIT: Expected output is (key, occurrences, index_of_first_occurrence), something like:

[(9, 3, 2), (9, 4, 7)]
Aous1000 asked Apr 11 '14 22:04


2 Answers

Okay, I have a one-liner solution. It is ugly, but bear with me.

Let's consider the problem. We have a list that we want to summarize using itertools.groupby. groupby gives us each key together with an iterator over its consecutive repetitions. At this stage we can't calculate the index, but we can easily find the number of occurrences.

[(key, len(list(it))) for (key, it) in itertools.groupby(list_one)]
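
As a quick check, here is a small sketch of what that expression produces for the list_one from the question. The name runs is only for illustration and is not part of the original answer.

```python
import itertools

list_one = [1, 2, 9, 9, 9, 3, 4, 9, 9, 9, 9, 2]

# groupby yields (key, iterator) for every run of equal consecutive values;
# consuming the iterator with list() lets us count the length of the run.
runs = [(key, len(list(it))) for key, it in itertools.groupby(list_one)]

print(runs)
# [(1, 1), (2, 1), (9, 3), (3, 1), (4, 1), (9, 4), (2, 1)]
```

Note that the two runs of 9's show up as separate entries, because groupby only groups neighbours.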

Now, the real problem is that we want to calculate the indexes relative to the earlier data. Most functions commonly used in one-liners examine only the current item. However, there is one function that lets us take a glimpse at the past: reduce.

What reduce does is go over the iterable and call a function with the previous result of that function and the next item. For example, reduce(lambda x,y: x*y, [2,3,4]) will calculate 2*3 = 6, then 6*4 = 24, and return 24. In addition, you can supply an initial value for x instead of starting from the first item.
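
A minimal sketch of both forms, with and without an initial value. The variable names and the initial value 10 are made up for illustration. The import is required on Python 3; on Python 2 reduce is also available as a builtin.

```python
from functools import reduce

# No initial value: x starts as the first item, so 2*3 = 6, then 6*4 = 24.
product = reduce(lambda x, y: x * y, [2, 3, 4])

# With an initial value of 10: 10*2 = 20, 20*3 = 60, 60*4 = 240.
product_from_ten = reduce(lambda x, y: x * y, [2, 3, 4], 10)

print(product)           # 24
print(product_from_ten)  # 240
```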

Let's use it here: for each item, the index will be the previous index plus the previous number of occurrences. In order to have a valid list to start from, we'll use [(0,0,0)] as the initial value. (We get rid of it at the end.)

reduce(lambda lst,item: lst + [(item[0], item[1], lst[-1][1] + lst[-1][-1])], 
       [(key, len(list(it))) for (key, it) in itertools.groupby(list_one)], 
       [(0,0,0)])[1:]

If we don't want to add an initial value, we can instead sum the occurrence counts that appeared so far.

reduce(lambda lst,item: lst + [(item[0], item[1], sum(map(lambda i: i[1], lst)))],
       [(key, len(list(it))) for (key, it) in itertools.groupby(list_one)], [])

Of course, this gives us a tuple for every number. If we want only the 9's, we can wrap the whole thing in filter:

filter(lambda item: item[0] == 9, ... )
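
Putting the pieces together, a runnable sketch might look like this. It is split into named steps (runs, indexed, nines are illustrative names, not from the snippets above) and assumes Python 3, where reduce must be imported and filter returns an iterator that needs list() around it.

```python
from functools import reduce
import itertools

list_one = [1, 2, 9, 9, 9, 3, 4, 9, 9, 9, 9, 2]

# Step 1: (key, run length) for every run of equal consecutive values.
runs = [(key, len(list(it))) for key, it in itertools.groupby(list_one)]

# Step 2: start index of a run = previous run's length + previous run's start index.
# The (0, 0, 0) seed makes lst[-1] valid for the first run; [1:] drops it again.
indexed = reduce(
    lambda lst, item: lst + [(item[0], item[1], lst[-1][1] + lst[-1][2])],
    runs,
    [(0, 0, 0)],
)[1:]

# Step 3: keep only the runs whose key is 9.
nines = list(filter(lambda item: item[0] == 9, indexed))

print(nines)  # [(9, 3, 2), (9, 4, 7)]
```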
tmrlvi answered Oct 22 '22 23:10


Judging by your expected output, give this a try:

from itertools import groupby

list_one=[1,2,9,9,9,3,4,9,9,9,9,2]
list_two=["A","B","C","D","A","E","F","G","H","Word1","Word2"]
data = zip(list_one, list_two)
i = 0
out = []

for key, group in groupby(data, lambda x: x[0]):
    number, word = next(group)
    elems = len(list(group)) + 1
    if number == 9 and elems > 1:
        out.append((key, elems, i))
    i += elems

print(out)

Output:

[(9, 3, 2), (9, 4, 7)]

But if you really wanted an output like this:

[(9, 3, 'C'), (9, 4, 'G')]

then look at this snippet:

from itertools import groupby

list_one=[1,2,9,9,9,3,4,9,9,9,9,2]
list_two=["A","B","C","D","A","E","F","G","H","Word1","Word2"]
data = zip(list_one, list_two)
out = []

for key, group in groupby(data, lambda x: x[0]):
    number, word = next(group)
    elems = len(list(group)) + 1
    if number == 9 and elems > 1:
        out.append((key, elems, word))

print(out)
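
If you need both the start index and every matching word from list_two, one possible variation is to group over enumerate(list_one), so each value carries its own position. This is only a sketch: the names positions, start and words are illustrative, and it assumes you want all the words covered by a run rather than just the first one.

```python
from itertools import groupby

list_one = [1, 2, 9, 9, 9, 3, 4, 9, 9, 9, 9, 2]
list_two = ["A", "B", "C", "D", "A", "E", "F", "G", "H", "Word1", "Word2"]

out = []
# enumerate yields (index, value) pairs; grouping on the value keeps the indices.
for key, group in groupby(enumerate(list_one), key=lambda pair: pair[1]):
    positions = [index for index, _ in group]
    if key == 9 and len(positions) > 1:
        start = positions[0]
        # Slicing is safe even if list_two is shorter than list_one.
        words = list_two[start:start + len(positions)]
        out.append((key, len(positions), start, words))

print(out)
# [(9, 3, 2, ['C', 'D', 'A']), (9, 4, 7, ['G', 'H', 'Word1', 'Word2'])]
```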
Steinar Lima answered Oct 22 '22 23:10