Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the frequent elements from a list in pandas column grouped by another column [closed]

My dataframe looks like given below

col1      col2
type1     ['A','C','B','D']
type1     ['C','A','F','E']
type1     ['F','E','G','H']
type2     ['A','E','F','G']
type2     ['A','E','J','K']

I have to find out the elements from the list in col2 that are frequently appearing to a given user input. For example, if the user input is A. Then we have to find the top 3 elements that appear with A. And this has to be calculated for each value in col1. i.e

type1 - most frequent element for A - A,C will be the output
type2 - most frequent element for A - A,E will be the output

The data posted here is sample data.

like image 321
Sid Avatar asked Jan 24 '23 22:01

Sid


2 Answers

from collections import Counter

def most_freq(series, input_):
    cnt = Counter()
    for row in series:
        if input_ in row:
            for i in row:
                cnt[i] += 1
    return [k for (k,v) in cnt.most_common(2)]

query = 'A'
df.groupby('col1').agg({'col2': lambda x: most_freq(x, query)})

Outputs:

        col2
col1    
type1   [A, C]
type2   [A, E]

Explanation:

One possible way to solve this question is to use a customised aggregate function.

It uses a Counter to collect all counts of elements in each row which groups by col1 if user input appears, and return its top 2 occurrences. OP can change the arg 2 in cnt.most_common(2) to 3 if you are looking for top 3 occurrences.

like image 175
zwjjoy Avatar answered Jan 28 '23 13:01

zwjjoy


I hope I understand your question right - you want top 3 items that are neighbors of A:

from collections import Counter

def fn(x):
    c = Counter()
    for row in x:
        s = pd.Series(row)
        m = s == 'A'
        c.update(s[m.shift(fill_value=False) | m.shift(-1, fill_value=False)])
    return c.most_common(3)

print( df.groupby('col1').col2.apply(fn) )

Prints:

col1
type1    [(C, 2), (F, 1)]
type2            [(E, 2)]
Name: col2, dtype: object

C is 2 times neighbor of A, F only once in type1

E is 2 times neighbor of A, in type2


If you want most common, you can do in fn():

return list(dict(c.most_common(1)).keys())

This prints:

col1
type1    [C]
type2    [E]
Name: col2, dtype: object
like image 29
Andrej Kesely Avatar answered Jan 28 '23 13:01

Andrej Kesely