My dataframe looks like given below
col1 col2
type1 ['A','C','B','D']
type1 ['C','A','F','E']
type1 ['F','E','G','H']
type2 ['A','E','F','G']
type2 ['A','E','J','K']
I have to find out the elements from the list in col2 that are frequently appearing to a given user input. For example, if the user input is A. Then we have to find the top 3 elements that appear with A. And this has to be calculated for each value in col1. i.e
type1 - most frequent element for A - A,C will be the output
type2 - most frequent element for A - A,E will be the output
The data posted here is sample data.
from collections import Counter
def most_freq(series, input_):
cnt = Counter()
for row in series:
if input_ in row:
for i in row:
cnt[i] += 1
return [k for (k,v) in cnt.most_common(2)]
query = 'A'
df.groupby('col1').agg({'col2': lambda x: most_freq(x, query)})
Outputs:
col2
col1
type1 [A, C]
type2 [A, E]
Explanation:
One possible way to solve this question is to use a customised aggregate
function.
It uses a Counter
to collect all counts of elements in each row which groups by col1
if user input
appears, and return its top 2 occurrences. OP can change the arg 2
in cnt.most_common(2)
to 3
if you are looking for top 3 occurrences.
I hope I understand your question right - you want top 3 items that are neighbors of A:
from collections import Counter
def fn(x):
c = Counter()
for row in x:
s = pd.Series(row)
m = s == 'A'
c.update(s[m.shift(fill_value=False) | m.shift(-1, fill_value=False)])
return c.most_common(3)
print( df.groupby('col1').col2.apply(fn) )
Prints:
col1
type1 [(C, 2), (F, 1)]
type2 [(E, 2)]
Name: col2, dtype: object
C
is 2 times neighbor of A
, F
only once in type1
E
is 2 times neighbor of A
, in type2
If you want most common, you can do in fn()
:
return list(dict(c.most_common(1)).keys())
This prints:
col1
type1 [C]
type2 [E]
Name: col2, dtype: object
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With