
Question

Starting from the following dataframe df:

import pandas as pd

df = pd.DataFrame({'node':[1,2,3,3,3,5,5],'lang':['it','en','ar','ar','es','uz','es']})

I'm trying to build this structure:

    node     langs   lfreq
0      1      [it]     [1]
1      2      [en]     [1]
2      3  [ar, es]  [2, 1]
3      5  [uz, es]  [1, 1]

That is, I want to collect the lang values of each node, and how often each one occurs, into a single row using lists. What I've done so far:

# Getting the unique langs / node
a = df.groupby('node')['lang'].unique().reset_index(name='langs')

# Getting the frequency of lang / node
b = df.groupby('node')['lang'].value_counts().reset_index(name='lfreq')
c = b.groupby('node')['lfreq'].unique().reset_index(name='lfreq')

and then merge on node:

d = pd.merge(a,c,on='node')

After these operations, I obtained:

    node     langs   lfreq
0      1      [it]     [1]
1      2      [en]     [1]
2      3  [ar, es]  [2, 1]
3      5  [uz, es]     [1]

As you can see, the last row has a single frequency, [1], for its two languages [uz, es] instead of the expected [1, 1]. Is there a more concise way to do this that produces the desired output?
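The lost count can be reproduced on node 5 alone. This minimal check reuses df and the value_counts() step from the question; the variable names node5, collapsed and kept are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'node': [1, 2, 3, 3, 3, 5, 5],
                   'lang': ['it', 'en', 'ar', 'ar', 'es', 'uz', 'es']})

# Same counting step as in the question: one row per (node, lang) with its count
b = df.groupby('node')['lang'].value_counts().reset_index(name='lfreq')

# Node 5 has two languages (uz, es), each seen once
node5 = b.loc[b['node'] == 5, 'lfreq']

# unique() removes duplicate values, so the two 1s collapse into one
collapsed = node5.unique().tolist()   # [1]
# tolist() keeps one entry per row, i.e. one count per language
kept = node5.tolist()                 # [1, 1]
print(collapsed, kept)
```

Because unique() works on the values rather than on the rows, any node whose languages share the same count loses entries.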

dmb · Accepted Answer

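I would use the agg function with tolist(). The problem is that unique() removes repeated counts: node 5 has two languages that each occur once, so unique() returns [1] instead of [1, 1].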

import pandas as pd

df = pd.DataFrame({'node':[1,2,3,3,3,5,5],'lang':['it','en','ar','ar','es','uz','es']})
# Getting the unique langs / node
a = df.groupby('node')['lang'].unique().reset_index(name='langs')

# Getting the frequency of lang / node
b = df.groupby('node')['lang'].value_counts().reset_index(name='lfreq')

Then replace

c = b.groupby('node')['lfreq'].unique().reset_index(name='lfreq')

with

c = b.groupby('node').agg({'lfreq': lambda x: x.tolist()}).reset_index()

d = pd.merge(a,c,on='node')

and voilà:

   node     langs   lfreq
0     1      [it]     [1]
1     2      [en]     [1]
2     3  [ar, es]  [2, 1]
3     5  [uz, es]  [1, 1]
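A possible one-pass variant, sketched under the assumption of pandas 0.25 or later (for named aggregation). The answer above still builds langs with unique() and lfreq with value_counts(), and value_counts() orders each node's languages by count rather than by first appearance, so the two lists are not guaranteed to line up: for a node whose rows are es, ar, ar, unique() gives [es, ar] while the counts come out as [2, 1]. Taking both lists from the same counted rows avoids that; the names counts and d are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'node': [1, 2, 3, 3, 3, 5, 5],
                   'lang': ['it', 'en', 'ar', 'ar', 'es', 'uz', 'es']})

# One row per (node, lang) pair with its count; sort=False keeps the
# pairs in the order they first appear in df
counts = (df.groupby(['node', 'lang'], sort=False)
            .size()
            .reset_index(name='lfreq'))

# Build both lists from the same rows, so langs[i] always matches lfreq[i]
d = (counts.groupby('node', sort=False)
           .agg(langs=('lang', list), lfreq=('lfreq', list))
           .reset_index())
print(d)
```

This also drops the separate merge step, since a single groupby produces both columns.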

pandas - create dataframe with counts and frequency of elements

Tags:

python

Fabio Lamanna
