Pandas: Group by combination of two columns in Pandas 0.23.4

Question

I am fairly new to Python. I came across "Pandas: Group by combination of two columns" on SO. Unfortunately, its accepted answer no longer works with pandas 0.23.4. The goal of that post is to group by the combination of two columns while ignoring their order (so ('a', 'b') and ('b', 'a') fall into the same group), and to collect the values of each group into a dictionary.

Here's the accepted answer:

import pandas as pd
from collections import Counter

d = pd.DataFrame([('a','b',1), ('a','c', 2), ('b','a',3), ('b','a',3)],
                 columns=['x', 'y', 'score'])

d[['x', 'y']] = d[['x', 'y']].apply(sorted, axis=1)
x = d.groupby(['x', 'y']).agg(Counter)
print(x)

Here, .apply(sorted, axis=1) raises the following exception:

raise ValueError('Must have equal len keys and value '
ValueError: Must have equal len keys and value when setting with an iterable
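A minimal sketch of what goes wrong, assuming pandas 0.23 or later: when the applied function returns a list, apply(..., axis=1) returns a single Series whose values are lists, so there are no two columns to assign to x and y. Passing result_type='expand' (a parameter added in 0.23) spreads each list into separate columns; .values is used so the resulting 0/1 column labels do not matter during assignment.

```python
import pandas as pd

d = pd.DataFrame([('a', 'b', 1), ('a', 'c', 2), ('b', 'a', 3), ('b', 'a', 3)],
                 columns=['x', 'y', 'score'])

# Default result_type=None: sorted() returns a list, so each row becomes one
# list inside a single Series rather than two separate columns.
as_lists = d[['x', 'y']].apply(sorted, axis=1)
print(type(as_lists).__name__)  # Series
print(as_lists.tolist())

# result_type='expand' turns every returned list into columns 0 and 1.
expanded = d[['x', 'y']].apply(sorted, axis=1, result_type='expand')
# Assign the raw array so the 0/1 column labels of `expanded` do not matter.
d[['x', 'y']] = expanded.values
print(d)
```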

Here's my pandas version:

> pd.__version__
Out: '0.23.4'

Here's what I tried after reading https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html:

d = pd.DataFrame([('a','b',1), ('a','c', 2), ('b','a',3), ('b','a',3)],
                 columns=['x', 'y', 'score'])

d=d.sort_values(by=['x','y'],axis=1).reset_index(drop=True)
x = d.groupby(['x', 'y']).agg(Counter)
print(x)

Unfortunately, this also throws an error:

1382, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'x'
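A short sketch of why this attempt fails: with axis=1, sort_values treats the by labels as row (index) labels, and the index here is 0..3, hence KeyError: 'x'. With the default axis=0 the call succeeds, but it only reorders whole rows and never swaps values within a row, so ('b', 'a') stays ('b', 'a').

```python
import pandas as pd

d = pd.DataFrame([('a', 'b', 1), ('a', 'c', 2), ('b', 'a', 3), ('b', 'a', 3)],
                 columns=['x', 'y', 'score'])

# axis=1 makes `by` refer to index labels; 'x' is a column label, not an index label.
raised = False
try:
    d.sort_values(by=['x', 'y'], axis=1)
except KeyError as exc:
    raised = True
    print('KeyError:', exc)

# axis=0 (the default) works, but it only reorders rows;
# the values inside each row are left untouched.
rows_sorted = d.sort_values(by=['x', 'y']).reset_index(drop=True)
pairs = list(zip(rows_sorted['x'], rows_sorted['y']))
print(pairs)
```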

Expected output:

        score           count
x   y                     
a   b   {1: 1, 3: 2}      2
    c   {2: 1}            1

Can someone please help me? On a side note, it would be great if you could also show how to compute the number of keys in each score dictionary (the count column above). I am looking for a vectorized solution.

I am using Python 3.6.7.

Many thanks.

jezrael · Accepted Answer

The problem is that sorted returns a list, so apply produces a single column of lists instead of two columns. It is necessary to convert the result to a Series:

d[['x', 'y']] = d[['x', 'y']].apply(lambda x: pd.Series(sorted(x)), axis=1)
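A small check of why wrapping the result in pd.Series fixes the assignment (a sketch, not part of the original answer): when the row function returns a Series, apply builds a DataFrame whose columns are that Series' index labels, here 0 and 1, and assigning it to d[['x', 'y']] pairs the columns up by position.

```python
import pandas as pd

d = pd.DataFrame([('a', 'b', 1), ('a', 'c', 2), ('b', 'a', 3), ('b', 'a', 3)],
                 columns=['x', 'y', 'score'])

# Returning a Series from the row function makes apply build a DataFrame
# whose columns are the Series' index labels, here 0 and 1.
expanded = d[['x', 'y']].apply(lambda row: pd.Series(sorted(row)), axis=1)
print(expanded.columns.tolist())  # [0, 1]

# Column 0 goes into 'x' and column 1 into 'y' (matched by position).
d[['x', 'y']] = expanded
print(d)
```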

But it is faster to use numpy.sort with the DataFrame constructor, because apply loops over the rows under the hood:

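import numpy as np
import pandas as pd
from collections import Counter

d = pd.DataFrame([('a','b',1), ('a','c', 2), ('b','a',3), ('b','a',3)],
                 columns=['x', 'y', 'score'])
# sort the values within each row, so ('b', 'a') becomes ('a', 'b')
d[['x', 'y']] = pd.DataFrame(np.sort(d[['x', 'y']], axis=1), index=d.index)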
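The speed claim can be checked with a rough timing sketch like the one below. The 2,000-row frame of random letters is made up purely for illustration, and timings depend on machine and library versions, so no numbers are quoted; the script only guarantees that both approaches produce the same sorted pairs.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 2,000 random (x, y) letter pairs.
rng = np.random.default_rng(0)
letters = np.array(list('abcdef'), dtype=object)
n = 2_000
big = pd.DataFrame({'x': rng.choice(letters, n),
                    'y': rng.choice(letters, n),
                    'score': rng.integers(0, 5, n)})

def with_apply(df):
    # Python-level loop: one sorted() call and one Series per row.
    return df[['x', 'y']].apply(lambda r: pd.Series(sorted(r)), axis=1).values

def with_np_sort(df):
    # Single call that sorts each row of the 2-D object array.
    return np.sort(df[['x', 'y']].values, axis=1)

same = bool((with_apply(big) == with_np_sort(big)).all())
print('identical results:', same)
print('apply   :', timeit.timeit(lambda: with_apply(big), number=3))
print('np.sort :', timeit.timeit(lambda: with_np_sort(big), number=3))
```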

Then select the column to aggregate and pass a list of aggregation functions, e.g. nunique to count the number of unique values:

x = d.groupby(['x', 'y'])['score'].agg([Counter, 'nunique'])
print(x)
          Counter  nunique
x y                       
a b  {1: 1, 3: 2}        2
  c        {2: 1}        1
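Because each Counter maps every distinct score to its frequency, the nunique column equals the number of keys in that group's Counter, which answers the side question without calling len on each dictionary. A small sketch that checks this on the sample data; the per-group Counters are built by iterating over the groups instead of via agg, only so the comparison is independent of the aggregation itself.

```python
from collections import Counter

import numpy as np
import pandas as pd

d = pd.DataFrame([('a', 'b', 1), ('a', 'c', 2), ('b', 'a', 3), ('b', 'a', 3)],
                 columns=['x', 'y', 'score'])
d[['x', 'y']] = np.sort(d[['x', 'y']].values, axis=1)

# One Counter per (x, y) group, built by iterating over the groups.
counters = {key: Counter(grp['score']) for key, grp in d.groupby(['x', 'y'])}

# Vectorized count of distinct scores per group.
nunique = d.groupby(['x', 'y'])['score'].nunique().to_dict()

# The number of keys in each Counter equals the nunique value.
key_counts = {key: len(c) for key, c in counters.items()}
print(key_counts)
print(nunique)
```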

Or count the rows in each group with GroupBy.size:

x = d.groupby(['x', 'y'])['score'].agg([Counter, 'size'])
print(x)
          Counter  size
x y                    
a b  {1: 1, 3: 2}     3
  c        {2: 1}     1
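Note the difference between the two counts: nunique counts distinct scores (the keys of each Counter), while size counts rows, i.e. the sum of the Counter values. A quick check on the sample data:

```python
from collections import Counter

import numpy as np
import pandas as pd

d = pd.DataFrame([('a', 'b', 1), ('a', 'c', 2), ('b', 'a', 3), ('b', 'a', 3)],
                 columns=['x', 'y', 'score'])
d[['x', 'y']] = np.sort(d[['x', 'y']].values, axis=1)

# Row count per (x, y) group.
sizes = d.groupby(['x', 'y']).size().to_dict()

# Total of the Counter values per group: every row contributes one count.
totals = {key: sum(Counter(grp['score']).values())
          for key, grp in d.groupby(['x', 'y'])}
print(sizes)
print(totals)
```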

Vivek Kalyanarangan · Answer

Sort the underlying NumPy array in place and assign it back:

a=d[['x','y']].values
a.sort(axis=1)
d[['x','y']] = a
x = d.groupby(['x', 'y']).agg(Counter)
print(x)

Output

            score
x y              
a b  {1: 1, 3: 2}
  c        {2: 1}

Tags:

python

python-3.x

pandas
