Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas groupby using dictionary values, applying sum

Tags:

python

pandas

I have a defaultdict:

dd = defaultdict(list,
        {'Tech': ['AAPL','GOOGL'],
         'Disc': ['AMZN', 'NKE']  }

and a dataframe that looks like this:

         AAPL AMZN GOOGL NKE
1/1/10   100  200  500   200
1/2/10   100  200  500   200
1/310    100  200  500   200

and the output I'd like is to SUM the dataframe based on the values of the dictionary, with the keys as the columns:

         TECH DISC 
1/1/10   600  400 
1/2/10   600  400  
1/3/10   600  400 

The pandas groupby documentation says it does this if you pass a dictionary but all I end up with is an empty df using this code:

df.groupby(by=dd).sum()   ##returns empty df
like image 540
thomas.mac Avatar asked Jun 11 '18 01:06

thomas.mac


People also ask

How do you group by and sum multiple columns in pandas?

Use DataFrame. groupby(). sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.

What is possible using Groupby () method of pandas?

groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. sort : Sort group keys.

How do I sum multiple columns in pandas DataFrame?

Sum all columns in a Pandas DataFrame into new column If we want to summarize all the columns, then we can simply use the DataFrame sum() method.


2 Answers

Create the dict in the right way , you can using by with axis=1

# map each company to industry
dd_rev = {w: k for k, v in dd.items() for w in v}
# {'AAPL': 'Tech', 'GOOGL': 'Tech', 'AMZN': 'Disc', 'NKE': 'Disc'}

# group along columns
df.groupby(by=dd_rev,axis=1).sum() 

Out[160]: 
        Disc  Tech
1/1/10   400   600
1/2/10   400   600
1/310    400   600
like image 88
BENY Avatar answered Oct 02 '22 20:10

BENY


you can create a new dataframe using the defaultdict and dictionary comprehension in 1 line

pd.DataFrame({x: df[dd[x]].sum(axis=1) for x in dd})
# output:

        Disc  Tech
1/1/10   400   600
1/2/10   400   600
1/310    400   600
like image 35
Haleemur Ali Avatar answered Oct 02 '22 22:10

Haleemur Ali