I have a defaultdict:
dd = defaultdict(list,
{'Tech': ['AAPL','GOOGL'],
'Disc': ['AMZN', 'NKE'] }
and a dataframe that looks like this:
AAPL AMZN GOOGL NKE
1/1/10 100 200 500 200
1/2/10 100 200 500 200
1/310 100 200 500 200
and the output I'd like is to SUM the dataframe based on the values of the dictionary, with the keys as the columns:
TECH DISC
1/1/10 600 400
1/2/10 600 400
1/3/10 600 400
The pandas groupby documentation says it does this if you pass a dictionary but all I end up with is an empty df using this code:
df.groupby(by=dd).sum() ##returns empty df
Use DataFrame. groupby(). sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.
groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. sort : Sort group keys.
Sum all columns in a Pandas DataFrame into new column If we want to summarize all the columns, then we can simply use the DataFrame sum() method.
Create the dict
in the right way , you can using by
with axis=1
# map each company to industry
dd_rev = {w: k for k, v in dd.items() for w in v}
# {'AAPL': 'Tech', 'GOOGL': 'Tech', 'AMZN': 'Disc', 'NKE': 'Disc'}
# group along columns
df.groupby(by=dd_rev,axis=1).sum()
Out[160]:
Disc Tech
1/1/10 400 600
1/2/10 400 600
1/310 400 600
you can create a new dataframe using the defaultdict and dictionary comprehension in 1 line
pd.DataFrame({x: df[dd[x]].sum(axis=1) for x in dd})
# output:
Disc Tech
1/1/10 400 600
1/2/10 400 600
1/310 400 600
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With