pandas groupby using dictionary values, applying sum

Tags: python, pandas

I have a defaultdict:

from collections import defaultdict

dd = defaultdict(list,
        {'Tech': ['AAPL', 'GOOGL'],
         'Disc': ['AMZN', 'NKE']})

and a dataframe that looks like this:

         AAPL AMZN GOOGL NKE
1/1/10   100  200  500   200
1/2/10   100  200  500   200
1/3/10   100  200  500   200

The output I'd like is the SUM of the dataframe's columns grouped by the values of the dictionary, with the keys as the new columns:

         Tech Disc
1/1/10   600  400
1/2/10   600  400
1/3/10   600  400

The pandas groupby documentation says it does this if you pass a dictionary, but all I end up with is an empty df using this code:

df.groupby(by=dd).sum()   # returns an empty df


asked Jun 11 '18 01:06

thomas.mac

2 Answers

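Build the dict the other way around, so that each column name maps to its group; then you can pass it to by with axis=1. (groupby uses the dict's values as the group labels, and by default it looks the keys up in the index rather than the columns.)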

# map each company to industry
dd_rev = {w: k for k, v in dd.items() for w in v}
# {'AAPL': 'Tech', 'GOOGL': 'Tech', 'AMZN': 'Disc', 'NKE': 'Disc'}

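# group along columns (axis=1 is deprecated since pandas 2.1;
# df.T.groupby(dd_rev).sum().T gives the same result there)
df.groupby(by=dd_rev, axis=1).sum()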

Out[160]: 
        Disc  Tech
1/1/10   400   600
1/2/10   400   600
1/3/10   400   600
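
For reference, here is a self-contained sketch of the same idea that avoids axis=1 by transposing first. The sample frame is rebuilt from the question, and the name ticker_to_sector is only an illustrative choice, not something from the original code.

```python
from collections import defaultdict

import pandas as pd

# Sample data rebuilt from the question (values assumed from the table shown there)
dd = defaultdict(list, {"Tech": ["AAPL", "GOOGL"], "Disc": ["AMZN", "NKE"]})
df = pd.DataFrame(
    {"AAPL": [100] * 3, "AMZN": [200] * 3, "GOOGL": [500] * 3, "NKE": [200] * 3},
    index=["1/1/10", "1/2/10", "1/3/10"],
)

# Invert the mapping: groupby wants label -> group, not group -> labels
ticker_to_sector = {
    ticker: sector for sector, tickers in dd.items() for ticker in tickers
}

# Transpose so the tickers become the index, group those rows by sector,
# sum within each sector, then transpose back to dates-by-sector
result = df.T.groupby(ticker_to_sector).sum().T
print(result)
```

Grouping the transposed frame should behave the same on older and newer pandas versions, at the cost of two transposes, which is usually negligible for a frame of this shape.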

answered Oct 02 '22 20:10

BENY

You can build a new dataframe directly from the defaultdict with a dictionary comprehension, in one line:

pd.DataFrame({x: df[dd[x]].sum(axis=1) for x in dd})
# output:

        Disc  Tech
1/1/10   400   600
1/2/10   400   600
1/3/10   400   600
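
A runnable version of this approach, as a sketch: the sample frame is again rebuilt from the question, and iterating over dd.items() instead of the keys is just a stylistic variation on the one-liner.

```python
from collections import defaultdict

import pandas as pd

# Sample data rebuilt from the question (values assumed from the table shown there)
dd = defaultdict(list, {"Tech": ["AAPL", "GOOGL"], "Disc": ["AMZN", "NKE"]})
df = pd.DataFrame(
    {"AAPL": [100] * 3, "AMZN": [200] * 3, "GOOGL": [500] * 3, "NKE": [200] * 3},
    index=["1/1/10", "1/2/10", "1/3/10"],
)

# For each sector, select its ticker columns and sum them row by row;
# each resulting Series becomes one column of the new frame
result = pd.DataFrame(
    {sector: df[tickers].sum(axis=1) for sector, tickers in dd.items()}
)
print(result)
```

On recent pandas and Python versions the columns follow the dict's insertion order (Tech, then Disc) rather than being sorted alphabetically, so the column order may differ from the groupby result.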

answered Oct 02 '22 22:10

Haleemur Ali
