Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to simplify merging dataframes process?

I have several dataframes with columns: coupon_id and rating. I want to merge this dataframes and get one dataframe with all coupon_id and rating as the sum of all ratings for this coupon_id in all dataframes.

For example. Suppose I have 2 dataframes:

| coupon_id  | rating      |
|:-----------|------------:|
| 1          |          40 |     
| 2          |          60 |    
| 3          |          50 |     
| coupon_id  | rating      |
|:-----------|------------:|
| 4          |          70 |     
| 2          |          80 |    
| 3          |          60 |     

As result I want to get this dataframe:

| coupon_id  | rating      |
|:-----------|------------:|
| 1          |          40 |     
| 2          |         140 |    
| 3          |         110 |
| 4          |          70 |     

For this problem I use this code, it works, but it is inefficient

similar_users_ratings = pd.DataFrame(columns=['coupon_id', 'rating'])

    for similarUser in most_similar_users:
        similar_user_ratings = self.ratingData.loc[self.ratingData['patient_id'] == similarUser[0], :].copy()

        similar_user_ratings.loc[:, 'rating'] = similar_user_ratings.loc[:, 'rating'].apply(lambda x: int(x) * similarUser[1])
        del similar_user_ratings['patient_id']
        similar_users_ratings = similar_users_ratings.merge(similar_user_ratings, on='coupon_id', how='outer')
        similar_users_ratings['rating_y'].fillna(.0, inplace=True)
        similar_users_ratings['rating_x'].fillna(.0, inplace=True)
        similar_users_ratings['rating'] = similar_users_ratings['rating_x'] + similar_users_ratings['rating_y']
        del similar_users_ratings['rating_y']
        del similar_users_ratings['rating_x']

How can I simplify this piece of code? Thanks.


In fact I have several dataframes such as:

      coupon_id  rating
69           12       1

      coupon_id  rating
101          37       1

      coupon_id  rating
428          11       1

      coupon_id  rating
1133         11       1

Desired dataset:

 coupon_id   rating
     12        1
     37        1
     11        2
like image 393
Vlad Shut Avatar asked Apr 01 '26 00:04

Vlad Shut


1 Answers

UPDATE:

In [46]: d1
Out[46]:
    coupon_id  rating
69         12       1

In [47]: d2
Out[47]:
     coupon_id  rating
101         37       1

In [48]: d3
Out[48]:
     coupon_id  rating
428         11       1

In [49]: d4
Out[49]:
      coupon_id  rating
1133         11       1

In [50]: pd.concat([d1,d2,d3,d4],ignore_index=True).groupby('coupon_id', as_index=False)['rating'].sum(
Out[50]:
   coupon_id  rating
0         11       2
1         12       1
2         37       1

OLD answer:

In [219]: d1.set_index('coupon_id').add(d2.set_index('coupon_id'), fill_value=0) \
            .reset_index()
Out[219]:
   coupon_id  rating
0          1    40.0
1          2   140.0
2          3   110.0
3          4    70.0
like image 124
MaxU - stop WAR against UA Avatar answered Apr 02 '26 12:04

MaxU - stop WAR against UA



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!