
Pandas Dataframe creating a unique column

I have this dataframe: [image in original post]

I want to add each pair of columns, such as duration + credit_amount, so I wrote the following function:

def automate_add(add):
    for i, column in enumerate(df):
        for j, operando in enumerate(df):
            if column != operando:
                columnName = column + '_sum_' + operando
                add[columnName] = df[column] + df[operando]

with the output:

  1. duration_sum_credit_amount
  2. duration_sum_installment_commitment
  3. credit_amount_sum_duration
  4. credit_amount_sum_installment_commitment
  5. installment_commitment_sum_duration
  6. installment_commitment_sum_credit_amount

However, since duration + credit_amount = credit_amount + duration, I don't want repeated columns. I expect this result from the function:

  1. duration_sum_credit_amount
  2. duration_sum_installment_commitment
  3. credit_amount_sum_installment_commitment

How can I do it?

I tried using hash sets, but they seem to work only with pandas Series [1].

EDIT: Dataframe: https://www.openml.org/d/31

Guilherme Felipe Reis asked Dec 02 '25 05:12


1 Answer

Use the following; it should run faster:

import itertools

import pandas as pd

# Sum every unordered pair of columns once; name each result '<a>_sum_<b>'.
my_list = [
    pd.Series(df.loc[:, list(i)].sum(axis=1), name='_sum_'.join(i))
    for i in itertools.combinations(df.columns, 2)
]
final_df = pd.concat(my_list, axis=1)
print(final_df)

  duration_sum_credit_amount  duration_sum_installment_commitment  \
0                        1175                                   10   
1                        5999                                   50   
2                        2108                                   14   
3                        7924                                   44   
4                        4894                                   27   

   credit_amount_sum_installment_commitment  
0                                      1173  
1                                      5953  
2                                      2098  
3                                      7884  
4                                      4873  

Explanation: print(list(itertools.combinations(df.columns,2))) gives:

[('duration', 'credit_amount'),
 ('duration', 'installment_commitment'),
 ('credit_amount', 'installment_commitment')]
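
To see why this removes the duplicates, here is a small sketch that uses only the three column names from the question (no dataframe needed). It contrasts ordered pairs, which is what the nested loop with `column != operando` visits, with the unordered pairs that itertools.combinations yields. The variable names are just for illustration.

```python
import itertools

# Column names taken from the question.
columns = ["duration", "credit_amount", "installment_commitment"]

# Ordered pairs: (a, b) and (b, a) are both produced, as in the nested loop.
ordered = list(itertools.permutations(columns, 2))

# Unordered pairs: each pair appears once, following the original column order.
unordered = list(itertools.combinations(columns, 2))

# Build the new column names the same way the answer does.
names = ["_sum_".join(pair) for pair in unordered]

print(len(ordered), len(unordered))
print(names)
```

With three columns there are six ordered pairs but only three unordered ones, which matches the six columns the question got and the three it wanted.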

After that, run:

for i in list(itertools.combinations(df.columns,2)):
    print(df.loc[:,list(i)])
    print("---------------------------")

This prints each pair of columns together. So I summed each pair along axis=1, wrapped the result in pd.Series, and named it by joining the two column names with '_sum_'.
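
The summing and naming step for a single pair can be sketched as below. The sample rows are an assumption: they are back-computed from the five rows of output printed above, not loaded from the OpenML dataset. Series.rename is used here in place of the pd.Series(..., name=...) call; both set the name.

```python
import pandas as pd

# Sample rows back-computed from the output printed in the answer (assumption).
df = pd.DataFrame({
    "duration": [6, 48, 12, 42, 24],
    "credit_amount": [1169, 5951, 2096, 7882, 4870],
    "installment_commitment": [4, 2, 2, 2, 3],
})

pair = ("duration", "credit_amount")

# Select just the two columns, add them row by row, and label the result.
pair_sum = df.loc[:, list(pair)].sum(axis=1).rename("_sum_".join(pair))

print(pair_sum)
```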

After that, collect the Series in a list and concatenate them along axis=1 to get the final result. :)
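
As an alternative that stays closer to the nested loop in the question, the inner loop can start after the current column so that each pair is visited once. This is only a sketch, not the answer's method: the function name pairwise_sums is hypothetical, and the sample rows are back-computed from the output printed above.

```python
import pandas as pd


def pairwise_sums(df: pd.DataFrame) -> pd.DataFrame:
    """Return one '<a>_sum_<b>' column per unordered pair of columns."""
    sums = {}
    columns = list(df.columns)
    for i, column in enumerate(columns):
        # Only look at columns to the right of `column`, so (a, b) is
        # produced but (b, a) never is.
        for operando in columns[i + 1:]:
            sums[f"{column}_sum_{operando}"] = df[column] + df[operando]
    return pd.DataFrame(sums, index=df.index)


# Sample rows back-computed from the output printed in the answer (assumption).
df = pd.DataFrame({
    "duration": [6, 48, 12, 42, 24],
    "credit_amount": [1169, 5951, 2096, 7882, 4870],
    "installment_commitment": [4, 2, 2, 2, 3],
})

result = pairwise_sums(df)
print(result)
```

One difference to keep in mind: `+` propagates missing values, while sum(axis=1) skips them by default, so the two approaches can differ on rows that contain NaN.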

anky answered Dec 03 '25 20:12


