
Add columns calculated in for loops to a data frame in Python

import re
# Create several new columns with a for loop and add them to the original df.
# Build pairwise products of the binary columns in df (second-level binary variables).
for i in list_ib:
    for j in list_ib:
        if i == j:
            break
        else:            
            bina = df[i]*df[j]
            print(i,j)

Here i and j both iterate over the names of the binary columns in a data frame (df). Inside the loop I calculate the product of each pair of columns. My question is: how do I add all the new binary product columns to the original df?

I have tried:

df = df + df[i,j,bina]

but I am not getting the results I need. Any suggestions?

anitasp asked Apr 24 '16 at 23:04


2 Answers

As I understand it, i, j and bina are not part of your df. Build an array for each of them, with one element per row, and once all three arrays are complete you can concatenate them to df like this:

>>> import pandas as pd
>>> new_df = pd.DataFrame(data={'i':i, 'j':j, 'bina':bina}, columns=['i','j','bina'])
>>> df = pd.concat([df, new_df], axis=1)  # concat returns a new frame, so keep the result
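
For the loop in the question, one way to apply this idea is to collect each product in a dict keyed by a new column name and concatenate once at the end. The following is a minimal sketch: the sample frame, the contents of list_ib and the "a_x_b" naming scheme are assumptions made for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's binary columns.
df = pd.DataFrame({"a": [1, 0, 1], "b": [1, 1, 0], "c": [0, 1, 1]})
list_ib = ["a", "b", "c"]

products = {}
for i in list_ib:
    for j in list_ib:
        if i == j:
            break  # stop at the diagonal so each unordered pair is visited once
        # Key each product by a made-up name so no result overwrites another.
        products[f"{i}_x_{j}"] = df[i] * df[j]

# Build one frame from all products and attach it column-wise.
new_df = pd.DataFrame(products)
result = pd.concat([df, new_df], axis=1)
print(result.columns.tolist())
```

Concatenating once after the loop avoids growing df one column at a time, which can matter when list_ib is long.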

Alternatively, if you have collected the data for 'i', 'j' and 'bina' in three separate arrays, you can assign each one directly as a column:

>>> df['i'] = i
>>> df['j'] = j
>>> df['bina'] = bina

This will work only if these three arrays have as many elements as rows in the DataFrame df.
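
The length requirement can be checked with a small sketch. The frame and values below are made up for illustration. Assigning a plain list of the wrong length raises a ValueError and leaves df unchanged.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 0, 1]})  # hypothetical three-row frame

df["ok"] = [1, 1, 0]  # three values for three rows: accepted

try:
    df["bad"] = [1, 0]  # two values for three rows
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)
```

Note that this applies to plain lists and arrays. A pandas Series is aligned on its index instead, so a shorter Series would be filled with NaN rather than rejected.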

I hope this helps!

Thanos answered Nov 14 '22 at 05:11


Typically you add a column to a DataFrame by assigning to it with [], which calls the built-in __setitem__() method. For example:

import pandas as pd

df = pd.DataFrame()

df["one"] = 1, 1, 1
df["two"] = 2, 2, 2
df["three"] = 3, 3, 3

print(df)

# Output:
#    one  two  three
# 0    1    2      3
# 1    1    2      3
# 2    1    2      3

list_ib = df.columns.values

for i in list_ib:
    for j in list_ib:
        if i == j:
            break
        else:
            bina = df[i] * df[j]
            df['bina_' + str(i) + '_' + str(j)] = bina # Add new column which is the result of multiplying columns i and j together

print(df)

# Output:
#    one  two  three  bina_two_one  bina_three_one  bina_three_two
# 0    1    2      3             2               3               6
# 1    1    2      3             2               3               6
# 2    1    2      3             2               3               6
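
As a variation on the loop above, itertools.combinations from the standard library yields each unordered pair of columns exactly once, which removes the need for the i == j check and the break. This is a sketch rather than the code from this answer, and the column names follow the pair order that combinations produces, so they differ from the names in the output above.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({"one": [1, 1, 1], "two": [2, 2, 2], "three": [3, 3, 3]})

# Snapshot the original column names before adding new ones,
# so the new product columns are not paired up themselves.
list_ib = list(df.columns)

for i, j in combinations(list_ib, 2):
    # Each pair (i, j) appears once, with i earlier than j in list_ib.
    df[f"bina_{i}_{j}"] = df[i] * df[j]

print(df.columns.tolist())
```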

Matt Messersmith answered Nov 14 '22 at 03:11