Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging and subtracting DataFrame columns in pandas?

I have a pandas DataFrame, something like:

col1  col2 col3 col5
NaN    1    2    8
2     NaN   4    8
4     NaN   4    8

I want to do two things:

1) Merge columns 1 and 2:

newcol1 col3 col5
1       2    8
2       4    8
4       4    8

I have tried using .concat, but that just concatenates the rows. Doesn't seem like I can use standard + operators with NaN values.

2) Subtract column 5 from new column 1 and column 3, so I end up with:

newcol1    col3
-7         -6
-6         -4
-4         -4

Tried doing it this way:

dataframe[['newcol1', 'col2']] - dataframe['col5']

and

dataframe[['newcol1', 'col2']].subtract(dataframe['col5'])

but neither works.

like image 346
user1566200 Avatar asked Apr 23 '15 19:04

user1566200


People also ask

How do I subtract multiple columns in pandas?

We can create a function specifically for subtracting the columns, by taking column data as arguments and then using the apply method to apply it to all the data points throughout the column.

How do I merge two DataFrame columns in pandas?

By use + operator simply you can combine/merge two or multiple text/string columns in pandas DataFrame. Note that when you apply + operator on numeric columns it actually does addition instead of concatenation.

How do you subtract a DataFrame from a data frame?

subtract() function is used for finding the subtraction of dataframe and other, element-wise. This function is essentially same as doing dataframe – other but with a support to substitute for missing data in one of the inputs.

How do you subtract values in pandas?

Pandas DataFrame sub() MethodThe sub() method subtracts each value in the DataFrame with a specified value. The specified value must be an object that can be subtracted from the values in the DataFrame.


2 Answers

To get the new column, you could use fillna (or combine_first):

df['newcol1'] = df.col1.fillna(df.col2)

Then for the subtraction, use sub and specify axis=0 since we want to consider the row indices when matching up labels (not the column indices as is the default):

>>> df[['newcol1', 'col3']].sub(df['col5'], axis=0)
   newcol1  col3
0       -7    -6
1       -6    -4
2       -4    -4
like image 83
Alex Riley Avatar answered Sep 21 '22 08:09

Alex Riley


Here's one approach.

You could create newcol1 by sum(axis=1)

In [256]: df['newcol1'] = df[['col1', 'col2']].sum(axis=1)

In [257]: df
Out[257]:
   col1  col2  col3  col5  newcol1
0   NaN     1     2     8        1
1     2   NaN     4     8        2
2     4   NaN     4     8        4

Then use df.sub() on axis=0

In [258]: df[['newcol1', 'col3']].sub(df['col5'], axis=0)
Out[258]:
   newcol1  col3
0       -7    -6
1       -6    -4
2       -4    -4
like image 41
Zero Avatar answered Sep 18 '22 08:09

Zero