
How to subtract all rows in a dataframe with a row from another dataframe?

I would like to subtract a single row of one dataframe from every row of another dataframe, i.e. compute each row's difference from that one row.

Is there an easy way to do this, something like df - df2?

import numpy as np
import pandas as pd

df = pd.DataFrame(abs(np.floor(np.random.rand(3, 5)*10)),
                  columns=['a', 'b', 'c', 'd', 'e'])
df

Out[18]:
   a  b  c  d  e
0  8  9  8  6  4
1  3  0  6  4  8
2  2  5  7  5  6


df2 = pd.DataFrame(abs(np.floor(np.random.rand(1, 5)*10)),
                   columns=['a', 'b', 'c', 'd', 'e'])
df2

   a  b  c  d  e
0  8  1  3  7  5

Here is the result of df - df2. It works for the first row, but I want the remaining rows to be subtracted as well:

df-df2

    a   b   c   d   e
0   0   8   5  -1  -1
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
asked Feb 28 '14 at 11:02 by jonas



1 Answer

Pandas NDFrames align on index labels before performing arithmetic. df - df2 only produces numbers in the first row because index label 0 is the only label the two frames have in common; rows 1 and 2 have no matching row in df2, so their results are NaN.
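A minimal sketch that makes this label alignment visible. Because the question's data are random, it assumes fixed values copied from the printed output above; the variable name aligned is just for illustration.

```python
import pandas as pd

# Fixed example data (assumed), copied from the question's printed output
df = pd.DataFrame([[8, 9, 8, 6, 4],
                   [3, 0, 6, 4, 8],
                   [2, 5, 7, 5, 6]], columns=list('abcde'))
df2 = pd.DataFrame([[8, 1, 3, 7, 5]], columns=list('abcde'))

# DataFrame - DataFrame aligns on index AND column labels.
# The result's index is the union {0, 1, 2}, but only label 0 exists in df2,
# so rows 1 and 2 have nothing to subtract and come out as NaN.
aligned = df - df2
print(aligned)
```

Since the NaN rows force a float dtype, the surviving first row is printed as floats as well.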

The operation you are looking for looks more like a NumPy array operation performed with "broadcasting":

In [21]: df.values-df2.values
Out[21]: 
array([[ 0,  8,  5, -1, -1],
       [-5, -1,  3, -3,  3],
       [-6,  4,  4, -2,  1]], dtype=int64)

To package the result in a DataFrame:

In [22]: pd.DataFrame(df.values-df2.values, columns=df.columns)
Out[22]: 
   a  b  c  d  e
0  0  8  5 -1 -1
1 -5 -1  3 -3  3
2 -6  4  4 -2  1
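A pandas-only alternative, sketched here as standard pandas behaviour rather than as part of the answer above: subtract the single row as a Series. df2.iloc[0] is a Series indexed by the column names, and DataFrame - Series matches that Series to the columns and repeats the subtraction down every row, so the original index of df is kept. The sample values are assumed (taken from the question's printed output), and row, diff_op and diff_sub are illustrative names.

```python
import pandas as pd

# Fixed example data (assumed), copied from the question's printed output
df = pd.DataFrame([[8, 9, 8, 6, 4],
                   [3, 0, 6, 4, 8],
                   [2, 5, 7, 5, 6]], columns=list('abcde'))
df2 = pd.DataFrame([[8, 1, 3, 7, 5]], columns=list('abcde'))

# The one row to subtract, as a Series indexed by 'a'..'e'
row = df2.iloc[0]

# DataFrame - Series aligns the Series index with the columns
# and broadcasts the subtraction across all rows of df.
diff_op = df - row
diff_sub = df.sub(row, axis='columns')  # explicit method form, same result

print(diff_op)
```

If you stay with the NumPy route instead, passing index=df.index to the pd.DataFrame constructor also keeps the original row labels.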
answered Oct 11 '22 at 01:10 by unutbu
