How to calculate differences across n columns in pandas rather than rows

I am playing around with data and need to look at differences across columns (as well as rows) in a fairly large dataframe. The easiest way for rows is clearly the diff() method, but I cannot find the equivalent for columns.

My current solution is to obtain a dataframe with the columns differenced via

df.transpose().diff().transpose()

Is there a more efficient alternative? Or is this such an odd usage of pandas that it was just never requested or considered useful? :)

Thanks,

John Smizz asked Mar 23 '15 19:03




1 Answer

Pandas DataFrames are excellent for manipulating table-like data whose columns have different dtypes.

If subtracting across both columns and rows makes sense, then all the values are the same kind of quantity. That might be an indication that you should be using a NumPy array instead of a Pandas DataFrame.

In any case, you can use arr = df.values (or df.to_numpy() in newer pandas versions) to extract a NumPy array of the underlying data from the DataFrame. If all the columns share the same dtype, then the NumPy array will have the same dtype. (When the columns have different dtypes, the array is upcast to a common dtype, which is object when, for example, numbers are mixed with strings.)
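To see how the dtype of the extracted array depends on the columns, here is a small sketch. The three frames and their names are made up for illustration, and it uses df.to_numpy(), which returns the same array as df.values in these cases.

```python
import numpy as np
import pandas as pd

# Homogeneous integer columns: the array keeps an integer dtype.
same = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
same_arr = same.to_numpy()

# Integer and float columns are upcast to a common numeric dtype.
numeric_mix = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5]})
numeric_arr = numeric_mix.to_numpy()

# Numbers mixed with strings fall back to object dtype.
object_mix = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
object_arr = object_mix.to_numpy()

print(same_arr.dtype, numeric_arr.dtype, object_arr.dtype)
```

An object array is the case to watch for, since arithmetic on it falls back to slow element-by-element Python operations.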

Then you can compute the differences along rows or columns using np.diff(arr, axis=...):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3,4), columns=list('ABCD'))
#    A  B   C   D
# 0  0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11

np.diff(df.values, axis=0)    # difference of the rows
# array([[4, 4, 4, 4],
#        [4, 4, 4, 4]])

np.diff(df.values, axis=1)    # difference of the columns
# array([[1, 1, 1],
#        [1, 1, 1],
#        [1, 1, 1]])
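
If you would rather stay in pandas, DataFrame.diff also accepts an axis argument, so the transpose round trip from the question should not be needed. The sketch below reuses the example frame from above; the variable names col_diff, via_transpose and labeled are only illustrative. It also shows one way to put labels back on the np.diff result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

# Difference across columns without leaving pandas.
# The first column has no predecessor, so it becomes NaN.
col_diff = df.diff(axis=1)

# The transpose round trip from the question yields the same values.
via_transpose = df.transpose().diff().transpose()

# np.diff returns one column fewer, so the labels start at the second column.
labeled = pd.DataFrame(
    np.diff(df.to_numpy(), axis=1),
    index=df.index,
    columns=df.columns[1:],
)
```

The difference in shape is the main design choice: df.diff(axis=1) keeps all four columns and pads with NaN, while np.diff drops the first column and keeps the integer dtype.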
unutbu answered Nov 14 '22 23:11
