I am playing around with data and need to look at differences across columns (as well as rows) in a fairly large dataframe. The easiest way for rows is clearly the diff() method, but I cannot find the equivalent for columns.
My current solution to obtain a dataframe with the columns differenced is
df.transpose().diff().transpose()
Is there a more efficient alternative? Or is this such an odd usage of pandas that it was just never requested or considered useful? :)
Thanks,
You can use the DataFrame.diff() function to find the difference between two rows in a pandas DataFrame.
periods: The number of previous rows for calculating the difference.
axis: Find difference over rows (0) or columns (1).
The following examples show how to use this function in practice.
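For the column-wise case in the question, here is a minimal sketch comparing axis=1 with the double transpose. The small df is made-up illustration data, not the asker's dataframe.

```python
import numpy as np
import pandas as pd

# Illustrative data only: 3 rows, 4 columns, values 0..11
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

# Difference each column against the column to its left
by_axis = df.diff(axis=1)

# The workaround from the question, for comparison
by_transpose = df.transpose().diff().transpose()

print(by_axis)
print(by_axis.equals(by_transpose))
```

The first column has no left-hand neighbour, so it comes back as NaN in both versions.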
In this tutorial, you’ll learn how to use the Pandas diff method to calculate the difference between rows and between columns. You’ll learn how to use the .diff method to calculate the difference between subsequent rows or between rows of defined intervals (say, every seven rows).
Because of this, we can easily use the shift method to subtract between rows. The Pandas shift method offers a pre-step to calculating the difference between two rows by letting you see the data directly. The Pandas diff method simply calculates the difference, thereby abstracting the calculation.
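A small sketch of that relationship between shift and diff, using made-up data: subtracting the shifted frame by hand should give the same values as calling diff directly.

```python
import numpy as np
import pandas as pd

# Illustrative data only
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

previous = df.shift(1)   # every row moved down by one; the first row becomes NaN
manual = df - previous   # explicit "current row minus previous row"
built_in = df.diff()     # the same calculation in one call

print(previous)
print(manual)
print(built_in)
```

Printing previous is the "see the data directly" step; diff skips it and returns only the result.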
Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is the element in the previous row).
periods: Periods to shift for calculating difference, accepts negative values.
axis: Take difference over rows (0) or columns (1).
First differences of the Series. Percent change over given number of periods.
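To make the periods and axis parameters concrete, here is a short sketch on made-up data. It shows a negative period along rows and a period of 2 along columns.

```python
import numpy as np
import pandas as pd

# Illustrative data only
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

# Negative periods look forward: each row minus the row below it
backward = df.diff(periods=-1)

# periods=2 with axis=1: each column minus the column two positions to its left
two_cols = df.diff(periods=2, axis=1)

print(backward)
print(two_cols)
```

With a negative period the NaN values move to the last row instead of the first. With periods=2 along columns, the first two columns are NaN.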
Pandas DataFrames are excellent for manipulating table-like data whose columns have different dtypes.
If subtracting across columns and rows both make sense, then it means all the values are the same kind of quantity. That might be an indication that you should be using a NumPy array instead of a Pandas DataFrame.
In any case, you can use arr = df.values to extract a NumPy array of the underlying data from the DataFrame. If all the columns share the same dtype, then the NumPy array will have the same dtype. (When the columns have different dtypes, df.values is upcast to a common dtype, which is object when, for example, numeric and string columns are mixed.)
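As a rough check of how df.values picks its dtype, here is a sketch with three made-up frames. Note that mixed numeric columns are upcast to a common numeric dtype rather than becoming object.

```python
import pandas as pd

# Hypothetical frames, for illustration only
same = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})           # two integer columns
numeric = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})    # integer and float columns
mixed = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})      # integer and string columns

print(same.values.dtype)     # same integer dtype as the columns
print(numeric.values.dtype)  # a float dtype that can hold both columns
print(mixed.values.dtype)    # object
```

np.diff only makes sense on the first two; an object array of mixed values is a sign the frame is not really one homogeneous quantity.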
Then you can compute the differences along rows or columns using np.diff(arr, axis=...):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))
#    A  B   C   D
# 0  0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11

np.diff(df.values, axis=0)  # difference of the rows
# array([[4, 4, 4, 4],
#        [4, 4, 4, 4]])

np.diff(df.values, axis=1)  # difference of the columns
# array([[1, 1, 1],
#        [1, 1, 1],
#        [1, 1, 1]])
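If you want the result back as a labeled DataFrame, one possible sketch is below. Keeping the labels of the later column in each pair is my own choice here, not something np.diff decides for you.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

# np.diff returns one column fewer than the input, so label the result
# with the columns from the second one onward (an illustrative convention).
col_diff = pd.DataFrame(
    np.diff(df.values, axis=1),
    index=df.index,
    columns=df.columns[1:],
)

# Same numbers as df.diff(axis=1) once its all-NaN first column is dropped
via_pandas = df.diff(axis=1).iloc[:, 1:]

print(col_diff)
print(via_pandas)
```

Unlike df.diff, np.diff does not pad with NaN, so an integer input stays integer and the output has one fewer column (or row, for axis=0).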