I have a pandas dataframe with two columns:
ddf.head()
      a      b
0  3136  13280
1  3072  13312
2  3152  13296
3  3120  13248
4  3120  13200
I would like to calculate the difference between consecutive elements in the same column. If I do it for one column at a time (ddf['a'].diff()) it works as I expect, but if I try ddf.diff() it gives:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-68-6ff864856571> in <module>()
----> 1 ddf.diff()
/home/app/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in diff(self, periods)
4285 diffed : DataFrame
4286 """
-> 4287 new_data = self._data.diff(periods)
4288 return self._constructor(new_data)
4289
/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in diff(self, *args, **kwargs)
1287
1288 def diff(self, *args, **kwargs):
-> 1289 return self.apply('diff', *args, **kwargs)
1290
1291 def interpolate(self, *args, **kwargs):
/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
1267 applied = f(blk, *args, **kwargs)
1268 else:
-> 1269 applied = getattr(blk,f)(*args, **kwargs)
1270
1271 if isinstance(applied,list):
/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in diff(self, n)
423 def diff(self, n):
424 """ return block for the diff of the values """
--> 425 new_values = com.diff(self.values, n, axis=1)
426 return make_block(new_values, self.items, self.ref_items, fastpath=True)
427
/home/app/anaconda/lib/python2.7/site-packages/pandas/core/common.pyc in diff(arr, n, axis)
643 if arr.ndim == 2 and arr.dtype.name in _diff_special:
644 f = _diff_special[arr.dtype.name]
--> 645 f(arr, out_arr, n, axis)
646 else:
647 res_indexer = [slice(None)] * arr.ndim
/home/app/anaconda/lib/python2.7/site-packages/pandas/algos.so in pandas.algos.diff_2d_int16 (pandas/algos.c:91446)()
ValueError: Buffer dtype mismatch, expected 'float32_t' but got 'double'
The diff() method computes the difference between rows or between columns of a pandas DataFrame. The axis parameter selects which: axis=0 (the default) compares each row with an earlier row, and axis=1 compares each column with an earlier column.
diff() returns a DataFrame holding the difference between each row and, by default, the row immediately before it. The periods parameter sets how many rows back the comparison reaches.
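As a rough illustration of the axis and periods parameters described above, here is a minimal sketch that rebuilds the sample frame from the question (the variable names row_diff, lag2_diff and col_diff are only for illustration):

```python
import pandas as pd

# Same values as the sample frame shown in the question.
df = pd.DataFrame({"a": [3136, 3072, 3152, 3120, 3120],
                   "b": [13280, 13312, 13296, 13248, 13200]})

# Default: each row minus the previous row (axis=0, periods=1).
row_diff = df.diff()

# Each row minus the row two positions earlier.
lag2_diff = df.diff(periods=2)

# Each column minus the column to its left.
col_diff = df.diff(axis=1)

print(row_diff)
print(lag2_diff)
print(col_diff)
```

The first row of row_diff (and the first two rows of lag2_diff) is NaN because there is no earlier row to subtract; likewise column a of col_diff is NaN because it has no column to its left.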
You can use this:
>>> df - df.shift(1)
    a   b
0 NaN NaN
1 -64  32
2  80 -16
3 -32 -48
4   0 -48
But actually, on my machine, df.diff() works OK:
>>> df.diff()
    a   b
0 NaN NaN
1 -64  32
2  80 -16
3 -32 -48
4   0 -48
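One thing worth checking: the traceback ends inside diff_2d_int16, which suggests the columns in the question are stored as int16, and the error may be specific to that pandas version. I can't confirm that from here, so treat the following as a sketch under that assumption. Casting to a wider dtype before calling diff() may sidestep the int16 code path, and the result should match the shift-based subtraction above:

```python
import numpy as np
import pandas as pd

# Assumption: the columns are int16, as the diff_2d_int16 frame in the traceback hints.
ddf = pd.DataFrame({
    "a": np.array([3136, 3072, 3152, 3120, 3120], dtype="int16"),
    "b": np.array([13280, 13312, 13296, 13248, 13200], dtype="int16"),
})
print(ddf.dtypes)  # check what dtype the columns actually have

# Cast to float64 first so diff() does not go through the int16-specific routine.
wide = ddf.astype("float64")
diffed = wide.diff()

# The same result computed by subtracting the frame shifted down by one row.
shifted = wide - wide.shift(1)

print(diffed)
print(diffed.equals(shifted))
```

The float result is needed anyway, since the first row has to hold NaN.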