
Diff on pandas dataframe with more than one column

Tags: python, pandas

I have a pandas dataframe with two columns:

ddf.head()

    a    b
0   3136 13280
1   3072 13312
2   3152 13296
3   3120 13248
4   3120 13200

I would like to calculate the difference between consecutive elements within each column. Doing it one column at a time (ddf['a'].diff()) works as I expect, but calling ddf.diff() on the whole dataframe raises:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-68-6ff864856571> in <module>()
----> 1 ddf.diff()

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in diff(self, periods)
   4285         diffed : DataFrame
   4286         """
-> 4287         new_data = self._data.diff(periods)
   4288         return self._constructor(new_data)
   4289 

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in diff(self, *args, **kwargs)
   1287 
   1288     def diff(self, *args, **kwargs):
-> 1289         return self.apply('diff', *args, **kwargs)
   1290 
   1291     def interpolate(self, *args, **kwargs):

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
   1267                 applied = f(blk, *args, **kwargs)
   1268             else:
-> 1269                 applied = getattr(blk,f)(*args, **kwargs)
   1270 
   1271             if isinstance(applied,list):

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in diff(self, n)
    423     def diff(self, n):
    424         """ return block for the diff of the values """
--> 425         new_values = com.diff(self.values, n, axis=1)
    426         return make_block(new_values, self.items, self.ref_items, fastpath=True)
    427 

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/common.pyc in diff(arr, n, axis)
    643     if arr.ndim == 2 and arr.dtype.name in _diff_special:
    644         f = _diff_special[arr.dtype.name]
--> 645         f(arr, out_arr, n, axis)
    646     else:
    647         res_indexer = [slice(None)] * arr.ndim

/home/app/anaconda/lib/python2.7/site-packages/pandas/algos.so in pandas.algos.diff_2d_int16 (pandas/algos.c:91446)()

ValueError: Buffer dtype mismatch, expected 'float32_t' but got 'double'
Fra asked Nov 12 '13 21:11



1 Answer

You can subtract the shifted dataframe from the original:

>>> df - df.shift(1)
    a   b
0 NaN NaN
1 -64  32
2  80 -16
3 -32 -48
4   0 -48
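
For reference, here is a self-contained sketch of the shift-based approach. The dataframe is rebuilt from the five rows shown in the question; the variable name shifted_diff is only illustrative.

```python
import pandas as pd

# Sample data copied from the question's ddf.head() output.
df = pd.DataFrame({
    "a": [3136, 3072, 3152, 3120, 3120],
    "b": [13280, 13312, 13296, 13248, 13200],
})

# Subtract each row's predecessor; the first row has none, so it becomes NaN.
shifted_diff = df - df.shift(1)
print(shifted_diff)
```

Because the first row of the result is NaN, the result columns are floating point even though the inputs are integers.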

But actually, on my machine df.diff() works fine:

>>> df.diff()
    a   b
0 NaN NaN
1 -64  32
2  80 -16
3 -32 -48
4   0 -48
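
The last frame of the traceback in the question mentions diff_2d_int16, which suggests (but does not prove) that the columns are stored as int16 and that the failure is specific to that dtype in the asker's pandas version. Under that assumption, one possible workaround is to cast to a float dtype before calling diff(). The sketch below only illustrates the cast; the name df16 is hypothetical, and whether the cast avoids the error on the affected version has not been verified here.

```python
import pandas as pd

# Assumption: the original columns are int16, as hinted by the traceback.
df16 = pd.DataFrame(
    {
        "a": [3136, 3072, 3152, 3120, 3120],
        "b": [13280, 13312, 13296, 13248, 13200],
    },
    dtype="int16",
)

# Cast to float64 first so diff() operates on a float block.
diffed = df16.astype("float64").diff()
print(diffed)
```

Checking ddf.dtypes on the real data would confirm whether the int16 assumption holds.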
Roman Pekar answered Sep 30 '22 11:09