I want to use numpy.diff on a pandas Series. Am I right that this is a bug? Or am I doing it wrong?
In [163]: s = Series(np.arange(10))
In [164]: np.diff(s)
Out[164]:
0 NaN
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 NaN
In [165]: np.diff(np.arange(10))
Out[165]: array([1, 1, 1, 1, 1, 1, 1, 1, 1])
I am using pandas 0.9.1rc1, numpy 1.6.1.
Series.diff() calculates the difference between each element of a Series and another element of the same Series (by default, the element in the previous row). The periods parameter sets how many positions to shift when calculating the difference, and it accepts negative values.
numpy's diff(arr[, n[, axis]]) calculates the n-th order discrete difference along the given axis. The first-order difference is out[i] = arr[i+1] - arr[i] along that axis. Higher-order differences are calculated by applying diff recursively.
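As a small illustration of the first-order formula and the recursive definition of higher orders, here is a minimal sketch. The array `arr` is made-up sample data, not taken from the question.

```python
import numpy as np

# Hypothetical sample data: the first few perfect squares.
arr = np.array([1, 4, 9, 16, 25])

# First-order difference: out[i] = arr[i+1] - arr[i]
first = np.diff(arr)

# Second-order difference: diff applied twice
second = np.diff(arr, n=2)

print(first)   # [3 5 7 9]
print(second)  # [2 2 2]
print(np.array_equal(second, np.diff(np.diff(arr))))  # True
```

Note that each application of diff shortens the result by one element along the chosen axis.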
The DataFrame diff() method returns a DataFrame with the difference between the values of each row and, by default, the previous row. The periods parameter specifies which row to compare with.
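To make the effect of periods concrete, here is a minimal sketch on a Series. The values are made-up sample data, and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
s = pd.Series([1, 4, 9, 16, 25])

prev = s.diff()               # s[i] - s[i-1]; the first element is NaN
two_back = s.diff(periods=2)  # s[i] - s[i-2]; the first two elements are NaN
nxt = s.diff(periods=-1)      # s[i] - s[i+1]; the last element is NaN

print(prev.tolist())      # [nan, 3.0, 5.0, 7.0, 9.0]
print(two_back.tolist())  # [nan, nan, 8.0, 12.0, 16.0]
print(nxt.tolist())       # [-3.0, -5.0, -7.0, -9.0, nan]
```

Unlike np.diff, the result keeps the original length and index, with NaN where there is no element to compare against.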
Pandas implements diff like so:
In [3]: s = pd.Series(np.arange(10))
In [4]: s.diff()
Out[4]:
0 NaN
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
Using np.diff directly:
In [7]: np.diff(s.values)
Out[7]: array([1, 1, 1, 1, 1, 1, 1, 1, 1])
In [8]: np.diff(np.array(s))
Out[8]: array([1, 1, 1, 1, 1, 1, 1, 1, 1])
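So why doesn't np.diff(s) work? Because np.diff calls np.asanyarray() on the series before computing the difference. In this pandas version a Series is an ndarray subclass, so np.asanyarray() returns it unchanged, and the slice subtraction inside np.diff is then aligned on index labels rather than on positions. Like so: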
In [25]: a = np.asanyarray(s)
In [26]: a
Out[26]:
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
In [27]: np.diff(a)
Out[27]:
0 NaN
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 NaN
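The zeros and NaNs above are what label-aligned subtraction of two shifted slices produces. The following is a minimal sketch that reproduces the effect by hand in a current pandas, assuming np.diff boils down to subtracting the array without its last element from the array without its first. The names `aligned` and `positional` are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10))

# Series arithmetic aligns on index labels: label 0 exists only on the
# right-hand side and label 9 only on the left, so both become NaN, and
# every shared label is subtracted from itself, giving 0.
aligned = s.iloc[1:] - s.iloc[:-1]

# Plain ndarrays have no labels, so the same slices subtract by position.
values = s.to_numpy()
positional = values[1:] - values[:-1]

print(aligned.tolist())
# [nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, nan]
print(positional.tolist())
# [1, 1, 1, 1, 1, 1, 1, 1, 1]
```

This is why np.diff(s.values) and s.diff() are the safe choices: the first drops the index before subtracting, and the second shifts by position. In recent pandas versions a Series is no longer an ndarray subclass, so np.diff(s) should behave like the positional case, but that is worth checking against the version in use.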