
python pandas: how to calculate derivative/gradient


Given that I have the following two vectors:

    In [99]: time_index
    Out[99]:
    [1484942413,
     1484942712,
     1484943012,
     1484943312,
     1484943612,
     1484943912,
     1484944212,
     1484944511,
     1484944811,
     1484945110]

    In [100]: bytes_in
    Out[100]:
    [1293981210388,
     1293981379944,
     1293981549960,
     1293981720866,
     1293981890968,
     1293982062261,
     1293982227492,
     1293982391244,
     1293982556526,
     1293982722320]

Here bytes_in is a monotonically increasing counter, and time_index is a list of Unix timestamps (epoch seconds).

Objective: What I would like to calculate is the bitrate.

To do that, I build a time-indexed Series like this:

    In [101]: timeline = pandas.to_datetime(time_index, unit="s")

    In [102]: recv = pandas.Series(bytes_in, timeline).resample("300S").mean().ffill().apply(lambda i: i*8)

    In [103]: recv
    Out[103]:
    2017-01-20 20:00:00    10351849683104
    2017-01-20 20:05:00    10351851039552
    2017-01-20 20:10:00    10351852399680
    2017-01-20 20:15:00    10351853766928
    2017-01-20 20:20:00    10351855127744
    2017-01-20 20:25:00    10351856498088
    2017-01-20 20:30:00    10351857819936
    2017-01-20 20:35:00    10351859129952
    2017-01-20 20:40:00    10351860452208
    2017-01-20 20:45:00    10351861778560
    Freq: 300S, dtype: int64

Question: Strangely, calculating the gradient manually gives me:

    In [104]: (bytes_in[1]-bytes_in[0])*8/300
    Out[104]: 4521.493333333333

which is the correct value,

while calculating the gradient with pandas gives me:

    In [124]: recv.diff()
    Out[124]:
    2017-01-20 20:00:00          NaN
    2017-01-20 20:05:00    1356448.0
    2017-01-20 20:10:00    1360128.0
    2017-01-20 20:15:00    1367248.0
    2017-01-20 20:20:00    1360816.0
    2017-01-20 20:25:00    1370344.0
    2017-01-20 20:30:00    1321848.0
    2017-01-20 20:35:00    1310016.0
    2017-01-20 20:40:00    1322256.0
    2017-01-20 20:45:00    1326352.0
    Freq: 300S, dtype: float64

which is not the same as above: 1356448.0 is different from 4521.493333333333.

Could you please enlighten me as to what I am doing wrong?

asked Jan 21 '17 14:01 by nskalis




1 Answer

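pd.Series.diff() only takes the differences between consecutive values. It doesn't also divide by the delta of the index. With 300-second bins, the first difference of 1356448.0 bits divided by 300 gives the 4521.493333 you computed by hand.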

This gets you the answer:

    recv.diff() / recv.index.to_series().diff().dt.total_seconds()

    2017-01-20 20:00:00            NaN
    2017-01-20 20:05:00    4521.493333
    2017-01-20 20:10:00    4533.760000
    2017-01-20 20:15:00    4557.493333
    2017-01-20 20:20:00    4536.053333
    2017-01-20 20:25:00    4567.813333
    2017-01-20 20:30:00    4406.160000
    2017-01-20 20:35:00    4366.720000
    2017-01-20 20:40:00    4407.520000
    2017-01-20 20:45:00    4421.173333
    Freq: 300S, dtype: float64
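For reference, here is a minimal self-contained sketch of the same idea, starting from the raw lists in the question. It deliberately skips the resampling step (an assumption made to keep the example short), so each difference is divided by the actual interval between samples (299 or 300 seconds) instead of the nominal 300-second bin width. The values will therefore differ slightly from the output above. The names `counter`, `delta_bits`, `delta_seconds` and `bitrate` are illustrative only.

```python
import pandas as pd

# Raw data copied from the question.
time_index = [1484942413, 1484942712, 1484943012, 1484943312, 1484943612,
              1484943912, 1484944212, 1484944511, 1484944811, 1484945110]
bytes_in = [1293981210388, 1293981379944, 1293981549960, 1293981720866,
            1293981890968, 1293982062261, 1293982227492, 1293982391244,
            1293982556526, 1293982722320]

# Byte counter indexed by timestamp (no resampling: an assumption of this sketch).
counter = pd.Series(bytes_in, index=pd.to_datetime(time_index, unit="s"))

# Numerator: bits received between consecutive samples.
delta_bits = counter.diff() * 8

# Denominator: seconds elapsed between consecutive samples.
delta_seconds = counter.index.to_series().diff().dt.total_seconds()

# Bits per second; the first entry is NaN because it has no predecessor.
bitrate = delta_bits / delta_seconds
print(bitrate)
```

Dividing by the measured interval rather than a fixed 300 avoids a small bias when the samples do not land exactly on the bin boundaries.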

You could also use numpy.gradient, passing bytes_in and the sample spacing you expect to have. The result has the same length as the input: interior points use central differences, and the two edges fall back to one-sided differences.

    np.gradient(bytes_in, 300) * 8

    array([ 4521.49333333,  4527.62666667,  4545.62666667,  4546.77333333,
            4551.93333333,  4486.98666667,  4386.44      ,  4387.12      ,
            4414.34666667,  4421.17333333])
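As a variation on the call above, numpy.gradient also accepts an array of sample coordinates in place of a scalar spacing, so the real timestamps can be used instead of an assumed 300 seconds. The sketch below is one way to do that with the data from the question. Converting to float arrays up front and the names `t`, `b` and `bitrate` are choices made for this example, not part of the original answer.

```python
import numpy as np

# Raw data copied from the question, converted to float arrays.
t = np.array([1484942413, 1484942712, 1484943012, 1484943312, 1484943612,
              1484943912, 1484944212, 1484944511, 1484944811, 1484945110],
             dtype=float)
b = np.array([1293981210388, 1293981379944, 1293981549960, 1293981720866,
              1293981890968, 1293982062261, 1293982227492, 1293982391244,
              1293982556526, 1293982722320], dtype=float)

# Passing the coordinate array t lets numpy use the actual (uneven) spacing.
# Interior points use central differences; with the default edge_order=1 the
# first and last points use one-sided differences.
bitrate = np.gradient(b, t) * 8  # bytes/s -> bits/s
print(bitrate)
```

Because the interior values are central differences, they are smoother than the one-step rates from diff. The output keeps all ten points.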
answered Sep 28 '22 02:09 by piRSquared