
Time-weighted average with Pandas

What's the most efficient way to calculate the time-weighted average of a TimeSeries in Pandas 0.8? For example, say I want the time-weighted average of df.y - df.x as created below:

import pandas
import numpy as np
times = np.datetime64('2012-05-31 14:00') + np.timedelta64(1, 'ms') * np.cumsum(10**3 * np.random.exponential(size=10**6))
x = np.random.normal(size=10**6)
y = np.random.normal(size=10**6)
df = pandas.DataFrame({'x': x, 'y': y}, index=times)

I feel like this operation should be very easy to do, but everything I've tried involves several messy and slow type conversions.

asked May 31 '12 19:05 by user2303



1 Answer

You can convert df.index to integers and use that to compute the average. There is a shortcut asi8 property that returns an array of int64 values:

np.average(df.y - df.x, weights=df.index.asi8)
answered Oct 18 '22 22:10 by Wes McKinney
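
A note on interpreting the weights: for a DatetimeIndex, asi8 holds integer timestamps counted from the Unix epoch, so passing it directly as weights gives every sample a nearly identical weight. If "time-weighted" is instead taken to mean that each value counts in proportion to how long it lasts until the next timestamp, the weights would be the differences between consecutive asi8 values. The following is a minimal sketch under that assumption, not the method from the answer above; the helper name time_weighted_average and the three-point sample series are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def time_weighted_average(series):
    """Average of `series`, weighting each value by the time until the next sample.

    Assumes a sorted DatetimeIndex and that each value holds until the next
    timestamp. The last value has no following interval, so it is dropped.
    """
    # Integer timestamps; consecutive differences are the interval lengths.
    durations = np.diff(series.index.asi8)
    return np.average(series.to_numpy()[:-1], weights=durations)


# Hypothetical sample: values 1, 3, 100 observed at t = 0 s, 1 s, 4 s.
idx = pd.to_datetime(
    ["2012-05-31 14:00:00", "2012-05-31 14:00:01", "2012-05-31 14:00:04"]
)
s = pd.Series([1.0, 3.0, 100.0], index=idx)

# The value 1 holds for 1 s and the value 3 holds for 3 s:
# (1*1 + 3*3) / (1 + 3) = 2.5
twa = time_weighted_average(s)
```

Applied to the question's data, this would be called as time_weighted_average(df.y - df.x). Whether to drop the last sample or give it some closing interval is a design choice that depends on how the series is meant to end.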