Is there a way to approximate the periodicity of a time series in pandas? In R, xts
objects have a method called periodicity
that serves exactly this purpose. Is there an equivalent implemented in pandas?
For instance, can we infer the frequency of a time series that does not specify one?
import pandas.io.data as web  # removed in later pandas; pandas_datareader replaces it
aapl = web.get_data_yahoo("AAPL")
aapl.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-04 00:00:00, ..., 2013-12-19 00:00:00]
Length: 999, Freq: None, Timezone: None
The frequency of this series can reasonably be approximated as daily.
Update:
I think it might be helpful to show the source code of R's implementation of the periodicity method.
function (x, ...)
{
    if (timeBased(x) || !is.xts(x))
        x <- try.xts(x, error = "'x' needs to be timeBased or xtsible")
    p <- median(diff(.index(x)))
    if (is.na(p))
        stop("can not calculate periodicity of 1 observation")
    units <- "days"
    scale <- "yearly"
    label <- "year"
    if (p < 60) {
        units <- "secs"
        scale <- "seconds"
        label <- "second"
    }
    else if (p < 3600) {
        units <- "mins"
        scale <- "minute"
        label <- "minute"
        p <- p/60L
    }
    else if (p < 86400) {
        units <- "hours"
        scale <- "hourly"
        label <- "hour"
    }
    else if (p == 86400) {
        scale <- "daily"
        label <- "day"
    }
    else if (p <= 604800) {
        scale <- "weekly"
        label <- "week"
    }
    else if (p <= 2678400) {
        scale <- "monthly"
        label <- "month"
    }
    else if (p <= 7948800) {
        scale <- "quarterly"
        label <- "quarter"
    }
    structure(list(difftime = structure(p, units = units, class = "difftime"),
        frequency = p, start = start(x), end = end(x), units = units,
        scale = scale, label = label), class = "periodicity")
}
I think this line is the key, though I don't quite understand it:
p <- median(diff(.index(x)))
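In Python terms, `.index(x)` returns the numeric timestamps of the series and `diff` the gaps between them, so that line computes the median spacing between consecutive observations. A minimal NumPy sketch of the same idea, on a made-up business-day index standing in for the AAPL data:

```python
import numpy as np
import pandas as pd

# A DatetimeIndex of business days, analogous to the AAPL index above
idx = pd.bdate_range("2010-01-04", periods=10)

# R's `median(diff(.index(x)))`: median gap between consecutive timestamps
p = np.median(np.diff(idx.values))

# Express the median gap in days
days = p / np.timedelta64(1, "D")
print(days)  # 1.0 -- most consecutive business days are one day apart
```

The median is robust to the occasional weekend or holiday gap, which is why it works better here than the mean.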
This time series skips weekends (and holidays), so it really doesn't have a daily frequency to begin with. You could use asfreq
to upsample it to a time series with daily frequency, however:
aapl = aapl.asfreq('D', method='ffill')
Doing so propagates forward the last observed value to dates with missing values.
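To see the effect on a small made-up example (the dates and values are hypothetical), forward-filling across a skipped weekend looks like this:

```python
import pandas as pd

# A toy series that skips a weekend, like daily stock data
s = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(["2013-12-06",    # Friday
                                    "2013-12-09",    # Monday
                                    "2013-12-10"]))  # Tuesday

# Upsample to a true daily frequency, carrying Friday's value
# forward through the weekend
daily = s.asfreq("D", method="ffill")
print(daily)
```

The result has five rows (Friday through Tuesday), with Saturday and Sunday holding Friday's value.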
Note that Pandas also has a business day frequency, so it is also possible to upsample to business days by using:
aapl = aapl.asfreq('B', method='ffill')
If you wish to automate the process of inferring the median frequency in days, then you could do this:
import pandas as pd
import numpy as np
import pandas.io.data as web  # removed in later pandas; use pandas_datareader instead

aapl = web.get_data_yahoo("AAPL")
# median gap between consecutive observations, as a timedelta64
f = np.median(np.diff(aapl.index.values))
# truncate the median gap to a whole number of days
days = f.astype('timedelta64[D]').item().days
aapl = aapl.asfreq('{}D'.format(days), method='ffill')
print(aapl)
This code needs testing, but perhaps it comes close to the R code you posted:
import pandas as pd
import numpy as np
import pandas.io.data as web  # removed in later pandas; use pandas_datareader instead

def infer_freq(ts):
    # median gap between consecutive observations, in whole seconds
    med = np.median(np.diff(ts.index.values))
    seconds = int(med.astype('timedelta64[s]').item().total_seconds())
    # thresholds mirror the R code: minute, hour, day, week, month, quarter
    if seconds < 60:
        freq = '{}s'.format(seconds)
    elif seconds < 3600:
        freq = '{}T'.format(seconds//60)       # minutes
    elif seconds < 86400:
        freq = '{}H'.format(seconds//3600)     # hours
    elif seconds < 604800:
        freq = '{}D'.format(seconds//86400)    # days
    elif seconds < 2678400:
        freq = '{}W'.format(seconds//604800)   # weeks
    elif seconds < 7948800:
        freq = '{}M'.format(seconds//2678400)  # months
    else:
        freq = '{}Q'.format(seconds//7948800)  # quarters
    return ts.asfreq(freq, method='ffill')

aapl = web.get_data_yahoo("AAPL")
print(infer_freq(aapl))
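As an aside, later pandas versions ship `pd.infer_freq`, which attempts this kind of inference directly from a DatetimeIndex when the spacing is regular; a small sketch, assuming a recent pandas:

```python
import pandas as pd

# pd.infer_freq guesses a frequency string from a regularly spaced index
daily = pd.date_range("2010-01-04", periods=10, freq="D")
print(pd.infer_freq(daily))   # 'D'

# It also recognizes the business-day pattern (weekends skipped)
bdays = pd.bdate_range("2010-01-04", periods=10)
print(pd.infer_freq(bdays))   # 'B'
```

Note that on real market data with holidays removed the index is not perfectly regular, so `pd.infer_freq` may return `None`; the median-based approach above is more forgiving.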
I don't know about frequency; the only meaningful measure I can come up with is the mean timedelta, for example in days:
>>> import numpy as np
>>> idx = aapl.index.values
>>> (np.roll(idx, -1) - idx)[:-1].mean()/np.timedelta64(1, 'D')
1.4478957915831596
or in hours:
>>> (np.roll(idx, -1) - idx)[:-1].mean()/np.timedelta64(1, 'h')
34.749498997995836
The same with a more pandorable expression, kudos to @DSM:
>>> aapl.index.to_series().diff().mean() / (60*60*10**9)
34.749498997995993
Of course, the median will be 24 hours, since most consecutive days are present in the index:
>>> aapl.index.to_series().diff().median() / (60*60*10**9)
24.0
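In recent pandas versions, `diff().mean()` returns a `Timedelta` rather than a raw nanosecond count, so dividing by a `pd.Timedelta` expresses the same computation without the `60*60*10**9` magic number; a sketch on a made-up business-day index:

```python
import pandas as pd

# Business-day index with one weekend gap, standing in for the AAPL index
idx = pd.bdate_range("2010-01-04", periods=6)

deltas = idx.to_series().diff()

# Dividing a Timedelta by another Timedelta yields a plain float
mean_hours = deltas.mean() / pd.Timedelta(hours=1)      # 33.6 (gaps 1,1,1,1,3 days)
median_hours = deltas.median() / pd.Timedelta(hours=1)  # 24.0

print(mean_hours, median_hours)
```

`deltas.mean()` skips the leading NaT that `diff()` produces, so the averages are taken over the five actual gaps.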