Pandas filtering - between_time on a non-index column

Tags:

python

pandas

I need to filter data down to specific hours of the day. The DataFrame method between_time seems to be the proper way to do that; however, it only works on the index of the DataFrame, and I need to keep the data in its original format (e.g. pivot tables expect the datetime column to be a regular column with its proper name, not the index).

This means that each filter looks something like this:

df.set_index(keys='my_datetime_field').between_time('8:00','21:00').reset_index()

This means two index operations (set_index and reset_index) run every time such a filter is applied.
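To make the setup concrete, here is a minimal runnable sketch. It assumes a hypothetical sample DataFrame with an hourly column named my_datetime_field (the name used above). It shows that between_time raises a TypeError when the datetime values are not in the index, and that the set_index / reset_index round trip gives the column back afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 48 hourly timestamps in a regular column
N = 48
df = pd.DataFrame({
    'my_datetime_field': pd.date_range('2000-01-01', periods=N, freq='h'),
    'value': np.arange(N),
})

# between_time needs a DatetimeIndex; on the default RangeIndex it raises TypeError
try:
    df.between_time('8:00', '21:00')
except TypeError as exc:
    print('TypeError:', exc)

# The round trip: move the column into the index, filter, move it back out
filtered = (
    df.set_index(keys='my_datetime_field')
      .between_time('8:00', '21:00')
      .reset_index()
)
print(filtered.columns.tolist())  # → ['my_datetime_field', 'value']
print(len(filtered))              # → 28 (hours 8..21 inclusive, on two days)
```

Note that reset_index gives the result a fresh 0..n-1 index, so the original row labels are not carried through this round trip.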

Is this a good practice or is there a more appropriate way to do the same thing?

asked Jan 21 '15 10:01 by Peteris


1 Answer

Create a DatetimeIndex, but store it in a variable rather than in the DataFrame. Then call its indexer_between_time method. This returns an integer array of row positions, which can then be used to select rows from df with iloc:

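import numpy as np
import pandas as pd

# Sample data: N hourly timestamps in a regular 'date' column
N = 100
df = pd.DataFrame(
    {'date': pd.date_range('2000-1-1', periods=N, freq='h'),
     'value': np.random.random(N)})

# Build the DatetimeIndex in a separate variable; df keeps its own index
index = pd.DatetimeIndex(df['date'])
# indexer_between_time returns integer positions, so select the rows with iloc
df.iloc[index.indexer_between_time('8:00', '21:00')]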
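The same idea can be wrapped in a small helper so every filter leaves the datetime column in place. This is a minimal sketch: filter_between_time and its parameters are hypothetical names, not part of pandas; only DatetimeIndex.indexer_between_time and iloc come from the approach above. Because the indexer returns positions, it works regardless of what labels df's own index carries. As a cross-check, a boolean mask on the time-of-day values (a different technique, using Series.dt.time and Series.between) should select the same rows.

```python
import datetime

import numpy as np
import pandas as pd


def filter_between_time(df, column, start, end):
    """Hypothetical helper: keep rows whose `column` time of day lies in [start, end]."""
    # Build a DatetimeIndex from the column without touching df's own index
    positions = pd.DatetimeIndex(df[column]).indexer_between_time(start, end)
    # The indexer holds integer positions, so select with iloc
    return df.iloc[positions]


N = 100
df = pd.DataFrame({
    'date': pd.date_range('2000-01-01', periods=N, freq='h'),
    'value': np.random.default_rng(0).random(N),
})

result = filter_between_time(df, 'date', '8:00', '21:00')
print(len(result))  # → 56 (hours 8..21 inclusive, on four full days)

# Alternative technique: compare datetime.time values directly, endpoints included
mask = df['date'].dt.time.between(datetime.time(8, 0), datetime.time(21, 0))
print(result.equals(df[mask]))
```

Unlike the set_index / reset_index round trip, both selections keep the original row labels of df.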

answered Oct 07 '22 00:10 by unutbu