Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas filtering with datetime index

Tags:

python

pandas

With a datetime index to a Pandas dataframe, it is easy to get a range of dates:

df[datetime(2018,1,1):datetime(2018,1,10)]

Filtering is straightforward too:

df[ (df['column A'] = 'Done') & (df['column B'] < 3.14 )]

But what is the best way to simultaneously filter by range of dates and any other non-date criteria?

like image 823
stephenb Avatar asked Sep 21 '18 04:09

stephenb


4 Answers

3 boolean conditions

c0 = df.index.to_series().between('2018-01-01', '2018-01-10')
c1 = df['column A'] == 'Done'
c2 = df['column B'] < 3.14

df[c0 & c1 & c2]

           column A  column B
2018-01-04     Done  2.533385
2018-01-06     Done  2.789072
2018-01-08     Done  2.230017

Setup

np.random.seed([3, 1415])
df = pd.DataFrame({
    'column A': ['Done', 'Not Done'] * 10,
    'column B': np.random.randn(20) + np.pi
}, pd.date_range('2017-12-25', periods=20))

df

            column A  column B
2017-12-25      Done  1.011868
2017-12-26  Not Done  1.873127
2017-12-27      Done  1.171093
2017-12-28  Not Done  0.882538
2017-12-29      Done  2.792306
2017-12-30  Not Done  3.114638
2017-12-31      Done  3.457829
2018-01-01  Not Done  3.490375
2018-01-02      Done  3.856957
2018-01-03  Not Done  3.912356
2018-01-04      Done  2.533385
2018-01-05  Not Done  3.493983
2018-01-06      Done  2.789072
2018-01-07  Not Done  2.725724
2018-01-08      Done  2.230017
2018-01-09  Not Done  2.999055
2018-01-10      Done  3.888432
2018-01-11  Not Done  1.637436
2018-01-12      Done  3.752955
2018-01-13  Not Done  3.541812
like image 110
piRSquared Avatar answered Sep 27 '22 23:09

piRSquared


If there is multiple boolean masks is possible use np.logical_and.reduce:

m1 = df.index > '2018-01-01'
m2 = df.index < '2018-01-10'
m3 = df['column A'] == 'Done'
m4 = df['column B'] < 3.14

#piRSquared's data sample
df = df[np.logical_and.reduce([m1, m2, m3, m4])]
print (df)
           column A  column B
2018-01-04     Done  2.533385
2018-01-06     Done  2.789072
2018-01-08     Done  2.230017
like image 20
jezrael Avatar answered Sep 27 '22 23:09

jezrael


import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2018-1-1', periods=200, freq='D')
df = df.set_index(['date'])
print(df.loc['2018-2-1':'2018-2-10'])

Hope! it will helpful

like image 35
S. Karthick Avatar answered Sep 27 '22 22:09

S. Karthick


I did this below to filter for both dataframes to have the same date

corn_url = 'https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=WPU012202&scale=left&cosd=1971-01-01&coed=2020-04-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2009-06-01&line_index=1&transformation=lin&vintage_date=2020-06-09&revision_date=2020-06-09&nd=1971-01-01'
wheat_url ='https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=WPU0121&scale=left&cosd=1947-01-01&coed=2020-04-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2009-06-01&line_index=1&transformation=lin&vintage_date=2020-06-09&revision_date=2020-06-09&nd=1947-01-01'


corn = pd.read_csv(corn_url,index_col=0,parse_dates=True)
wheat = pd.read_csv(wheat_url,index_col=0, parse_dates=True)

    corn.head()
    PP Index 1982
    DATE    
    1971-01-01  63.4
    1971-02-01  63.6
    1971-03-01  62.0
    1971-04-01  60.8
    1971-05-01  60.2

wheat.head()
PP Index 1982
DATE    
1947-01-01  53.1
1947-02-01  56.5
1947-03-01  68.0
1947-04-01  66.0
1947-05-01  66.7


wheat = wheat[wheat.index > '1970-12-31']

wheat.head()
PP Index 1982
DATE    
1971-01-01  42.6
1971-02-01  42.6
1971-03-01  41.4
1971-04-01  41.7
1971-05-01  41.8
like image 39
Puttur Kamath Avatar answered Sep 27 '22 23:09

Puttur Kamath