Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas Query for date

Tags:

python

pandas

I was looking through the pandas.query documentation but couldn't find anything specific about this. Is it possible to perform a query on a date based on the closest date to the one given, instead of a specific date?

For example lets say we use the wine dataset and creates some random dates.

    import pandas as pd
    import numpy as np
    from sklearn import datasets
    dir(datasets)
    df = pd.DataFrame(datasets.load_wine().data)
    df.columns = datasets.load_wine().feature_names
    df.columns=df.columns.str.strip()
    
    
    
    
    def random_dates(start, end, n, unit='D'):
        ndays = (end - start).days + 1
        return pd.to_timedelta(np.random.rand(n) * ndays, unit=unit) + start
    
    
    
    np.random.seed(0)
    start = pd.to_datetime('2015-01-01')
    end = pd.to_datetime('2022-01-01')
    datelist=random_dates(start, end, 178)

    df['Dates'] = datelist

if you perform a simple query on hue

df.query('hue == 0.6')

you'll receive three rows with three random dates. Is it possible to pick the query result that's closest to let's say 2017-1-1?

so something like

df.query('hue==0.6').query('Date ~2017-1-1')

I hope this makes sense!


1 Answers

You can use something like:

df.query("('2018-01-01' < Dates) & (Dates < '2018-01-31')")

# Output
     alcohol  malic_acid  ...  proline                         Dates
6      14.39        1.87  ...   1290.0 2018-01-24 08:21:14.665824000
41     13.41        3.84  ...   1035.0 2018-01-22 22:15:56.547561600
51     13.83        1.65  ...   1265.0 2018-01-26 22:37:26.812156800
131    12.88        2.99  ...    530.0 2018-01-01 18:58:05.118441600
139    12.84        2.96  ...    590.0 2018-01-08 13:38:26.117376000
142    13.52        3.17  ...    520.0 2018-01-19 22:37:10.170825600

[6 rows x 14 columns]

Or using @variables:

date = pd.to_datetime('2018-01-01')
offset = pd.DateOffset(days=10)
start = date - offset
end = date + offset
df.query("Dates.between(@start, @end)")

# Output
     alcohol  malic_acid  ...  proline                         Dates
131    12.88        2.99  ...    530.0 2018-01-01 18:58:05.118441600
139    12.84        2.96  ...    590.0 2018-01-08 13:38:26.117376000
like image 81
Corralien Avatar answered Oct 20 '25 17:10

Corralien