pandas filtering and comparing dates

Tags: python, pandas

I have a SQL file containing the data below, which I read into pandas.

df = pandas.read_sql('Database count details', con=engine,
                     index_col='id', parse_dates=['newest_date_available'])

Output

id       code   newest_date_available
9793708  3514   2015-12-24
9792282  2399   2015-12-25
9797602  7452   2015-12-25
9804367  9736   2016-01-20
9804438  9870   2016-01-20

The next line of code gets the date one week ago:

date_before = datetime.date.today() - datetime.timedelta(days=7) # Which is 2016-01-20

What I am trying to do is compare date_before with the dates in df and print all rows whose date is earlier than date_before:

if (df['newest_date_available'] < date_before): print(#all rows)

Obviously this raises an error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How should I do this?

jake wong asked Mar 19 '16 16:03


3 Answers

I would do a mask like:

a = df[df['newest_date_available'] < date_before]

If date_before = datetime.date(2016, 1, 19), this returns:

        id  code newest_date_available
0  9793708  3514            2015-12-24
1  9792282  2399            2015-12-25
2  9797602  7452            2015-12-25
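
For what it's worth, here is a minimal sketch of why the mask works where the `if` does not. The DataFrame is rebuilt by hand from the output in the question (the real one comes from read_sql), and `cutoff` is a fixed stand-in for date_before that I chose so the result is reproducible. The comparison yields one True/False per row, which is fine for indexing but cannot be collapsed into a single truth value for `if`.

```python
import pandas as pd

# Sample rebuilt by hand from the output shown in the question.
df = pd.DataFrame(
    {
        "code": [3514, 2399, 7452, 9736, 9870],
        "newest_date_available": pd.to_datetime(
            ["2015-12-24", "2015-12-25", "2015-12-25", "2016-01-20", "2016-01-20"]
        ),
    },
    index=pd.Index([9793708, 9792282, 9797602, 9804367, 9804438], name="id"),
)

# Assumption: a fixed cutoff standing in for date_before.
cutoff = pd.Timestamp("2016-01-19")

# The comparison is element-wise and returns a boolean Series.
mask = df["newest_date_available"] < cutoff

# Putting that Series in an `if` is what triggers the error in the question.
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)

# Indexing with the mask keeps only the rows where it is True.
older = df[mask]
print(older)
```

If you really do need a single yes/no answer (for example, "is any row older than the cutoff?"), mask.any() or mask.all() gives you that.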
Fabio Lamanna answered Oct 01 '22 10:10


Using datetime.date(2019, 1, 10) works because pandas coerces the date to a datetime under the hood. This, however, will no longer be the case in future versions of pandas.

From version 0.24 onward, pandas issues a warning:

FutureWarning: Comparing Series of datetimes with 'datetime.date'. Currently, the 'datetime.date' is coerced to a datetime. In the future pandas will not coerce, and a TypeError will be raised.

The better solution is to use pd.Timestamp, which the official documentation describes as the pandas replacement for Python's datetime.datetime object.

To provide an example referencing OP's initial dataset, this is how you would use it:

import pandas as pd

# Boolean mask: True for rows dated before 2016-01-10
cond1 = df.newest_date_available < pd.Timestamp(2016, 1, 10)
df.loc[cond1]
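
If you want to keep the date_before calculation from the question, one option is to wrap the datetime.date in pd.Timestamp before comparing. The sketch below assumes a hand-built copy of OP's data and a fixed `today` of 2016-01-27 (my assumption, picked so that date_before matches the 2016-01-20 mentioned in the question); in real code you would call datetime.date.today() instead.

```python
import datetime

import pandas as pd

# Sample rebuilt by hand from the output shown in the question.
df = pd.DataFrame(
    {
        "code": [3514, 2399, 7452, 9736, 9870],
        "newest_date_available": pd.to_datetime(
            ["2015-12-24", "2015-12-25", "2015-12-25", "2016-01-20", "2016-01-20"]
        ),
    },
    index=pd.Index([9793708, 9792282, 9797602, 9804367, 9804438], name="id"),
)

# Assumption: fixed "today" for reproducibility; use datetime.date.today() in practice.
today = datetime.date(2016, 1, 27)
date_before = today - datetime.timedelta(days=7)

# Convert the plain date to a Timestamp (midnight on that day) so both sides
# of the comparison are datetime-like and no coercion is needed.
cutoff = pd.Timestamp(date_before)

result = df.loc[df["newest_date_available"] < cutoff]
print(result)
```

Because the cutoff is midnight and the comparison is strict, rows dated exactly on date_before are excluded; use <= if you want them included.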
onlyphantom answered Oct 01 '22 12:10


A bit late to the party but I think it is worth mentioning. If you are looking for a solution which dynamically considers the date a week ago, this might be helpful:

In [3]: df = pd.DataFrame({'alpha': list('ABCDE'), 'num': range(5), 'date': pd.date_range('2022-06-30', '2022-07-04')})

In [4]: df
Out[4]: 
  alpha  num       date
0     A    0 2022-06-30
1     B    1 2022-07-01
2     C    2 2022-07-02
3     D    3 2022-07-03
4     E    4 2022-07-04

In [5]: df.query('date < "%s"' % (pd.Timestamp.now().normalize() - pd.Timedelta(7, 'd')))
Out[5]: 
  alpha  num       date
0     A    0 2022-06-30
1     B    1 2022-07-01

Explanation:
I created a new df with more recent dates. Today is 2022-07-09 (pd.Timestamp.now().normalize()), so seven days ago was 2022-07-02 (pd.Timestamp.now().normalize() - pd.Timedelta(7, 'd')). The string formatting operator % inserts that date into the query string, and query() returns only those observations where the date column is earlier than 2022-07-02.
normalize() is important here because it resets the time to midnight. Without it, the cutoff would fall later in the day on 2022-07-02, and query() would also return the observation dated 2022-07-02, because:

# Timestamp('2022-07-09 17:53:03.078172') > Timestamp('2022-07-09 00:00:00')
In [6]: pd.Timestamp.now() > pd.Timestamp.now().normalize()
Out[6]: True
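
As a possible alternative to string formatting, query() can also reference a local variable with the @ prefix. The sketch below is only an illustration: `now` is a fixed timestamp I picked to match the date in the explanation (you would use pd.Timestamp.now() in practice), and `cutoff` and `cutoff_raw` are names introduced here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "alpha": list("ABCDE"),
        "num": range(5),
        "date": pd.date_range("2022-06-30", "2022-07-04"),
    }
)

# Assumption: fixed "now" for reproducibility; use pd.Timestamp.now() in practice.
now = pd.Timestamp("2022-07-09 17:53:03")

# Midnight seven days ago: 2022-07-02 00:00:00.
cutoff = now.normalize() - pd.Timedelta(days=7)

# @cutoff refers to the local Python variable, so no string formatting is needed.
result = df.query("date < @cutoff")
print(result)

# Without normalize() the cutoff is 2022-07-02 17:53:03, so the row dated
# 2022-07-02 (midnight) also passes the strict comparison.
cutoff_raw = now - pd.Timedelta(days=7)
result_raw = df.query("date < @cutoff_raw")
print(result_raw)
```

This keeps the cutoff as a Timestamp instead of round-tripping it through a string.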
rachwa answered Oct 01 '22 12:10