
Checking if any date in a Python list of dates is between two date columns

I have a dataframe with two columns, START_DATE and END_DATE, and a Python list of dates. I want a third column that indicates, for each row, whether any date in the list falls between that row's START_DATE and END_DATE. If at least one date does, the third column should show true.

dates_list = ['2019-01-06', '2019-04-08']

START_DATE | END_DATE
-----------|-----------
2019-01-01 | 2019-01-12
2019-01-03 | 2019-01-05
2019-04-03 | 2019-04-09

I want a third column showing

TRUE
FALSE
TRUE

A PySpark solution would be ideal, but pandas works too.

wbarts asked Jul 11 '19 15:07



1 Answer

This could be done using pd.IntervalIndex. Let's start by converting all dates to datetime:

import pandas as pd
from datetime import datetime

df = df.apply(pd.to_datetime)  # convert both date columns to datetime64
dates = [datetime.strptime(x, '%Y-%m-%d') for x in dates_list]  # parse the list entries
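
For anyone who wants to reproduce this step, here is a minimal sketch of the setup using the sample data from the question. How df is constructed is an assumption, since the question only shows the table. It also uses pd.to_datetime on the list in place of datetime.strptime, which should give equivalent results for ISO-formatted strings.

```python
import pandas as pd

# Sample data copied from the question; the construction itself is an assumption.
df = pd.DataFrame({
    'START_DATE': ['2019-01-01', '2019-01-03', '2019-04-03'],
    'END_DATE': ['2019-01-12', '2019-01-05', '2019-04-09'],
})
dates_list = ['2019-01-06', '2019-04-08']

# Convert every column of the frame to datetime64.
# This assumes the frame holds only the two date columns.
df = df.apply(pd.to_datetime)

# Convert the list of strings to a DatetimeIndex of Timestamps.
dates = pd.to_datetime(dates_list)

print(df.dtypes)
```

If the real dataframe has other columns, converting only the two date columns, for example with df[['START_DATE', 'END_DATE']].apply(pd.to_datetime), would avoid touching the rest.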

Now let's build a pd.IntervalIndex using its from_arrays method, and check which intervals contain any date from the list using a list comprehension:

ix = pd.IntervalIndex.from_arrays(df['START_DATE'], df['END_DATE'], closed='both')
[any(date in i for date in dates) for i in ix]
# [True, False, True]
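
The list comprehension loops in Python over every interval and date, which may get slow on large frames. As a possible alternative (not part of the original answer), the same inclusive check can be sketched with NumPy broadcasting and written straight into a third column. The column name HAS_DATE is hypothetical, and the data is the sample from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'START_DATE': pd.to_datetime(['2019-01-01', '2019-01-03', '2019-04-03']),
    'END_DATE': pd.to_datetime(['2019-01-12', '2019-01-05', '2019-04-09']),
})
dates = pd.to_datetime(['2019-01-06', '2019-04-08']).to_numpy()

# Column vectors of shape (n_rows, 1) broadcast against dates of shape
# (n_dates,), giving a boolean matrix of shape (n_rows, n_dates).
start = df['START_DATE'].to_numpy()[:, None]
end = df['END_DATE'].to_numpy()[:, None]
in_range = (start <= dates) & (dates <= end)  # inclusive on both ends

# HAS_DATE is a hypothetical column name: True if any date hits the row's range.
df['HAS_DATE'] = in_range.any(axis=1)
print(df['HAS_DATE'].tolist())
# [True, False, True]
```

The intermediate matrix has n_rows × n_dates entries, so this trades memory for speed and suits cases where the list of dates is short relative to the frame.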
yatu answered Oct 02 '22 00:10