Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Select all data from Pandas DataFrame between two dates

I have DataFrame in which there is a column with event dates (dates are not unique). I need to select all the data that is in this period. I try next:

start_day = datetime.date(datetime.strptime(start_day, '%d.%m.%Y')) #change user data to date format

end_day = datetime.date(datetime.strptime(end_day, '%d.%m.%Y'))

df = df[df['Transaction_date'].between(start_day, end_day)]

The type of column Transaction_date is datetime64[ns]. When I run the code with the request for the period 01/01/2020 to 31/01/2020 - part of the data for the specified period is lost (information is displayed only for 21 days, it should be for 31 days).

I see that the Data Frame is contained data for the entire requested period. Can you help me please, where is my mistake?

like image 757
Maksim Kostinov Avatar asked Feb 27 '20 21:02

Maksim Kostinov


People also ask

How can I find the difference between two dates in Pandas?

Use df. dates1-df. dates2 to find the difference between the two dates and then convert the result in the form of months.

How do you Subset two dates in Python?

There are two possible solutions: Use a boolean mask, then use df. loc[mask] Set the date column as a DatetimeIndex, then use df[start_date : end_date]


1 Answers

I used the following test DataFrame:

   Transaction_date  Value
0        2019-12-31     10
1        2020-01-10     10
2        2020-01-15     10
3        2020-01-20     10
4        2020-01-25     10
5        2020-01-28     10
6        2020-01-29     10
7        2020-01-30     10
8        2020-01-31     10
9        2020-02-01     10
10       2020-02-01     10

Start / end dates are:

start_day = '01.01.2020'
end_day = '31.01.2020'

The code is:

# Convert start / end dates to datetime
start_day = pd.to_datetime(start_day)
end_day = pd.to_datetime(end_day)

I noticed that you use datetime module, which I think is a bad practice. To this end use dedicated pandasonic function (to_datetime). It is even clever enough to recognize many of commonly used date formats, so there is no need to specify it on your own.

To get the actual result, run:

df[df['Transaction_date'].between(start_day, end_day)]

The result is:

  Transaction_date  Value
1       2020-01-10     10
2       2020-01-15     10
3       2020-01-20     10
4       2020-01-25     10
5       2020-01-28     10
6       2020-01-29     10
7       2020-01-30     10
8       2020-01-31     10

So as you can see:

  • The date from 2019 has been eliminated.
  • The output contains only January dates, to the very end.
  • Dates from the beginning of February have also been eliminated.

Repeat my example on your computer. Maybe the source of your problem (that the result dous not contain all dates from January) is somewhere else?

like image 152
Valdi_Bo Avatar answered Sep 29 '22 22:09

Valdi_Bo