The problem is in line 22:
if start_date <= data_entries.iloc[j, 1] <= end_date:
where I want to compare start_date and end_date to data_entries.iloc[j, 1], which accesses a column of the pandas DataFrame. I converted the column to datetime using:
data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y")
But I am unsure how to convert it to date.
import pandas as pd
import datetime

entries_csv = "C:\\Users\\Pops\\Desktop\\Entries.csv"
data_entries = pd.read_csv(entries_csv)
data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y")
start_date = datetime.date(2018, 4, 1)
end_date = datetime.date(2018, 10, 30)
for j in range(0, len(data_entries)):
    if start_date <= data_entries.iloc[j, 1] <= end_date:
        print('Hello')
Just use pd.Timestamp objects without any conversion:
start_date = pd.Timestamp('2018-04-01')
end_date = pd.Timestamp('2018-10-30')
res = data_entries[data_entries['VOUCHER DATE'].between(start_date, end_date)]
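To make the filter above easy to try without the original file, here is a minimal sketch. The sample rows and the 'VOUCHER NO' column are made up for illustration; only 'VOUCHER DATE' and its format come from the question. Note that between() includes both endpoints by default.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for Entries.csv (not from the question).
csv_text = """VOUCHER NO,VOUCHER DATE
1,03/15/2018
2,04/01/2018
3,07/20/2018
4,10/30/2018
5,11/02/2018
"""

data_entries = pd.read_csv(io.StringIO(csv_text))
data_entries['VOUCHER DATE'] = pd.to_datetime(
    data_entries['VOUCHER DATE'], format="%m/%d/%Y"
)

start_date = pd.Timestamp('2018-04-01')
end_date = pd.Timestamp('2018-10-30')

# Boolean mask computed for the whole column at once; both bounds are inclusive.
mask = data_entries['VOUCHER DATE'].between(start_date, end_date)
res = data_entries[mask]
print(res)
```

With this sample data, the rows dated exactly 04/01/2018 and 10/30/2018 are kept along with the one in between, and no explicit loop over the rows is needed.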
Explanation
Don't use datetime.datetime or datetime.date objects in a pandas Series. This is inefficient because you lose vectorised functionality. The benefit of pd.Timestamp objects is that you can use vectorised functionality for calculations. As described here:
numpy.datetime64 is essentially a thin wrapper around an int64. It has almost no date/time specific functionality.
pd.Timestamp is a wrapper around a numpy.datetime64. It is backed by the same int64 value, but supports the entire datetime.datetime interface, along with useful pandas-specific functionality.
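As a rough illustration of the quoted description, the sketch below shows a pd.Timestamp behaving like a datetime.datetime while a datetime64 column keeps its vectorised operations. The variable names are chosen for this example only.

```python
import datetime

import pandas as pd

ts = pd.Timestamp('2018-04-01')

# pd.Timestamp is a subclass of datetime.datetime, so the usual interface works.
is_datetime = isinstance(ts, datetime.datetime)
as_date = ts.date()  # a plain datetime.date, if one is ever needed

# A datetime64 column supports vectorised access through the .dt accessor
# and vectorised comparison against a Timestamp.
s = pd.Series(pd.to_datetime(['04/01/2018', '10/30/2018'], format="%m/%d/%Y"))
months = s.dt.month
on_or_after = s >= ts

print(is_datetime, as_date)
print(months.tolist(), on_or_after.tolist())
```

Keeping the column as datetime64 and building the bounds as pd.Timestamp means both sides of the comparison are the same kind of object.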
This converts it to date:
data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y").dt.date
However, I would not recommend filtering like this. This is much faster:
data_entries[data_entries['VOUCHER DATE'].between(start_date, end_date)]
Read this article.
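If plain date objects are still wanted in the output, one possible approach (a sketch, not something either answer prescribes) is to filter on the datetime64 column first and derive the dates afterwards. The sample rows and the 'VOUCHER DAY' column name are hypothetical.

```python
import datetime

import pandas as pd

# Hypothetical sample rows; only the column name and date format come from the question.
df = pd.DataFrame({'VOUCHER DATE': ['03/15/2018', '07/20/2018']})
df['VOUCHER DATE'] = pd.to_datetime(df['VOUCHER DATE'], format="%m/%d/%Y")

# Filter while the column is still datetime64, so the comparison stays vectorised.
in_range = df['VOUCHER DATE'].between(
    pd.Timestamp('2018-04-01'), pd.Timestamp('2018-10-30')
)
filtered = df[in_range].copy()

# Derive datetime.date objects only at the end, in a separate (hypothetical) column.
filtered['VOUCHER DAY'] = filtered['VOUCHER DATE'].dt.date
print(filtered)
```

This keeps the original column usable for further date arithmetic, while the extra column holds datetime.date values for display or export.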