I'm reading in CSV files from an external data source using pd.read_csv, as in the code below:
pd.read_csv(
    BytesIO(raw_data),
    parse_dates=['dates'],
    date_parser=np.datetime64,
)
However, somewhere in the CSV that's being sent, there is a misformatted date, resulting in the following error:
ValueError: Error parsing datetime string "2015-08-2" at position 8
This causes the entire application to crash. Of course, I can handle this case with a try/except, but then I will lose all the other data in that particular CSV. I need pandas to keep and parse that other data.

I have no way of predicting when or where this data (which changes daily) will have badly formatted dates. Is there some way to get pd.read_csv to skip only the rows with bad dates but still parse all the other rows in the CSV?
"somewhere in the csv that's being sent, there is a misformatted date"

np.datetime64 needs ISO 8601-formatted strings to work properly. The good news is that you can wrap np.datetime64 in your own function and use that as the date_parser:
def parse_date(v):
    try:
        return np.datetime64(v)
    except ValueError:
        # apply whatever remedies you deem appropriate,
        # e.g. return the raw string so the row is kept
        pass
    return v
pd.read_csv(
    ...
    date_parser=parse_date,
)
"I need pandas to keep and parse that other data."

I often find that a more flexible date parser like dateutil works better than np.datetime64, and it may even work without the extra wrapper function:
import dateutil.parser

pd.read_csv(
    BytesIO(raw_data),
    parse_dates=['dates'],
    date_parser=dateutil.parser.parse,
)