read_csv has no datetime dtype you can set, because CSV files can only contain strings, integers and floats. Passing datetime as a dtype makes pandas treat the column as object, meaning you will end up with strings.
No, there is no way in pandas to automatically recognize date columns.
We can use the parse_dates parameter to tell pandas to turn columns into real datetime types. parse_dates takes a list of columns, since you may want to parse several columns into datetimes.
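As a minimal sketch of the parse_dates behaviour described above, the snippet below reads a small in-memory CSV instead of a real file. The column names "when" and "value" and the sample rows are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file.
csv_text = "when,value\n2016-05-05,1.5\n2016-05-06,2.5\n"

# Ask pandas to convert the "when" column while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

# Check that the column holds datetimes rather than strings.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["when"])
print(df.dtypes)
print(is_datetime)
```

Checking df.dtypes right after loading is a cheap way to confirm that the conversion actually happened.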
The pandas.read_csv() function has a keyword argument called parse_dates. Using this you can convert strings, floats or integers into datetimes on the fly using the default date_parser (dateutil.parser.parser).
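import pandas as pd

headers = ['col1', 'col2', 'col3', 'col4']
# Read the date columns as strings first; parse_dates converts them afterwards.
dtypes = {'col1': 'str', 'col2': 'str', 'col3': 'str', 'col4': 'float'}
parse_dates = ['col1', 'col2']
# file is the path to a tab-separated file without a header row.
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)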
This will cause pandas to read col1 and col2 as strings, which they most likely are ("2016-05-05" etc.). After reading each string, the date_parser for that column acts on it and gives back whatever that function returns.
The pandas.read_csv() function also has a keyword argument called date_parser. Setting this to a function (for example a lambda) makes pandas use that function to parse the dates. Note that date_parser is deprecated since pandas 2.0 in favor of date_format.
You have to pass the function itself, not the result of calling it. This is correct:
date_parser = pd.datetools.to_datetime
This is incorrect:
date_parser = pd.datetools.to_datetime()
pd.datetools.to_datetime has since been relocated to pd.to_datetime, so use date_parser = pd.to_datetime. Thanks @stackoverYC
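To illustrate the difference between passing a function and calling it, here is a small sketch that uses pd.to_datetime on its own, outside of read_csv. The variable names parser and call_failed are only for illustration.

```python
import pandas as pd

# Passing the function object: nothing is executed yet.
parser = pd.to_datetime
print(callable(parser))

# Calling it with no arguments fails immediately, because
# to_datetime needs the value(s) it should convert.
try:
    pd.to_datetime()
    call_failed = False
except TypeError:
    call_failed = True
print(call_failed)

# The function object can be applied later to actual data.
print(parser("2016-05-05"))
```

Writing the name without parentheses hands the function over so it can be called for you on the column values.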
There is a parse_dates parameter for read_csv which allows you to define the names of the columns you want treated as dates or datetimes:
date_cols = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=date_cols)
You might try passing actual types instead of strings.
import pandas as pd
from datetime import datetime
headers = ['col1', 'col2', 'col3', 'col4']
dtypes = [datetime, datetime, str, float]
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)
But it's going to be really hard to diagnose this without any of your data to tinker with.
And really, you probably want pandas to parse the dates into Timestamps, so that might be:
pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=True)
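One thing to keep in mind is that parse_dates=True asks pandas to try parsing the index, not the ordinary columns. A rough sketch, assuming the dates are in the first column and that column is used as the index (the column names and data are invented here):

```python
import io

import pandas as pd

# Hypothetical tab-separated data with the date in the first column.
tsv_text = "2016-05-05\t1.0\n2016-05-06\t2.0\n"

df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    header=None,
    names=["date", "value"],
    index_col="date",  # parse_dates=True acts on the index
    parse_dates=True,
)

# Inspect the index type to see whether the dates were parsed.
print(type(df.index).__name__)
```

If the dates should stay as regular columns, passing a list of column names to parse_dates is the more direct route.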
I tried the dtype=[datetime, ...] option:
import pandas as pd
from datetime import datetime
headers = ['col1', 'col2', 'col3', 'col4']
dtypes = [datetime, datetime, str, float]
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)
I encountered the following error:
TypeError: data type not understood
The only change I had to make was to replace datetime with datetime.datetime, which requires import datetime rather than from datetime import datetime:
import pandas as pd
import datetime
headers = ['col1', 'col2', 'col3', 'col4']
dtypes = [datetime.datetime, datetime.datetime, str, float]
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)
My workaround was to load the column as its default type, then call the pandas.to_datetime() function on the next line.
df[target_col] = pd.to_datetime(df[target_col])
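The same workaround can be sketched with an explicit format, which may help when the dates are not in ISO form. The column name "day", the day/month/year format and the sample rows are assumptions for this example; errors="coerce" turns values that cannot be parsed into NaT instead of raising.

```python
import io

import pandas as pd

# Hypothetical data: one valid date and one value that is not a date.
csv_text = "day,value\n05/05/2016,1.0\nnot a date,2.0\n"

# Load with the default type first (the column comes in as strings).
df = pd.read_csv(io.StringIO(csv_text))

# Convert afterwards; unparseable entries become NaT.
df["day"] = pd.to_datetime(df["day"], format="%d/%m/%Y", errors="coerce")
print(df["day"])
```

Converting after the read keeps the loading step simple and lets you decide per column how strict the parsing should be.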
I used the following code and it worked:
headers = ['col1', 'col2', 'col3', 'col4']
df = pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=['col1', 'col2'])