datetime dtypes in pandas read_csv

Why it does not work

There is no datetime dtype that can be set for read_csv, because a CSV file stores only text, from which pandas infers strings, integers and floats.

Setting a dtype to datetime will make pandas interpret the datetime as an object, meaning you will end up with a string.
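To make the starting point concrete, here is a minimal sketch using made-up inline data (the sample values and column names are assumptions for illustration, not taken from the question). It only shows that, without any date handling, a date-like column comes back as text rather than as a datetime column.

```python
import io

import pandas as pd

# Hypothetical two-row, tab-separated sample standing in for a real file.
raw = "2016-05-05\t1.5\n2016-05-06\t2.5\n"

df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=["col1", "col2"])

# col1 is read as text; pandas does not turn it into datetimes on its own.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["col1"])
first_value = df["col1"].iloc[0]
print(df.dtypes)
```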

Pandas way of solving this

The pandas.read_csv() function has a keyword argument called parse_dates.

Using this, you can convert strings, floats or integers into datetimes on the fly with the default date_parser (dateutil.parser.parser).

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = {'col1': 'str', 'col2': 'str', 'col3': 'str', 'col4': 'float'}
parse_dates = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)

This will cause pandas to read col1 and col2 as strings, which they most likely are ("2016-05-05" etc.). Once each string has been read, the date parser is applied to it, and the column ends up holding whatever that function returns.
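The call above can be tried without a file on disk. The following is a sketch only: the inline sample data is invented, and io.StringIO stands in for the `file` variable used in the answer.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample with two date columns.
raw = (
    "2016-05-05\t2016-06-01\tfoo\t1.5\n"
    "2016-05-06\t2016-06-02\tbar\t2.5\n"
)
headers = ["col1", "col2", "col3", "col4"]

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=headers,
    dtype={"col4": "float"},       # ordinary dtypes still go through dtype
    parse_dates=["col1", "col2"],  # date columns go through parse_dates
)

# The parsed columns support the .dt accessor.
year_of_first = df["col1"].dt.year.iloc[0]
print(df.dtypes)
```

Inspecting df.dtypes afterwards is a quick way to confirm that the listed columns really were converted.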

Defining your own date parsing function:

The pandas.read_csv() function also has a keyword argument called date_parser.

Setting this to a function (a lambda, for example) makes pandas use that function to parse the dates. Note that date_parser is deprecated as of pandas 2.0 in favour of date_format.

GOTCHA WARNING

You have to pass the function itself, not the result of calling it. This is correct:

date_parser = pd.datetools.to_datetime

This is incorrect:

date_parser = pd.datetools.to_datetime()
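The difference between the two lines can be checked in isolation. This sketch uses pd.to_datetime, the current home of the function, and does not involve read_csv at all; the variable names are illustrative.

```python
import pandas as pd

# Correct: a reference to the function. Nothing is executed yet,
# so whoever receives it can call it later with real input.
date_parser = pd.to_datetime
parsed = date_parser("2016-05-05")

# Incorrect: the parentheses call the function immediately, with no
# input, so Python raises before anything could be passed along.
try:
    bad_parser = pd.to_datetime()
    call_failed = False
except TypeError:
    call_failed = True

print(parsed, call_failed)
```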

Pandas 0.22 Update

pd.datetools.to_datetime has been relocated to pd.to_datetime, so use date_parser = pd.to_datetime

Thanks @stackoverYC

There is a parse_dates parameter for read_csv which allows you to define the names of the columns you want treated as dates or datetimes:

date_cols = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=date_cols)

You might try passing actual types instead of strings.

import pandas as pd
from datetime import datetime
headers = ['col1', 'col2', 'col3', 'col4'] 
dtypes = [datetime, datetime, str, float] 
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

But it's going to be really hard to diagnose this without any of your data to tinker with.

And really, you probably want pandas to parse the dates into Timestamps. Note that parse_dates=True only tries to parse the index, so name the date columns explicitly:

pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=['col1', 'col2'])
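For reference, parse_dates=True applies to the index rather than to ordinary columns, so it is mainly useful together with index_col. A minimal sketch, assuming made-up inline data and that col1 should become the index:

```python
import io

import pandas as pd

# Hypothetical two-row sample; col1 holds the dates.
raw = "2016-05-05\t1.5\n2016-05-06\t2.5\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=["col1", "col2"],
    index_col="col1",   # make the date column the index
    parse_dates=True,   # True means: try to parse the index
)

index_is_datetime = isinstance(df.index, pd.DatetimeIndex)
print(df.index)
```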

I tried using the dtypes=[datetime, ...] option:

import pandas as pd
from datetime import datetime
headers = ['col1', 'col2', 'col3', 'col4'] 
dtypes = [datetime, datetime, str, float] 
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

I encountered the following error:

TypeError: data type not understood

The only change I had to make was to import the datetime module itself and replace datetime with datetime.datetime:

import pandas as pd
import datetime
headers = ['col1', 'col2', 'col3', 'col4'] 
dtypes = [datetime.datetime, datetime.datetime, str, float] 
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

My workaround was to load the column with its default type, then convert it with pandas.to_datetime() on the next line.

df[target_col] = pd.to_datetime(df[target_col])
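This load-then-convert approach also gives control over the format and over bad values. The sketch below assumes a day/month/year format and invented sample data, including one deliberately invalid entry; the format string would need to match the real file.

```python
import io

import pandas as pd

# Hypothetical sample: one valid day/month/year date and one bad value.
raw = "05/05/2016\t1.5\nnot a date\t2.5\n"

df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=["col1", "col2"])

# Convert after loading; errors="coerce" turns unparseable values into NaT
# instead of raising.
df["col1"] = pd.to_datetime(df["col1"], format="%d/%m/%Y", errors="coerce")

print(df)
```

Passing an explicit format avoids ambiguity between day-first and month-first dates, and coercing errors makes the bad rows easy to find with df["col1"].isna().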

I used the following code and it worked:

headers = ['col1', 'col2', 'col3', 'col4']
df = pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=['col1', 'col2'])
