I have a series with some datetimes (as strings) and some nulls as 'nan':
    import pandas as pd, numpy as np, datetime as dt

    df = pd.DataFrame({'Date': ['2014-10-20 10:44:31', '2014-10-23 09:33:46', 'nan', '2014-10-01 09:38:45']})

I'm trying to convert these to datetime:

    df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

but I get the error:

    time data 'nan' does not match format '%Y-%m-%d %H:%M:%S'

So I try to turn these into actual nulls:

    df.ix[df['Date'] == 'nan', 'Date'] = np.NaN

and repeat:

    df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

but then I get the error:
must be string, not float
What is the quickest way to solve this problem?
When making a DataFrame from a CSV file, many blank fields are imported as null values into the DataFrame, which later causes problems when operating on that DataFrame. The pandas isnull() and notnull() methods are used to check for and manage null values in a DataFrame.
notnull is a pandas function that examines one or more values to verify that they are not null. In Python, null values are represented as NaN (not a number) or None to signify that no data is present. notnull returns False if either NaN or None is detected; if these values are not present, it returns True.
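As a minimal sketch of that behaviour (the Series below is a made-up example, not taken from the question):

    import pandas as pd
    import numpy as np

    # hypothetical Series mixing a real value with the two kinds of "missing"
    s = pd.Series(['2014-10-20 10:44:31', np.nan, None])

    s.isnull()   # False, True, True  -- True where the value is NaN or None
    s.notnull()  # True, False, False -- the boolean inverse of isnull()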
We can convert a string to a datetime using the strptime() function. This function is available in both the datetime and time modules, to parse a string into datetime and time objects respectively.
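For example, with the same format string used in the question (a quick sketch, assuming the string is well formed):

    import datetime as dt
    import time

    # datetime.strptime parses the string into a datetime.datetime object
    dt.datetime.strptime('2014-10-20 10:44:31', '%Y-%m-%d %H:%M:%S')
    # -> datetime.datetime(2014, 10, 20, 10, 44, 31)

    # time.strptime parses the same string into a time.struct_time
    time.strptime('2014-10-20 10:44:31', '%Y-%m-%d %H:%M:%S')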
Just use to_datetime and set errors='coerce' to handle duff data:
    In [321]:
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
    df

    Out[321]:
                     Date
    0 2014-10-20 10:44:31
    1 2014-10-23 09:33:46
    2                 NaT
    3 2014-10-01 09:38:45

    In [322]:
    df.info()

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 4 entries, 0 to 3
    Data columns (total 1 columns):
    Date    3 non-null datetime64[ns]
    dtypes: datetime64[ns](1)
    memory usage: 64.0 bytes

The problem with calling strptime is that it will raise an error if the string or dtype is incorrect.
If you did this then it would work:
    In [324]:
    def func(x):
        try:
            return dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
        except:
            return pd.NaT

    df['Date'].apply(func)

    Out[324]:
    0   2014-10-20 10:44:31
    1   2014-10-23 09:33:46
    2                   NaT
    3   2014-10-01 09:38:45
    Name: Date, dtype: datetime64[ns]

but it will be faster to use the built-in to_datetime rather than call apply, which essentially just loops over your Series.
timings
    In [326]:
    %timeit pd.to_datetime(df['Date'], errors='coerce')
    %timeit df['Date'].apply(func)

    10000 loops, best of 3: 65.8 µs per loop
    10000 loops, best of 3: 186 µs per loop

We see here that using to_datetime is about 3x faster.
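As a side note, once the column is datetime64[ns], the NaT entries produced by errors='coerce' can be handled with the isnull()/notnull() machinery described earlier; a minimal sketch (the variable name valid is just for illustration):

    # keep only the rows whose dates parsed successfully
    valid = df[df['Date'].notnull()]

    # or, equivalently, drop the rows with missing dates
    valid = df.dropna(subset=['Date'])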