I have a series with some datetimes (as strings) and some nulls as 'nan':
import pandas as pd
import numpy as np
import datetime as dt

df = pd.DataFrame({'Date': ['2014-10-20 10:44:31', '2014-10-23 09:33:46', 'nan', '2014-10-01 09:38:45']})
I'm trying to convert these to datetime:
df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
but I get the error:
time data 'nan' does not match format '%Y-%m-%d %H:%M:%S'
So I try to turn these into actual nulls:
df.loc[df['Date'] == 'nan', 'Date'] = np.nan
and repeat:
df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
but then I get the error:
must be string, not float
What is the quickest way to solve this problem?
When a DataFrame is built from a CSV file, blank fields are imported as null values, which can later cause problems when operating on that DataFrame. Pandas' isnull() and notnull() methods are used to detect and manage NULL values in a DataFrame.
notnull is a pandas function that examines one or more values and confirms that they are not null. In pandas, null values are represented as NaN (not a number) or None to signify missing data. notnull returns False if either NaN or None is detected, and True otherwise.
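As a small illustration (a throwaway Series, not the question's data), isnull/notnull only flag real NaN or None values; the literal string 'nan' used in the question would not count as null here:

import pandas as pd
import numpy as np

s = pd.Series(['2014-10-20 10:44:31', np.nan, None])
print(s.isnull())   # True for the NaN and None entries, False for the string
print(s.notnull())  # the inverse mask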
We can convert a string to a datetime using the strptime() function. This function is available in the datetime and time modules to parse a string into datetime and time objects respectively.
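A minimal strptime example using the same format string as the question:

import datetime as dt

parsed = dt.datetime.strptime('2014-10-20 10:44:31', '%Y-%m-%d %H:%M:%S')
print(parsed)        # 2014-10-20 10:44:31
print(type(parsed))  # <class 'datetime.datetime'>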
Just use to_datetime and set errors='coerce' to handle duff data:
In [321]:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df

Out[321]:
                 Date
0 2014-10-20 10:44:31
1 2014-10-23 09:33:46
2                 NaT
3 2014-10-01 09:38:45

In [322]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 1 columns):
Date    3 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 64.0 bytes
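Once the column is datetime64[ns], the unparseable 'nan' row shows up as NaT, and you can drop or keep those rows with the isnull/notnull methods mentioned above (or dropna), for example:

df[df['Date'].notnull()]    # keep only rows with a real timestamp
df.dropna(subset=['Date'])  # equivalent using dropna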
The problem with calling strptime is that it will raise an error if the string or dtype is incorrect. If you did this then it would work:
In [324]:
def func(x):
    try:
        return dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    except:
        return pd.NaT

df['Date'].apply(func)

Out[324]:
0   2014-10-20 10:44:31
1   2014-10-23 09:33:46
2                   NaT
3   2014-10-01 09:38:45
Name: Date, dtype: datetime64[ns]
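As a side note, the bare except above swallows every exception; a slightly tighter variant (just a sketch, not part of the original answer) catches only the two errors you would expect from strptime:

def func(x):
    try:
        return dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    # ValueError for strings in the wrong format,
    # TypeError for non-string input such as np.nan (a float)
    except (ValueError, TypeError):
        return pd.NaT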
But it will be faster to use the built-in to_datetime rather than call apply, which essentially just loops over your Series.
timings
In [326]:
%timeit pd.to_datetime(df['Date'], errors='coerce')
%timeit df['Date'].apply(func)

10000 loops, best of 3: 65.8 µs per loop
10000 loops, best of 3: 186 µs per loop
We see here that using to_datetime is roughly 3x faster.
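If you know the format up front, passing it explicitly usually speeds to_datetime up further by avoiding format inference, and it still combines with errors='coerce' (actual timings will vary by pandas version and data):

pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S', errors='coerce')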