I have a series with some datetimes (as strings) and some nulls as 'nan':
    import pandas as pd, numpy as np, datetime as dt

    df = pd.DataFrame({'Date': ['2014-10-20 10:44:31', '2014-10-23 09:33:46', 'nan', '2014-10-01 09:38:45']})

I'm trying to convert these to datetime:

    df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

but I get the error:

    time data 'nan' does not match format '%Y-%m-%d %H:%M:%S'

So I try to turn these into actual nulls:

    df.ix[df['Date'] == 'nan', 'Date'] = np.NaN

and repeat:

    df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

but then I get the error:
must be string, not float
What is the quickest way to solve this problem?
When making a DataFrame from a CSV file, many blank fields are imported as null values into the DataFrame, which later causes problems when operating on that DataFrame. The pandas isnull() and notnull() methods are used to check for and manage null values in a DataFrame.
notnull is a pandas function that examines one or more values to verify that they are not null. In Python, null values are represented as NaN (not a number) or None to signify that no data is present. notnull returns False if either NaN or None is detected; if these values are not present, it returns True.
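As a minimal sketch of that behaviour (the Series below is a made-up example, not taken from the question):

    import pandas as pd
    import numpy as np

    # hypothetical Series mixing a real value with the two kinds of "missing"
    s = pd.Series(['2014-10-20 10:44:31', np.nan, None])

    s.isnull()   # False, True, True  -- True where the value is NaN or None
    s.notnull()  # True, False, False -- the boolean inverse of isnull()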
We can convert a string to a datetime using the strptime() function. This function is available in both the datetime and time modules, to parse a string into datetime and time objects respectively.
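For example, with the same format string used in the question (a quick sketch, assuming the string is well formed):

    import datetime as dt
    import time

    # datetime.strptime parses the string into a datetime.datetime object
    dt.datetime.strptime('2014-10-20 10:44:31', '%Y-%m-%d %H:%M:%S')
    # -> datetime.datetime(2014, 10, 20, 10, 44, 31)

    # time.strptime parses the same string into a time.struct_time
    time.strptime('2014-10-20 10:44:31', '%Y-%m-%d %H:%M:%S')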
Just use to_datetime and set errors='coerce' to handle duff data:
    In [321]:
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
    df

    Out[321]:
                     Date
    0 2014-10-20 10:44:31
    1 2014-10-23 09:33:46
    2                 NaT
    3 2014-10-01 09:38:45

    In [322]:
    df.info()

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 4 entries, 0 to 3
    Data columns (total 1 columns):
    Date    3 non-null datetime64[ns]
    dtypes: datetime64[ns](1)
    memory usage: 64.0 bytes

The problem with calling strptime is that it will raise an error if the string or dtype is incorrect.
If you did this then it would work:
    In [324]:
    def func(x):
        try:
            return dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
        except:
            return pd.NaT

    df['Date'].apply(func)

    Out[324]:
    0   2014-10-20 10:44:31
    1   2014-10-23 09:33:46
    2                   NaT
    3   2014-10-01 09:38:45
    Name: Date, dtype: datetime64[ns]

but it will be faster to use the built-in to_datetime rather than call apply, which essentially just loops over your Series.
timings
    In [326]:
    %timeit pd.to_datetime(df['Date'], errors='coerce')
    %timeit df['Date'].apply(func)

    10000 loops, best of 3: 65.8 µs per loop
    10000 loops, best of 3: 186 µs per loop

We see here that using to_datetime is about 3x faster.
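As a side note, once the column is datetime64[ns], the NaT entries produced by errors='coerce' can be handled with the isnull()/notnull() machinery described earlier; a minimal sketch (the variable name valid is just for illustration):

    # keep only the rows whose dates parsed successfully
    valid = df[df['Date'].notnull()]

    # or, equivalently, drop the rows with missing dates
    valid = df.dropna(subset=['Date'])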