

Pandas: parsing 24:00 instead of 00:00

Tags: python, datetime, pandas

I have a dataset in which the hour is recorded in the range [0100:2400] instead of [0000:2300].

For example

pd.to_datetime('201704102300', format='%Y%m%d%H%M')

returns

Timestamp('2017-04-10 23:00:00')

But

pd.to_datetime('201704102400', format='%Y%m%d%H%M')

gives me the error:

ValueError: unconverted data remains: 0

How can I fix this problem?

I could adjust the data manually, as mentioned in this SO post, but shouldn't pandas already handle this case?

UPDATE:

And how can this be done in a scalable way for a DataFrame? For example, the data look like the screenshot at https://i.stack.imgur.com/1eM92.png.


asked Apr 12 '17 02:04

cqcn1991

2 Answers

Pandas follows standard strptime semantics, in which %H only accepts hours 00 through 23, so if you need something non-standard, you get to roll your own.
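The hour-24 failure is not specific to pandas: Python's own datetime.strptime rejects it too. A minimal sketch using only the standard library (the helper name try_parse is made up for illustration, and this says nothing about which implementation pandas uses internally):

```python
from datetime import datetime

FMT = '%Y%m%d%H%M'

def try_parse(text):
    """Return the parsed datetime, or the ValueError message if parsing fails."""
    try:
        return datetime.strptime(text, FMT)
    except ValueError as exc:
        return str(exc)

print(try_parse('201704102300'))  # a valid hour parses normally
print(try_parse('201704102400'))  # hour 24 is rejected with a ValueError
```

The leftover 0 in the error message is consistent with %H failing to match 24 and the parser consuming fewer digits than intended, though that reading is an inference from the message rather than something stated in the question.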

Code:

import pandas as pd
import datetime as dt

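def my_to_datetime(date_str):
    # Normal case: the hour field (characters 8-9) is 00-23.
    if date_str[8:10] != '24':
        return pd.to_datetime(date_str, format='%Y%m%d%H%M')

    # Hour 24: parse as hour 00 of the same day, then add one day.
    date_str = date_str[0:8] + '00' + date_str[10:]
    return (pd.to_datetime(date_str, format='%Y%m%d%H%M')
            + dt.timedelta(days=1))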

print(my_to_datetime('201704102400'))

Results:

2017-04-11 00:00:00

For a Column in a pandas.DataFrame:

df['time'] = df.time.apply(my_to_datetime)


answered Oct 17 '22 22:10

Stephen Rauch

A vectorized solution that uses the pd.to_datetime(DataFrame) form, which assembles datetimes from year, month, day, hour and minute columns:

Source DF

In [27]: df
Out[27]:
           time
0  201704102400
1  201602282400
2  201704102359

Solution

In [28]: pat = r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})(?P<hour>\d{2})(?P<minute>\d{2})'

In [29]: pd.to_datetime(df['time'].str.extract(pat, expand=True))
Out[29]:
0   2017-04-11 00:00:00
1   2016-02-29 00:00:00
2   2017-04-10 23:59:00
dtype: datetime64[ns]

Explanation:

In [30]: df['time'].str.extract(pat, expand=True)
Out[30]:
   year month day hour minute
0  2017    04  10   24     00
1  2016    02  28   24     00
2  2017    04  10   23     59

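pat is the regex pattern passed to Series.str.extract(). Its named groups become the column names (year, month, day, hour, minute), which are the names pd.to_datetime expects when it assembles datetimes from a DataFrame. As the output above shows, an hour of 24 rolls over to 00:00 of the next day.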
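If relying on pd.to_datetime to roll hour 24 over feels too implicit, the same result can be reached with explicit arithmetic: parse only the date part, then add the hours and minutes as durations. This is an alternative sketch, not the method used in the answer above; the variable names are illustrative, and the astype(str) call assumes the column might be stored as integers.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({'time': ['201704102400', '201602282400', '201704102359']})

s = df['time'].astype(str)  # assumption: values may be stored as integers

# Parse only the date part, which is always a valid calendar date.
date = pd.to_datetime(s.str[:8], format='%Y%m%d')

# Add hours and minutes as durations, so hour 24 lands on the next day.
hours = pd.to_timedelta(s.str[8:10].astype(int), unit='h')
minutes = pd.to_timedelta(s.str[10:12].astype(int), unit='min')

parsed = date + hours + minutes
print(parsed)
```

Because the hour is never handed to a format parser, any value the data happens to contain is simply added as a duration.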

UPDATE: Timing

In [37]: df = pd.concat([df] * 10**4, ignore_index=True)

In [38]: df.shape
Out[38]: (30000, 1)

In [39]: %timeit df.time.apply(my_to_datetime)
1 loop, best of 3: 4.1 s per loop

In [40]: %timeit pd.to_datetime(df['time'].str.extract(pat, expand=True))
1 loop, best of 3: 475 ms per loop

answered Oct 17 '22 20:10

MaxU - stop WAR against UA
