Pandas date offset and conversion

Tags: python, datetime, pandas

I am trying to interpret a field as a date, truncate it to the month it falls in, offset it by one month, and then represent the result as a date without a time component. I ended up with this, which looks and feels too unwieldy:

    df['DATE'].apply( lambda d: pd.to_datetime(pd.to_datetime(d).to_period('M').to_timestamp('M')\
                                      - np.timedelta64(1,'M')).date())

The timestamps are strings in this format:

    2012-09-01 00:00:00

Any ideas for a better way? Thanks.

asked May 09 '14 00:05

JAB

1 Answer

Well, you can avoid the apply and do it vectorized (I think that makes it a bit nicer):

    print(df)

                      date  x1
    0  2010-01-01 00:00:00  10
    1  2010-02-01 00:00:00  10
    2  2010-03-01 00:00:00  10
    3  2010-04-01 00:00:00  10
    4  2010-04-01 00:00:00   5
    5  2010-05-01 00:00:00   5

    df['date'] = (pd.to_datetime(df['date']).values.astype('datetime64[M]')
                  - np.timedelta64(1, 'M'))
    print(df)

            date  x1
    0 2009-12-01  10
    1 2010-01-01  10
    2 2010-02-01  10
    3 2010-03-01  10
    4 2010-03-01   5
    5 2010-04-01   5

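Of course, the column is still stored as a pandas datetime dtype (datetime64[ns] when this was written), since pandas converts NumPy datetime arrays to its own resolution on assignment. Only the displayed time component disappears.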
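For reference, here is a minimal self-contained sketch of the same idea that also ends with plain `datetime.date` objects, which is what the question asks for. The sample values and the column name `prev_month_start` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data in the question's string format (values are assumptions).
df = pd.DataFrame({
    "date": ["2010-01-01 00:00:00", "2010-02-15 00:00:00", "2010-03-31 00:00:00"],
    "x1": [10, 10, 5],
})

# Truncate each timestamp to its month, then step back one calendar month.
months = pd.to_datetime(df["date"]).values.astype("datetime64[M]")
prev_month_start = months - np.timedelta64(1, "M")

# Cast to day resolution and convert to datetime.date objects, which
# carry no time component.
prev_dates = prev_month_start.astype("datetime64[D]").astype(object)

# Hypothetical column name; the column holds Python date objects.
df["prev_month_start"] = prev_dates
```

The trade-off is that a column of `datetime.date` objects loses the vectorized `.dt` accessor, so it is probably best done as a last step before display or export.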

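Edit: Suppose you wanted the end of the previous month instead of the beginning of the previous month. The output below comes from running this on the already-shifted frame above, which is why each date lands one month earlier than it would from the original data:

    df['date'] = (pd.to_datetime(df['date']).values.astype('datetime64[M]')
                  - np.timedelta64(1, 'D'))
    print(df)

            date  x1
    0 2009-11-30  10
    1 2009-12-31  10
    2 2010-01-31  10
    3 2010-02-28  10
    4 2010-02-28   5
    5 2010-03-31   5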
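As a quick check of the month-end variant on fresh input, the sketch below truncates to the month and subtracts one day. The two sample strings are assumptions chosen to include a mid-month date.

```python
import numpy as np
import pandas as pd

# Illustrative input strings (assumptions, not from the original frame).
dates = pd.Series(["2010-01-01 00:00:00", "2010-03-15 00:00:00"])

# Truncate to the first of the month, then step back a single day.
# NumPy promotes the month-resolution array to day resolution here.
month_start = pd.to_datetime(dates).values.astype("datetime64[M]")
prev_month_end = month_start - np.timedelta64(1, "D")

print(prev_month_end)
```

Applied to untouched data, 2010-01-01 maps to 2009-12-31 and 2010-03-15 maps to 2010-02-28.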

Edit: Jeff points out that a more pandonic (idiomatic pandas) way is to make date a DatetimeIndex and use a DateOffset. So something like:

    df['date'] = pd.DatetimeIndex(df['date']) - pd.offsets.MonthBegin(1)
    print(df)

            date  x1
    0 2009-12-01  10
    1 2010-01-01  10
    2 2010-02-01  10
    3 2010-03-01  10
    4 2010-03-01   5
    5 2010-04-01   5

Or month-ends:

    df['date'] = pd.DatetimeIndex(df['date']) - pd.offsets.MonthEnd(1)
    print(df)

            date  x1
    0 2009-12-31  10
    1 2010-01-31  10
    2 2010-02-28  10
    3 2010-03-31  10
    4 2010-03-31   5
    5 2010-04-30   5
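One caveat worth sketching: the offset examples above use dates that already sit on the first of the month. For a mid-month date, subtracting `MonthBegin(1)` only rolls back to the start of the same month, so truncating first may be closer to what the question describes. The sample values and variable names below are assumptions for illustration.

```python
import pandas as pd

# One date on a month start, one mid-month (illustrative values).
dates = pd.to_datetime(pd.Series(["2010-01-01 00:00:00", "2010-01-15 00:00:00"]))

# Subtracting the offset directly: the mid-month date only rolls back
# to the first day of its own month.
direct = dates - pd.offsets.MonthBegin(1)

# Truncating to the month first, then stepping back one month, treats
# every date within a month the same way.
truncated = dates.dt.to_period("M").dt.to_timestamp() - pd.offsets.MonthBegin(1)

# .dt.date yields datetime.date objects with no time component.
as_dates = truncated.dt.date

print(direct.tolist())
print(as_dates.tolist())
```

Here `direct` gives 2009-12-01 and 2010-01-01, while `truncated` gives 2009-12-01 for both rows.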

answered Nov 02 '22 07:11

Karl D.
