Pandas date offset and conversion

I am trying to interpret a field as a date, change the date to represent the month the date appears in, offset the date by a month, and then represent it as a date without a timestamp. I have ended up with this, which looks and feels too unwieldy:

    df['DATE'].apply( lambda d: pd.to_datetime(pd.to_datetime(d).to_period('M').to_timestamp('M')\
                                      - np.timedelta64(1,'M')).date())

The timestamps are strings in this format:

    2012-09-01 00:00:00

Any ideas for a better way? Thanks.

asked May 09 '14 00:05 by JAB
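Because every timestamp in the question shares one fixed layout (e.g. 2012-09-01 00:00:00), the parsing step can be made explicit. A minimal sketch, assuming a hypothetical two-value column in that layout:

```python
import pandas as pd

# Hypothetical sample values in the question's "YYYY-MM-DD HH:MM:SS" layout
raw = pd.Series(["2012-09-01 00:00:00", "2012-10-15 00:00:00"])

# An explicit format tells pandas exactly how to read each string
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")
print(parsed)
```

Passing format also means a malformed string raises an error instead of being parsed under some other inferred layout.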



1 Answer

Well, you can avoid the apply and do it in a vectorized way (I think that makes it a bit nicer):

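    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'date': ['2010-01-01 00:00:00', '2010-02-01 00:00:00',
                 '2010-03-01 00:00:00', '2010-04-01 00:00:00',
                 '2010-04-01 00:00:00', '2010-05-01 00:00:00'],
        'x1': [10, 10, 10, 10, 5, 5],
    })
    print(df)

                      date  x1
    0  2010-01-01 00:00:00  10
    1  2010-02-01 00:00:00  10
    2  2010-03-01 00:00:00  10
    3  2010-04-01 00:00:00  10
    4  2010-04-01 00:00:00   5
    5  2010-05-01 00:00:00   5

    # Truncate each timestamp to its month, then step back one whole month
    df['date'] = (pd.to_datetime(df['date']).values.astype('datetime64[M]')
                  - np.timedelta64(1, 'M'))
    print(df)

            date  x1
    0 2009-12-01  10
    1 2010-01-01  10
    2 2010-02-01  10
    3 2010-03-01  10
    4 2010-03-01   5
    5 2010-04-01   5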

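Of course, the dates will still be stored as datetime64 values (datetime64[ns] in the pandas versions current when this was written) rather than plain dates, since pandas converts datetime values to its own supported resolution.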
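If you really need plain dates without a time component, as the question asks, one option is to convert at the very end with .dt.date, which yields an object column of datetime.date values. A minimal sketch on a hypothetical two-row frame shaped like the one above:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame in the same shape as the answer's df
df = pd.DataFrame({"date": ["2010-01-01 00:00:00", "2010-02-01 00:00:00"],
                   "x1": [10, 10]})

# Start of the previous month, computed with the numpy trick above
df["date"] = (pd.to_datetime(df["date"]).values.astype("datetime64[M]")
              - np.timedelta64(1, "M"))

# .dt.date returns datetime.date objects, i.e. no time-of-day part
df["date"] = df["date"].dt.date
print(df)
```

The trade-off is that the column is no longer datetime64, so the .dt accessor and date offsets stop working on it; it is usually simpler to keep the datetime64 column for arithmetic and convert only for display or export.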

Edit: Suppose you wanted the end of the previous month instead of the beginning of the previous month:

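    # Starting again from the original df: truncate to the month, then
    # subtract one day to land on the last day of the previous month
    df['date'] = (pd.to_datetime(df['date']).values.astype('datetime64[M]')
                  - np.timedelta64(1, 'D'))
    print(df)

            date  x1
    0 2009-12-31  10
    1 2010-01-31  10
    2 2010-02-28  10
    3 2010-03-31  10
    4 2010-03-31   5
    5 2010-04-30   5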
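The one-day subtraction works because numpy promotes the datetime64[M] values to day resolution before subtracting, so the same trick also handles dates that fall mid-month, as well as leap years. A minimal numpy-only sketch with hypothetical input dates:

```python
import numpy as np

# Hypothetical dates, deliberately not on the first of the month
days = np.array(["2010-03-17", "2012-03-05", "2010-01-31"], dtype="datetime64[D]")

# astype('datetime64[M]') truncates to the month; subtracting a day-unit
# timedelta promotes the result back to day resolution
prev_month_end = days.astype("datetime64[M]") - np.timedelta64(1, "D")
print(prev_month_end)  # 2012 is a leap year, so the second entry is Feb 29
```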

Edit: Jeff points out that a more idiomatic pandas way is to make date a DatetimeIndex and subtract a DateOffset. Something like:

    # Starting again from the original df
    df['date'] = pd.DatetimeIndex(df['date']) - pd.offsets.MonthBegin(1)
    print(df)

            date  x1
    0 2009-12-01  10
    1 2010-01-01  10
    2 2010-02-01  10
    3 2010-03-01  10
    4 2010-03-01   5
    5 2010-04-01   5
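One caveat: subtracting MonthBegin(1) only steps back a full month when the date is already the first of a month, as in this data. A date such as 2010-01-15 is instead rolled back to the start of its own month. A sketch with hypothetical dates showing this, plus one way around it (snapping to the month with to_period first):

```python
import pandas as pd

# Hypothetical dates: one mid-month, one already on the first
s = pd.to_datetime(pd.Series(["2010-01-15", "2010-03-01"]))

# Off-anchor dates only roll back to the start of their own month
naive = s - pd.offsets.MonthBegin(1)

# Snapping to the month start first makes the one-month step uniform
month_start = s.dt.to_period("M").dt.to_timestamp()
prev_start = month_start - pd.offsets.MonthBegin(1)

print(naive.tolist())       # 2010-01-01 and 2010-02-01
print(prev_start.tolist())  # 2009-12-01 and 2010-02-01
```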

Or, for the end of the previous month:

    # Starting again from the original df
    df['date'] = pd.DatetimeIndex(df['date']) - pd.offsets.MonthEnd(1)
    print(df)

            date  x1
    0 2009-12-31  10
    1 2010-01-31  10
    2 2010-02-28  10
    3 2010-03-31  10
    4 2010-03-31   5
    5 2010-04-30   5
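MonthEnd does not have the MonthBegin asymmetry: subtracting MonthEnd(1) from any day of January lands on December 31. Offsets keep the time of day, though, so a normalize() call is a cheap way to drop it. A sketch, assuming hypothetical January timestamps:

```python
import pandas as pd

# Hypothetical January timestamps, one of them with a time of day
s = pd.to_datetime(pd.Series(["2010-01-01 00:00", "2010-01-15 13:30", "2010-01-31 00:00"]))

# Every day of January rolls back to 2009-12-31; the offset keeps the
# 13:30 time, so normalize() resets everything to midnight
prev_end = (s - pd.offsets.MonthEnd(1)).dt.normalize()
print(prev_end.tolist())
```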
answered Nov 02 '22 07:11 by Karl D.