I am trying to interpret a field as a date, change the date to represent the month it falls in, offset that by a month, and then represent the result as a date without a timestamp. I have ended up with this, which looks and feels too unwieldy:
df['DATE'].apply( lambda d: pd.to_datetime(pd.to_datetime(d).to_period('M').to_timestamp('M')\
- np.timedelta64(1,'M')).date())
The timestamps are strings in this format:
2012-09-01 00:00:00
Any ideas for a better way? Thanks.
DateOffsets can be created to move dates forward a given number of valid dates. For example, BDay(2) can be added to a date to move it two business days forward. If the date does not start on a valid date, it is first moved to a valid date and then the offset is applied.
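As a small illustration of that behaviour, here is a sketch that adds BDay(2) to a Saturday and to a Monday. The variable names are made up for the example; the sample date is taken from the question.

```python
import pandas as pd

saturday = pd.Timestamp("2012-09-01")  # a Saturday, so not a valid business day
monday = pd.Timestamp("2012-09-03")    # an ordinary business day

two_bdays = pd.offsets.BDay(2)

# The Saturday is first snapped to a valid business day, then the offset is applied
from_saturday = saturday + two_bdays
from_monday = monday + two_bdays

print(from_saturday.date(), from_monday.date())
```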
Call dataframe[column].dt.strftime(format), where dataframe[column] is the column from the DataFrame containing datetime objects and format is a string representing the new date format. Use "%m" to indicate where the month should be positioned, "%d" for the day, and "%y" for the two-digit year ("%Y" gives the four-digit year).
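A minimal sketch of that call, assuming a column named DATE holding strings in the format from the question (the second row is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"DATE": ["2012-09-01 00:00:00", "2012-10-15 00:00:00"]})

# Parse the strings first; .dt is only available on datetime-like columns
parsed = pd.to_datetime(df["DATE"])

# Render each timestamp as a date-only string
formatted = parsed.dt.strftime("%Y-%m-%d")
print(formatted.tolist())
```

Note that the result is a column of strings, not dates, so it is only suitable as a final display step.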
For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.
Well, you can avoid the apply and do it vectorized (I think that makes it a bit nicer):
print(df)
date x1
0 2010-01-01 00:00:00 10
1 2010-02-01 00:00:00 10
2 2010-03-01 00:00:00 10
3 2010-04-01 00:00:00 10
4 2010-04-01 00:00:00 5
5 2010-05-01 00:00:00 5
# Truncate each timestamp to its month, then step back one month
df['date'] = (pd.to_datetime(df['date']).values.astype('datetime64[M]')
              - np.timedelta64(1,'M'))
print(df)
date x1
0 2009-12-01 10
1 2010-01-01 10
2 2010-02-01 10
3 2010-03-01 10
4 2010-03-01 5
5 2010-04-01 5
Of course, the values will still be stored as datetime64[ns] timestamps rather than plain dates, since pandas converts to its own datetime dtype on assignment.
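If plain date objects are really needed, as the question asks, one option is the .dt.date accessor. The following is only a sketch: the sample values and variable names are assumptions, and the explicit cast back to datetime64[ns] is there so that pandas receives a resolution it handles directly.

```python
import numpy as np
import pandas as pd

raw = pd.Series(["2010-01-01 00:00:00", "2010-04-17 00:00:00"])

# Truncate to month resolution in NumPy, then step back one calendar month
month_start = pd.to_datetime(raw).values.astype("datetime64[M]")
previous_month = month_start - np.timedelta64(1, "M")

# Hand the values back to pandas and extract datetime.date objects
as_timestamps = pd.Series(previous_month.astype("datetime64[ns]"))
as_dates = as_timestamps.dt.date
print(as_dates.tolist())
```

The resulting column has object dtype, so it loses the vectorized datetime operations; it is usually best kept as the last step.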
Edit: Suppose you wanted the end of the previous month instead of the beginning of the previous month:
# Truncate to the start of the month, then step back one day
df['date'] = (pd.to_datetime(df['date']).values.astype('datetime64[M]')
              - np.timedelta64(1,'D'))
print(df)
date x1
0 2009-11-30 10
1 2009-12-31 10
2 2010-01-31 10
3 2010-02-28 10
4 2010-02-28 5
5 2010-03-31 5
Edit: Jeff points out that a more pandonic way is to make date a DatetimeIndex and use a DateOffset. So something like:
df['date'] = pd.DatetimeIndex(df['date']) - pd.offsets.MonthBegin(1)
print(df)
date x1
0 2009-12-01 10
1 2010-01-01 10
2 2010-02-01 10
3 2010-03-01 10
4 2010-03-01 5
5 2010-04-01 5
Or month-ends:
df['date'] = pd.DatetimeIndex(df['date']) - pd.offsets.MonthEnd(1)
print(df)
date x1
0 2009-12-31 10
1 2010-01-31 10
2 2010-02-28 10
3 2010-03-31 10
4 2010-03-31 5
5 2010-04-30 5
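One caveat worth keeping in mind: the sample data above only contains first-of-month dates. For a mid-month date, subtracting MonthBegin(1) only rolls back to the 1st of the same month. The sketch below contrasts that with going through monthly periods, which shifts whole months regardless of the day. The sample values and variable names are assumptions for illustration.

```python
import pandas as pd

raw = pd.Series(["2010-01-01 00:00:00", "2010-04-17 00:00:00"])
parsed = pd.to_datetime(raw)

# Offset arithmetic: a mid-month date lands on the 1st of its own month
via_offset = parsed - pd.offsets.MonthBegin(1)

# Period arithmetic: convert to monthly periods, step back one, take the month start
via_period = (parsed.dt.to_period("M") - 1).dt.to_timestamp()

# Plain datetime.date objects, without a time component
plain_dates = via_period.dt.date

print(via_offset.tolist())
print(plain_dates.tolist())
```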