EDIT:
If you're coming to this question and your string looks like 1996-Q1, then just use pd.to_datetime(df['Quarter']) to convert it to a proper pandas datetime. This question is about handling all the quarter strings that are not in this standard format.
ORIGINAL QUESTION:
I'm looking for a nice, readable and understandable way (one that you can remember for the next time) to convert Q3 1996 to a pandas datetime, for example 1996-07-01 in this case.
Until now I found this, but it's mighty ugly:
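import pandas as pd

df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q1 1997']})
# Split on the space, reverse the parts and glue them back together: 'Q3 1996' -> '1996Q3'
df['date'] = (
    pd.to_datetime(
        df['Quarter'].str.split(' ').apply(lambda x: ''.join(x[::-1]))
    ))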
print(df)
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
I was hoping the following would work, because it's readable, but unfortunately it doesn't:
df['date'] = pd.to_datetime(df['Quarter'], format='%q %Y')
Part of the problem is that the quarter and the year are apparently in the wrong order for pandas to parse them directly.
Can anyone help me find a cleaner way of converting Q3 1996 to a pandas datetime?
To format a datetime column as strings, call dataframe[column].dt.strftime(format), where dataframe[column] is the DataFrame column containing datetime objects and format is a string describing the new date format. Use "%m" to indicate where the month should be positioned, "%d" for the day, and "%Y" for the four-digit year ("%y" gives the two-digit year).
strftime(*args, **kwargs): Convert to Index using specified date_format. Return an Index of formatted strings specified by date_format, which supports the same string format as the Python standard library. Details of the string format can be found in the Python string format documentation.
You can (and should) use pd.PeriodIndex as a first step, then convert to timestamps using PeriodIndex.to_timestamp:
qs = df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)
qs
0    1996-Q3
1    1996-Q4
2    1997-Q1
Name: Quarter, dtype: object
df['date'] = pd.PeriodIndex(qs, freq='Q').to_timestamp()
df
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
The initial replace step is necessary because PeriodIndex expects your periods in the %Y-%q format.
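As a self-contained sketch of the two steps above (the extra year and quarter columns are only there for illustration and are not part of the original answer), the intermediate PeriodIndex also exposes the year and quarter number directly. Passing regex=True explicitly is an assumption about your pandas version: newer releases treat the pattern as a literal string unless told otherwise.

```python
import pandas as pd

df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q1 1997']})

# Reorder 'Q3 1996' into '1996-Q3' so it can be read as a quarterly period.
# regex=True is passed explicitly so the pattern is treated as a regex.
qs = df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)

# Quarterly periods keep the year and quarter as separate attributes.
periods = pd.PeriodIndex(qs, freq='Q')
df['year'] = periods.year
df['quarter'] = periods.quarter

# to_timestamp() defaults to the start of each period.
df['date'] = periods.to_timestamp()
print(df)
```

Keeping the periods around can be handy if you later need to group or filter by quarter rather than by a calendar date.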
Another option is to use pd.to_datetime after performing string replacement in the same way as before.
df['date'] = pd.to_datetime(
    df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True), errors='coerce')
df
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
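The errors='coerce' argument matters when the column contains values that are not quarters. A minimal sketch, assuming a made-up series with one bad entry (the 'unknown' value is purely illustrative): anything that cannot be parsed becomes NaT instead of raising an exception. Depending on the pandas version, a warning about format inference may also be printed.

```python
import pandas as pd

# Hypothetical input: two valid quarters and one value that is not a quarter.
raw = pd.Series(['Q3 1996', 'Q4 1996', 'unknown'])

# Same reordering as before; the bad value does not match and is left unchanged.
swapped = raw.str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)

# Unparseable strings are turned into NaT rather than raising.
dates = pd.to_datetime(swapped, errors='coerce')
print(dates)
```

You can then find the offending rows with dates.isna() and decide whether to drop or fix them.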
If performance is important, you can split and join, but you can do it cleanly:
df['date'] = pd.to_datetime([
    '-'.join(x.split()[::-1]) for x in df['Quarter']])
df
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
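If you would rather not depend on pandas recognising quarter strings at all, a different technique (not one of the approaches above) is to extract the quarter number and year, then compute the first month of the quarter as 3 * (quarter - 1) + 1. The helper name quarters_to_timestamps is hypothetical, and this sketch assumes every value matches the 'Q<n> <yyyy>' pattern; values that do not match would make the integer conversion fail.

```python
import pandas as pd

def quarters_to_timestamps(quarters: pd.Series) -> pd.Series:
    """Convert strings like 'Q3 1996' to the first day of that quarter."""
    # Named groups become the column names of the extracted DataFrame.
    parts = quarters.str.extract(r'^Q(?P<quarter>[1-4]) (?P<year>\d{4})$').astype(int)
    # Q1 -> month 1, Q2 -> month 4, Q3 -> month 7, Q4 -> month 10.
    parts['month'] = 3 * (parts.pop('quarter') - 1) + 1
    parts['day'] = 1
    # pd.to_datetime assembles dates from year/month/day columns.
    return pd.to_datetime(parts)

df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q1 1997']})
df['date'] = quarters_to_timestamps(df['Quarter'])
print(df)
```

This is more verbose than the string-swap versions, but the arithmetic makes the mapping from quarter to month explicit.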