I have a dataset of daily data from 1972 to 2013. I need to extract only the rows for the first day of each month that actually appears in the dataset. For example, I would need the row with index 20, date 2013-12-02 and value 0.1555 to be extracted.
The problem is that the first available day differs from month to month, so I cannot simply step through the dates with something like relativedelta(months=1). How would I go about extracting these values from my dataset?
Is there a pandas equivalent of the command I found in this post for R?
R - XTS: Get the first dates and values for each month from a daily time series with missing rows
A sample of the data (index, date, value):
17 2013-12-05 0.1621
18 2013-12-04 0.1698
19 2013-12-03 0.1516
20 2013-12-02 0.1555
21 2013-11-29 0.1480
22 2013-11-27 0.1487
23 2013-11-26 0.1648
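To make the examples reproducible, the sample rows above can be loaded into a DataFrame as sketched below. The column names n, date and val are an assumption chosen to match the ones used in the answer below; the real dataset may use different names.

```python
import pandas as pd

# Sample rows from the question. Column names (n, date, val) are assumed
# to match the answer; adjust them to your own data.
df = pd.DataFrame({
    "n": [17, 18, 19, 20, 21, 22, 23],
    "date": pd.to_datetime([
        "2013-12-05", "2013-12-04", "2013-12-03", "2013-12-02",
        "2013-11-29", "2013-11-27", "2013-11-26",
    ]),
    "val": [0.1621, 0.1698, 0.1516, 0.1555, 0.1480, 0.1487, 0.1648],
})
print(df)
```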
I would group by month and then take the zeroth row (nth(0)) of each group.
First, set the date column as the index (I think this is necessary):
In [11]: df1 = df.set_index('date')
In [12]: df1
Out[12]:
n val
date
2013-12-05 17 0.1621
2013-12-04 18 0.1698
2013-12-03 19 0.1516
2013-12-02 20 0.1555
2013-11-29 21 0.1480
2013-11-27 22 0.1487
2013-11-26 23 0.1648
Next, sort the index so that the first row of each group is the earliest date in that month (note: this doesn't appear to be necessary for nth, but I think that's actually a bug!):
In [13]: df1.sort_index(inplace=True)
In [14]: df1.groupby(pd.Grouper(freq='M')).nth(0)  # pd.TimeGrouper('M') in older pandas; it has since been removed
Out[14]:
n val
date
2013-11-26 23 0.1648
2013-12-02 20 0.1555
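The behaviour of nth on grouped data has changed between pandas versions, which may explain the ordering quirk noted above. A version-stable alternative is head(1), which always returns the original rows with their original index. The sketch below also swaps the time Grouper for a month period key (to_period('M')); that substitution and the rebuilt sample data are assumptions, not part of the original answer.

```python
import pandas as pd

# Rebuild the sample rows (column names n, date, val as in the answer).
df = pd.DataFrame({
    "n": [17, 18, 19, 20, 21, 22, 23],
    "date": pd.to_datetime([
        "2013-12-05", "2013-12-04", "2013-12-03", "2013-12-02",
        "2013-11-29", "2013-11-27", "2013-11-26",
    ]),
    "val": [0.1621, 0.1698, 0.1516, 0.1555, 0.1480, 0.1487, 0.1648],
})

# Index by date and sort ascending so the earliest row of each month comes first.
df1 = df.set_index("date").sort_index()

# Group on the calendar month of each date; head(1) keeps the first row of
# every group, indexed by the date that actually appears in the data.
first_rows = df1.groupby(df1.index.to_period("M")).head(1)
print(first_rows)
```

Because the key is a monthly period, months with no rows simply produce no group, so gaps in a 1972 to 2013 series need no special handling.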
Another option is to resample and take the first entry of each month:
In [15]: df1.resample('M').first()
Out[15]:
n val
date
2013-11-30 23 0.1648
2013-12-31 20 0.1555
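Note that resample labels each row with the month-end date (2013-11-30, 2013-12-31) rather than the date actually present in the data. If the real first date is needed, one possible sketch (not from the original answer; the sample data is rebuilt under the same assumed column names) is to compute the earliest date per month and then look those rows up:

```python
import pandas as pd

# Rebuild the sample rows (column names n, date, val are assumed).
df = pd.DataFrame({
    "n": [17, 18, 19, 20, 21, 22, 23],
    "date": pd.to_datetime([
        "2013-12-05", "2013-12-04", "2013-12-03", "2013-12-02",
        "2013-11-29", "2013-11-27", "2013-11-26",
    ]),
    "val": [0.1621, 0.1698, 0.1516, 0.1555, 0.1480, 0.1487, 0.1648],
})
df1 = df.set_index("date").sort_index()

# For each calendar month, find the earliest date that exists in the data.
dates = df1.index.to_series()
first_dates = dates.groupby(df1.index.to_period("M")).min()

# Select those rows; the result is indexed by the real first date of each
# month instead of by the month-end label that resample produces.
result = df1.loc[first_dates.to_numpy()]
print(result)
```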
On reflection, there is a simpler way: extract the month as a period and group by that:
In [21]: pd.DatetimeIndex(df.date).to_period('M')
Out[21]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-12, ..., 2013-11]
Length: 7, Freq: M
In [22]: df.groupby(pd.DatetimeIndex(df.date).to_period('M')).nth(0)
Out[22]:
n date val
0 17 2013-12-05 0.1621
4 21 2013-11-29 0.1480
This time the order of df.date is (correctly) relevant: if you know it's in descending date order, you can use nth(-1):
In [23]: df.groupby(pd.DatetimeIndex(df.date).to_period('M')).nth(-1)
Out[23]:
n date val
3 20 2013-12-02 0.1555
6 23 2013-11-26 0.1648
If that isn't guaranteed, sort by the date column first: df.sort_values('date') (df.sort('date') in older pandas).
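If sorting the whole frame is undesirable, an order-independent sketch (a swapped-in technique, not from the original answer) is to ask each month group for the index label of its earliest date with idxmin. The rows are shuffled with df.sample purely to show that the result does not depend on row order; the column names are the same assumed ones as above.

```python
import pandas as pd

# Sample rows (column names n, date, val assumed), shuffled on purpose.
df = pd.DataFrame({
    "n": [17, 18, 19, 20, 21, 22, 23],
    "date": pd.to_datetime([
        "2013-12-05", "2013-12-04", "2013-12-03", "2013-12-02",
        "2013-11-29", "2013-11-27", "2013-11-26",
    ]),
    "val": [0.1621, 0.1698, 0.1516, 0.1555, 0.1480, 0.1487, 0.1648],
}).sample(frac=1, random_state=0)

# For each calendar month, idxmin gives the index label of the row
# holding that month's earliest date.
month = df["date"].dt.to_period("M")
first_idx = df.groupby(month)["date"].idxmin()

# Pull those rows and present them in date order.
result = df.loc[first_idx.to_numpy()].sort_values("date")
print(result)
```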