
Filter data to get only first day of the month rows

I have a dataset of daily data and need only the rows for the first available day of each month (the data runs from 1972 to 2013). For example, I would need index 20, date 2013-12-02, value 0.1555 to be extracted. The problem is that the first available day differs from month to month, so I cannot use a fixed step such as relativedelta(months=1). How would I go about extracting these values from my dataset?

Is there a command similar to the one I found in another post for R?

R - XTS: Get the first dates and values for each month from a daily time series with missing rows

17 2013-12-05 0.1621
18 2013-12-04 0.1698
19 2013-12-03 0.1516
20 2013-12-02 0.1555
21 2013-11-29 0.1480
22 2013-11-27 0.1487
23 2013-11-26 0.1648
tadalendas asked Sep 11 '14 21:09



1 Answer

I would group by month and then take the zeroth row of each group with nth.

First set the date column as the index (I think this is necessary):

In [11]: df1 = df.set_index('date')

In [12]: df1
Out[12]:
             n     val
date
2013-12-05  17  0.1621
2013-12-04  18  0.1698
2013-12-03  19  0.1516
2013-12-02  20  0.1555
2013-11-29  21  0.1480
2013-11-27  22  0.1487
2013-11-26  23  0.1648

Next sort, so that the first element is the first date of that month (Note: this doesn't appear to be necessary for nth, but I think that's actually a bug!):

In [13]: df1.sort_index(inplace=True)

In [14]: df1.groupby(pd.TimeGrouper('M')).nth(0)
Out[14]:
             n     val
date
2013-11-26  23  0.1648
2013-12-02  20  0.1555
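
On newer pandas releases, where pd.TimeGrouper is no longer available, the same grouping can be sketched with pd.Grouper. This is a minimal sketch rather than the original session: it rebuilds the sample frame from the question, passes a MonthEnd offset object instead of the 'M' string (an assumption made to sidestep alias differences between versions), and uses head(1) in place of nth(0) because head always returns the original rows with their own dates.

```python
import pandas as pd

# Sample rows from the question, in the original (descending) order.
df = pd.DataFrame({
    "n": [17, 18, 19, 20, 21, 22, 23],
    "date": pd.to_datetime([
        "2013-12-05", "2013-12-04", "2013-12-03", "2013-12-02",
        "2013-11-29", "2013-11-27", "2013-11-26",
    ]),
    "val": [0.1621, 0.1698, 0.1516, 0.1555, 0.1480, 0.1487, 0.1648],
})

# Index by date and sort ascending so the earliest date of each month comes first.
df1 = df.set_index("date").sort_index()

# Group the index into calendar months and keep the first row of each group.
first_rows = df1.groupby(pd.Grouper(freq=pd.offsets.MonthEnd())).head(1)
print(first_rows)
```

The result keeps the real dates (2013-11-26 and 2013-12-02) as the index, matching the output above.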

Another option is to resample and take the first entry:

In [15]: df1.resample('M', 'first')
Out[15]:
             n     val
date
2013-11-30  23  0.1648
2013-12-31  20  0.1555
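
Note that the resampled result is labelled with the month-end date, not with the date of the row that was picked. A minimal sketch of this, assuming a recent pandas where the aggregation is called as a method on the resampler, and using a MonthEnd offset object in place of the 'M' string (an assumption made for portability across versions):

```python
import pandas as pd

# Sample rows from the question, indexed by date and sorted ascending.
df1 = pd.DataFrame(
    {
        "n": [17, 18, 19, 20, 21, 22, 23],
        "val": [0.1621, 0.1698, 0.1516, 0.1555, 0.1480, 0.1487, 0.1648],
    },
    index=pd.to_datetime([
        "2013-12-05", "2013-12-04", "2013-12-03", "2013-12-02",
        "2013-11-29", "2013-11-27", "2013-11-26",
    ]),
).sort_index()
df1.index.name = "date"

# One row per calendar month: the first value in each monthly bin.
monthly = df1.resample(pd.offsets.MonthEnd()).first()
print(monthly)

# The index holds the bin labels (month ends), not the original dates.
labels = [d.strftime("%Y-%m-%d") for d in monthly.index]
print(labels)
```

If the actual date of the selected row matters, the groupby variants are the better fit, since they keep it.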

Thinking about this, you can do it much more simply by extracting the month and then grouping by that:

In [21]: pd.DatetimeIndex(df.date).to_period('M')
Out[21]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-12, ..., 2013-11]
Length: 7, Freq: M

In [22]: df.groupby(pd.DatetimeIndex(df.date).to_period('M')).nth(0)
Out[22]:
    n       date     val
0  17 2013-12-05  0.1621
4  21 2013-11-29  0.1480

This time the sort order of df.date is (correctly) relevant. If you know it's in descending date order, you can use nth(-1):

In [23]: df.groupby(pd.DatetimeIndex(df.date).to_period('M')).nth(-1)
Out[23]:
    n       date     val
3  20 2013-12-02  0.1555
6  23 2013-11-26  0.1648

If this isn't guaranteed, then sort by the date column first: df.sort('date') (df.sort_values('date') in newer pandas versions, where DataFrame.sort no longer exists).
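
Putting the sort and the month grouping together, here is a minimal sketch that does not depend on the input order. It assumes a recent pandas (sort_values and the .dt accessor), uses the question's rows deliberately shuffled, and takes head(1) rather than nth(0) so that the original rows come back unchanged; the variable names are illustrative only.

```python
import pandas as pd

# Sample rows from the question, deliberately out of order.
df = pd.DataFrame({
    "n": [21, 17, 23, 19, 20, 22, 18],
    "date": pd.to_datetime([
        "2013-11-29", "2013-12-05", "2013-11-26", "2013-12-03",
        "2013-12-02", "2013-11-27", "2013-12-04",
    ]),
    "val": [0.1480, 0.1621, 0.1648, 0.1516, 0.1555, 0.1487, 0.1698],
})

# Sort ascending so the earliest available date in each month comes first.
ordered = df.sort_values("date")

# Group by calendar month (year and month together) and keep the first row.
month = ordered["date"].dt.to_period("M")
first_per_month = ordered.groupby(month).head(1)
print(first_per_month)
```

For the sample data this selects the rows with n = 23 (2013-11-26) and n = 20 (2013-12-02), the latter being the row asked for in the question.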

Andy Hayden answered Oct 31 '22 00:10