pandas: how to get the data for the last day of each month?

The data is given as follows:

             return 
2010-01-04  0.016676    
2010-01-05  0.003839
...
2010-01-05  0.003839
2010-01-29  0.001248
2010-02-01  0.000134
...

What I want is to extract every value that falls on the last day of each month that appears in the data.

2010-01-29  0.001248
2010-02-28  ......

If I use resample directly, i.e. df.resample('M').last(), I get the correct rows but with the wrong index (it automatically uses the calendar month end as the index):

2010-01-31  0.001248
2010-02-28  ......

How can I get the correct answer in a Pythonic way?

MTANG asked May 18 '18 18:05


1 Answer

An assumption made here is that your date data is part of the index. If not, I recommend setting it first.
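
As a minimal sketch of that setup step, assuming the dates sit in a column named date (a hypothetical name; the sample frame is also made up for illustration):

```python
import pandas as pd

# Made-up sample frame; the column names "date" and "return" are assumptions.
df = pd.DataFrame({
    "date": ["2010-01-04", "2010-01-05", "2010-01-29", "2010-02-01"],
    "return": [0.016676, 0.003839, 0.001248, 0.000134],
})

# Parse the strings into timestamps, move them into the index,
# and sort so that "last row per month" means "latest date per month".
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date").sort_index()
```

Sorting matters because the approaches below pick the last row of each group by position, not by date.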

Single Year

I don't think the resampling or grouper functions would do. Let's group on the month number instead and call DataFrameGroupBy.tail.

df.groupby(df.index.month).tail(1) 
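
A small runnable sketch of the same idea, using made-up data for a single year (the values are for illustration only):

```python
import pandas as pd

# Made-up returns covering two months of one year.
idx = pd.to_datetime([
    "2010-01-04", "2010-01-05", "2010-01-29",
    "2010-02-01", "2010-02-26",
])
df = pd.DataFrame(
    {"return": [0.016676, 0.003839, 0.001248, 0.000134, 0.002500]},
    index=idx,
).sort_index()

# tail(1) keeps the last row of every month group and leaves the
# original dates in the index instead of relabelling them.
last_rows = df.groupby(df.index.month).tail(1)
print(last_rows)
```

The rows kept here are those dated 2010-01-29 and 2010-02-26, i.e. the last dates actually present in each month.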

Multiple Years

If your data spans multiple years, you'll need to group on both the year and the month. Using a single grouper created from DatetimeIndex.strftime:

df.groupby(df.index.strftime('%Y-%m')).tail(1)

Or, using multiple groupers:

df.groupby([df.index.year, df.index.month]).tail(1)
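
To illustrate why the year has to be part of the key, here is a sketch on made-up data spanning two years; it compares grouping on the month alone with grouping on year and month:

```python
import pandas as pd

# Made-up data: January appears in both 2010 and 2011.
idx = pd.to_datetime([
    "2010-01-04", "2010-01-29", "2010-02-26",
    "2011-01-03", "2011-01-31",
])
df = pd.DataFrame({"return": [0.01, 0.02, 0.03, 0.04, 0.05]}, index=idx).sort_index()

# Month only: both Januaries fall into one group, so 2010-01-29 is lost.
by_month_only = df.groupby(df.index.month).tail(1)

# Year and month: each calendar month is its own group.
by_year_month = df.groupby([df.index.year, df.index.month]).tail(1)

print(by_month_only.index.tolist())
print(by_year_month.index.tolist())
```

With the month-only key, only two rows survive (2010-02-26 and 2011-01-31); with the year and month key, all three month ends in the data are kept.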

Note: if your index is not a DatetimeIndex as assumed here, you'll need to replace df.index with pd.to_datetime(df.index, errors='coerce') above.
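
A sketch of that substitution, assuming a made-up frame whose index holds date strings (the variable name dates is only for illustration):

```python
import pandas as pd

# Made-up frame whose index contains strings, not timestamps.
df = pd.DataFrame(
    {"return": [0.016676, 0.001248, 0.000134]},
    index=["2010-01-04", "2010-01-29", "2010-02-01"],
)

# Convert once and reuse; entries that cannot be parsed become NaT.
dates = pd.to_datetime(df.index, errors="coerce")

# Group on the converted dates; the frame keeps its original string index.
last_rows = df.groupby([dates.year, dates.month]).tail(1)
print(last_rows)
```

The grouping keys are used positionally here, so the result still carries the original string labels.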

cs95 answered Sep 18 '22 04:09