Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - Retrieving last 30 days data from dataframe pandas

I've a dataframe containing six month error logs, collected every day. I want to retrieve the last 30 days records from the last date. Last date isn't today.
For example: I've data for the months May, June, July and until August 15, I want to retrieve that data from August 15 to July 15 making it 30 days records.
Is there a way to do this in Python Pandas?

This is the sample dataframe:

Error_Description         Date        Weekend      Type
N17739 Limit switch X-    5/1/2015    5/3/2015    Critical
N17739 Limit switch Y-    5/1/2015    5/3/2015    Critical
N938 Key non-functional   5/1/2015    5/3/2015    Non-Critical
P124 Magazine is running  5/1/2015    5/3/2015    Non-Critical
N17738 Limit switch Z+    5/1/2015    5/3/2015    Critical
N938 Key non-functional   5/1/2015    5/3/2015    Non-Critical
     ...                    ...         ...          ...
P873 ENCLOSURE DOOR       8/24/2015   8/30/2015   Non-Critical
N3065 Reset M114          8/24/2015   8/30/2015   Non-Critical
N3065 Reset M114,         8/24/2015   8/30/2015   Non-Critical
N2853 Synchronization     8/24/2015   8/30/2015   Critical
P152 ENCLOSURE            8/24/2015   8/30/2015   Non-Critical
N6236 has stopped         8/24/2015   8/30/2015   Critical
like image 380
Naive Babes Avatar asked Nov 23 '15 13:11

Naive Babes


People also ask

How do you get the last value in a DataFrame in Python?

iloc – Pandas Dataframe. iloc is used to retrieve data by specifying its index. In python negative index starts from the end so we can access the last element of the dataframe by specifying its index to -1.

How do I find the last 10 entries in pandas?

Method 1: Using tail() method Use pandas. DataFrame. tail(n) to get the last n rows of the DataFrame. It takes one optional argument n (number of rows you want to get from the end).

How do I extract month from pandas?

Use pandas. Extract the month and year from the Datetime column by using DatetimeIndex. month attribute to find the month and use DatetimeIndex.

How do you get the last value in a pandas series?

Pandas iloc is used to retrieve data by specifying its integer index. In python negative index starts from end therefore we can access the last element by specifying index to -1 instead of length-1 which will yield the same result.


2 Answers

Date lastdayfrom is used for selecting last 30 days of DataFrame by function loc.

lastdayfrom = pd.to_datetime('8/24/2015')
print lastdayfrom
#2015-08-24 00:00:00

print df
#           Error_Description       Date    Weekend          Type
#0     N17739 Limit switch X- 2015-05-01 2015-05-03      Critical
#1     N17739 Limit switch Y- 2015-05-01 2015-05-03      Critical
#2    N938 Key non-functional 2015-05-01 2015-05-03  Non-Critical
#3   P124 Magazine is running 2015-05-01 2015-05-03  Non-Critical
#4     N17738 Limit switch Z+ 2015-02-01 2015-05-03      Critical
#5    N938 Key non-functional 2015-07-25 2015-05-03  Non-Critical
#6        P873 ENCLOSURE DOOR 2015-07-24 2015-08-30  Non-Critical
#7           N3065 Reset M114 2015-07-21 2015-08-21  Non-Critical
#8          N3065 Reset M114, 2015-08-22 2015-08-22  Non-Critical
#9      N2853 Synchronization 2015-08-23 2015-08-30      Critical
#10            P152 ENCLOSURE 2015-08-24 2015-08-30  Non-Critical
#11         N6236 has stopped 2015-08-24 2015-08-30      Critical

print df.dtypes
#Error_Description            object
#Date                 datetime64[ns]
#Weekend              datetime64[ns]
#Type                         object
#dtype: object

#set index from column Date
df = df.set_index('Date')
#if datetimeindex isn't order, order it
df= df.sort_index()

#last 30 days of date lastday
df = df.loc[lastdayfrom - pd.Timedelta(days=30):lastdayfrom].reset_index()
print df
#        Date      Error_Description    Weekend          Type
#0 2015-07-25       N3065 Reset M114 2015-08-21  Non-Critical
#1 2015-08-22      N3065 Reset M114, 2015-08-22  Non-Critical
#2 2015-08-23  N2853 Synchronization 2015-08-30      Critical
#3 2015-08-24         P152 ENCLOSURE 2015-08-30  Non-Critical
#4 2015-08-24      N6236 has stopped 2015-08-30      Critical
like image 123
jezrael Avatar answered Sep 21 '22 20:09

jezrael


You can use DataFrame.last_valid_index() to find the label of the last line, and then subtract DateOffset(30, 'D') to go back 30 days:

df[df.last_valid_index()-pandas.DateOffset(30, 'D'):]
like image 42
faltarell Avatar answered Sep 17 '22 20:09

faltarell