I've a dataframe containing six month error logs, collected every day. I want to retrieve the last 30 days records from the last date. Last date isn't today.
For example: I've data for the months May, June, July and until August 15
, I want to retrieve that data from August 15
to July 15
making it 30 days records.
Is there a way to do this in Python Pandas?
This is the sample dataframe:
Error_Description Date Weekend Type
N17739 Limit switch X- 5/1/2015 5/3/2015 Critical
N17739 Limit switch Y- 5/1/2015 5/3/2015 Critical
N938 Key non-functional 5/1/2015 5/3/2015 Non-Critical
P124 Magazine is running 5/1/2015 5/3/2015 Non-Critical
N17738 Limit switch Z+ 5/1/2015 5/3/2015 Critical
N938 Key non-functional 5/1/2015 5/3/2015 Non-Critical
... ... ... ...
P873 ENCLOSURE DOOR 8/24/2015 8/30/2015 Non-Critical
N3065 Reset M114 8/24/2015 8/30/2015 Non-Critical
N3065 Reset M114, 8/24/2015 8/30/2015 Non-Critical
N2853 Synchronization 8/24/2015 8/30/2015 Critical
P152 ENCLOSURE 8/24/2015 8/30/2015 Non-Critical
N6236 has stopped 8/24/2015 8/30/2015 Critical
iloc – Pandas Dataframe. iloc is used to retrieve data by specifying its index. In python negative index starts from the end so we can access the last element of the dataframe by specifying its index to -1.
Method 1: Using tail() method Use pandas. DataFrame. tail(n) to get the last n rows of the DataFrame. It takes one optional argument n (number of rows you want to get from the end).
Use pandas. Extract the month and year from the Datetime column by using DatetimeIndex. month attribute to find the month and use DatetimeIndex.
Pandas iloc is used to retrieve data by specifying its integer index. In python negative index starts from end therefore we can access the last element by specifying index to -1 instead of length-1 which will yield the same result.
Date lastdayfrom
is used for selecting last 30 days of DataFrame
by function loc.
lastdayfrom = pd.to_datetime('8/24/2015')
print lastdayfrom
#2015-08-24 00:00:00
print df
# Error_Description Date Weekend Type
#0 N17739 Limit switch X- 2015-05-01 2015-05-03 Critical
#1 N17739 Limit switch Y- 2015-05-01 2015-05-03 Critical
#2 N938 Key non-functional 2015-05-01 2015-05-03 Non-Critical
#3 P124 Magazine is running 2015-05-01 2015-05-03 Non-Critical
#4 N17738 Limit switch Z+ 2015-02-01 2015-05-03 Critical
#5 N938 Key non-functional 2015-07-25 2015-05-03 Non-Critical
#6 P873 ENCLOSURE DOOR 2015-07-24 2015-08-30 Non-Critical
#7 N3065 Reset M114 2015-07-21 2015-08-21 Non-Critical
#8 N3065 Reset M114, 2015-08-22 2015-08-22 Non-Critical
#9 N2853 Synchronization 2015-08-23 2015-08-30 Critical
#10 P152 ENCLOSURE 2015-08-24 2015-08-30 Non-Critical
#11 N6236 has stopped 2015-08-24 2015-08-30 Critical
print df.dtypes
#Error_Description object
#Date datetime64[ns]
#Weekend datetime64[ns]
#Type object
#dtype: object
#set index from column Date
df = df.set_index('Date')
#if datetimeindex isn't order, order it
df= df.sort_index()
#last 30 days of date lastday
df = df.loc[lastdayfrom - pd.Timedelta(days=30):lastdayfrom].reset_index()
print df
# Date Error_Description Weekend Type
#0 2015-07-25 N3065 Reset M114 2015-08-21 Non-Critical
#1 2015-08-22 N3065 Reset M114, 2015-08-22 Non-Critical
#2 2015-08-23 N2853 Synchronization 2015-08-30 Critical
#3 2015-08-24 P152 ENCLOSURE 2015-08-30 Non-Critical
#4 2015-08-24 N6236 has stopped 2015-08-30 Critical
You can use DataFrame.last_valid_index()
to find the label of the last line, and then subtract DateOffset(30, 'D')
to go back 30 days:
df[df.last_valid_index()-pandas.DateOffset(30, 'D'):]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With