
Dataframe filter rows by hour and max date

Within a DataFrame, I want to filter the rows of some "Series" (selected by key) so that, for each key and hour, only the row equal to the max date is kept.

I have a working piece of code, but I wonder whether there is a more compact or elegant way to achieve the same result.

import pandas as pd

df = pd.read_csv("./example.csv")
df["Date"] = pd.to_datetime(df["Date"])
Keys2Filter = (1, 2)
df

...
    Key                Date  Value
0     1 2019-04-17 00:00:00      1
1     1 2019-04-17 01:00:00      2
2     1 2019-04-17 02:00:00      3
3     1 2019-04-17 00:00:00      4
4     1 2019-04-17 01:00:00      5
5     1 2019-04-17 00:00:00      7
6     2 2019-04-17 01:00:00      8
7     2 2019-04-17 02:00:00      9
8     2 2019-04-17 00:00:00      9
9     2 2019-04-17 01:00:00      9
10    3 2019-04-17 01:00:00      9
11    3 2019-04-17 01:00:00      9
...

if len(Keys2Filter):
    dfMax = df[df.Key.isin(Keys2Filter)]
    dfMax = dfMax.groupby(by=["Key", dfMax.Date.dt.hour]).max()
    dfMax.index.names = ["Key", "Hour"]
    dfMax = dfMax.reset_index()
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df = pd.concat([df[~df.Key.isin(Keys2Filter)], dfMax.drop(columns=['Hour'])])

df   

    Key                Date  Value
10    3 2019-04-17 01:00:00      9
11    3 2019-04-17 01:00:00      9
0     1 2019-04-17 00:00:00      7
1     1 2019-04-17 01:00:00      5
2     1 2019-04-17 02:00:00      3
3     2 2019-04-17 00:00:00      9
4     2 2019-04-17 01:00:00      9
5     2 2019-04-17 02:00:00      9
pofpof asked Apr 19 '19 12:04



2 Answers

Your solution is good; it only needs a small simplification. Cache the boolean mask in a variable m and invert it with ~. DataFrame.reset_index is also called twice: first to drop the second index level (the hour), then to turn the Key index back into a column:

m = df.Key.isin(Keys2Filter)
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df = pd.concat([df[~m],
                df[m].groupby(by=["Key", df.Date.dt.hour])
                     .max()
                     .reset_index(level=1, drop=True)
                     .reset_index()])

print (df)
    Key                Date  Value
10    3 2019-04-17 01:00:00      9
11    3 2019-04-17 01:00:00      9
0     1 2019-04-17 00:00:00      7
1     1 2019-04-17 01:00:00      5
2     1 2019-04-17 02:00:00      3
3     2 2019-04-17 00:00:00      9
4     2 2019-04-17 01:00:00      9
5     2 2019-04-17 02:00:00      9
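
For readers without example.csv, the same groupby-and-max approach can be sketched as a self-contained script. This is only an illustration under a few assumptions: the inline data is retyped from the question's table, pd.concat is used in place of DataFrame.append (removed in pandas 2.0), and the names sub, reduced and result, as well as the "Hour" label, are introduced here and are not part of the original answer.

```python
import pandas as pd

# Sample data retyped from the question (stands in for example.csv).
df = pd.DataFrame({
    "Key": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "Date": pd.to_datetime([
        "2019-04-17 00:00:00", "2019-04-17 01:00:00", "2019-04-17 02:00:00",
        "2019-04-17 00:00:00", "2019-04-17 01:00:00", "2019-04-17 00:00:00",
        "2019-04-17 01:00:00", "2019-04-17 02:00:00", "2019-04-17 00:00:00",
        "2019-04-17 01:00:00", "2019-04-17 01:00:00", "2019-04-17 01:00:00",
    ]),
    "Value": [1, 2, 3, 4, 5, 7, 8, 9, 9, 9, 9, 9],
})
Keys2Filter = (1, 2)

m = df.Key.isin(Keys2Filter)   # True for rows whose Key must be reduced
sub = df[m]

# Group by Key and hour of day; the hour Series is renamed ("Hour" is an
# illustrative label) so the index level does not clash with the Date column.
reduced = (sub.groupby(["Key", sub.Date.dt.hour.rename("Hour")])
              .max()                                  # column-wise max per group
              .reset_index(level="Hour", drop=True)   # drop the hour level
              .reset_index())                         # Key back to a column

# Rows of untouched keys first, then one row per (Key, hour) group.
result = pd.concat([df[~m], reduced])
print(result)
```

Note that max() is applied to each column independently, so a returned row is not guaranteed to be one of the original rows. In this sample every Date within a (Key, hour) group is identical, so the result matches the output shown in the question.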
jezrael answered Sep 27 '22 21:09


IIUC, you can use transform and isin. transform('idxmax') returns, for every row, the index label of the row holding the max Value in its group. isin then handles the keys: if a row's Key is not in Keys2Filter, ~isin is True and the row is selected. If the Key is in Keys2Filter, ~isin is False and the second condition is checked: the row is still selected if it is the one holding the max value of its group.

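s = df.groupby([df.Key, df.Date.dt.strftime('%Y-%m-%d %H')])['Value'].transform('idxmax')
Keys2Filter = (1, 2)
# keep rows whose Key is not filtered, or whose index label is the group's idxmax;
# .copy() avoids a SettingWithCopyWarning on later assignments
df = df.loc[~df.Key.isin(Keys2Filter) | s.eq(df.index)].copy()
df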
Out[991]: 
    Key                Date  Value
2     1 2019-04-17 02:00:00      3
4     1 2019-04-17 01:00:00      5
5     1 2019-04-17 00:00:00      7
7     2 2019-04-17 02:00:00      9
8     2 2019-04-17 00:00:00      9
9     2 2019-04-17 01:00:00      9
10    3 2019-04-17 01:00:00      9
11    3 2019-04-17 01:00:00      9

Note: this method keeps the original row order of your df.
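
To see why the mask works, it can help to print the intermediate Series. The following is a minimal sketch, assuming the question's data retyped inline instead of read from example.csv; the names keep and out are introduced here for illustration only.

```python
import pandas as pd

# Sample data retyped from the question (stands in for example.csv).
df = pd.DataFrame({
    "Key": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "Date": pd.to_datetime([
        "2019-04-17 00:00:00", "2019-04-17 01:00:00", "2019-04-17 02:00:00",
        "2019-04-17 00:00:00", "2019-04-17 01:00:00", "2019-04-17 00:00:00",
        "2019-04-17 01:00:00", "2019-04-17 02:00:00", "2019-04-17 00:00:00",
        "2019-04-17 01:00:00", "2019-04-17 01:00:00", "2019-04-17 01:00:00",
    ]),
    "Value": [1, 2, 3, 4, 5, 7, 8, 9, 9, 9, 9, 9],
})
Keys2Filter = (1, 2)

# For every row: the index label of the row holding the max Value
# within its (Key, date-and-hour) group. Ties resolve to the first row.
s = df.groupby([df.Key, df.Date.dt.strftime('%Y-%m-%d %H')])['Value'].transform('idxmax')
print(s.tolist())

# A row is kept if its Key is not filtered, or if it is its group's max row
# (its own index label equals the group's idxmax).
keep = ~df.Key.isin(Keys2Filter) | (s == df.index)
out = df.loc[keep].copy()
print(out.index.tolist())
```

Because the result is produced by boolean selection rather than aggregation, every returned row is an original row with its original index label, which is why the order is preserved.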

BENY answered Sep 27 '22 23:09