
How to subset the rows that meet a condition, plus the N rows before each one, faster than my code?

My data set is a time series: I have 30 data frames, each with more than 10,000 rows. I want to examine the trend before the temperature value goes below 40.

So I want to subset the rows where the temperature is below 40, together with the 24 rows before each drop below 40.

I have tried several approaches, and the only one that works is below. But it is slow (more than 10 minutes for one data frame). Is there a faster way to do this subsetting in Python? Can you help?

import numpy as np
import pandas as pd

df = temperature_df.copy()
drop_temperature_df = pd.DataFrame()

# get the index labels of the rows where the temperature is below 40
drop_temperature_index = np.array(df[df['temperature'] < 40].index)

# for each drop, add that row and the 24 rows before it
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
for index in drop_temperature_index:
    drop_temperature_df = pd.concat([drop_temperature_df, df.loc[index - 24:index, :]])

K['K_{}'.format(string)] = drop_temperature_df.copy()  # save the subset data frame

For example, my data has a temperature point below 40 at 1/26/2018 0800, so I want to subset that point together with the 24 rows before it (1/25/2018 0800 until 1/26/2018 0800).

asked by nrmzmh on May 15 '19 at 00:05


1 Answer

I think you can use bfill with a limit: keep only the values below 40, back-fill up to 24 rows before each of them, find the rows that are now notnull, and slice the dataframe with that mask.

yourdf=df[df.temperature.where(df.temperature<40).bfill(limit=24).notnull()].copy()
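
To see what the one-liner above does, here is a minimal sketch on made-up data. The helper name rows_before_drop, its parameters, and the toy values are assumptions for illustration only; they are not from the question.

```python
import pandas as pd


def rows_before_drop(df, column="temperature", threshold=40, n_before=24):
    """Keep rows where df[column] < threshold, plus up to n_before rows before each."""
    # NaN everywhere except the rows that are below the threshold
    below = df[column].where(df[column] < threshold)
    # back-fill marks up to n_before rows preceding each below-threshold row
    mask = below.bfill(limit=n_before).notnull()
    return df[mask].copy()


# Hypothetical toy data: ten readings with a single drop below 40 at position 7
toy = pd.DataFrame({"temperature": [50, 51, 52, 53, 54, 55, 56, 39, 57, 58]})

# Use a window of 3 rows instead of 24 so the result is easy to read
subset = rows_before_drop(toy, n_before=3)
print(subset)
```

With a window of 3, the result keeps positions 4 to 7: the drop itself and the three rows before it. Unlike the loop in the question, a boolean mask keeps each row at most once when windows overlap, and it selects by position, so it does not depend on the index being made of consecutive integers.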
answered by BENY on Oct 19 '22 at 01:10