Pandas Check last N Rows for values, new column based on results

I have a DataFrame, Df2. I'm trying to check each of the last 10 rows for the column Lead_Lag below - if there's any value besides null in any of those rows, then I want a new column Position to equal 'Y':

def run_HG_AUDUSD_15M_Aggregate():
    Df1 = pd.read_csv(max(glob.iglob(r"C:\Users\cost9\OneDrive\Documents\PYTHON\Daily Tasks\Pairs Trading\HG_AUDUSD\CSV\15M\Lead_Lag\*.csv"), key=os.path.getctime))    
    Df2 = Df1[['Date', 'Close_HG', 'Close_AUDUSD', 'Lead_Lag']]

    Df2['Position'] = ''

    for index,row in Df2.iterrows():
        if Df2.loc[Df2.index.shift(-10):index,"Lead_Lag"].isnull(): 
            continue
        else:
            Df2.loc[index, 'Position'] = "Y"

A sample of the data is as follows:

Date	Close_HG	Close_AUDUSD	Lead_Lag
7/19/2017 12:59	2.7	0.7956	
7/19/2017 13:59	2.7	0.7955	
7/19/2017 14:14	2.7	0.7954	
7/20/2017 3:14	2.7	0.791	
7/20/2017 5:44	2.7	0.791	
7/20/2017 7:44	2.71	0.7925	
7/20/2017 7:59	2.7	0.7924	
7/20/2017 8:44	2.7	0.7953	Short_Both
7/20/2017 10:44	2.71	0.7964	Short_Both
7/20/2017 11:14	2.71	0.7963	Short_Both
7/20/2017 11:29	2.71	0.7967	Short_Both
7/20/2017 13:14	2.71	0.796	Short_Both
7/20/2017 13:29	2.71	0.7956	Short_Both
7/20/2017 14:29	2.71	0.7957	Short_Both

So in this case I would want the last two values for the new column Position to be 'Y' as there are values in the Lead_Lag column in at least one of the last 10 rows. I want to apply this on a rolling basis - for instance row 13 'Position' value would look at rows 12-3, row 12 'Position' value would look at rows 11-2, etc.

Instead I get the error:

NotImplementedError: Not supported for type RangeIndex

I've tried several variations of the shift method (defining before the loop etc.) and can't get it to work.
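
For context, a minimal sketch of the two problems in the loop above, assuming a default RangeIndex like the one read_csv produces. The three-row frame is made up for illustration. Index.shift is only implemented for datetime-like indexes, and Series.isnull() returns one boolean per row, so it has to be reduced with .all() or .any() before it can be used in an if statement.

```python
import pandas as pd

# Hypothetical three-row frame with a default RangeIndex.
df = pd.DataFrame({"Lead_Lag": [None, None, "Short_Both"]})

# Index.shift only works on datetime-like indexes, so a RangeIndex
# raises NotImplementedError.
try:
    df.index.shift(-1)
    shift_error = None
except NotImplementedError as exc:
    shift_error = exc

# A label slice on the integer index picks rows 1 and 2 (both ends included).
window = df.loc[1:2, "Lead_Lag"]

# isnull() yields a boolean per row; reduce it before testing it in an `if`.
window_is_all_null = bool(window.isnull().all())
```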

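edit: Here's the solution:

N = 10
Df2['Position'] = ''
for index,row in Df2.iterrows():
    # .loc slices include both ends, so this window is the current row plus the N rows before it
    # notnull() flags rows that hold a value; comparing against "N" would also count NaN as a value
    if Df2.loc[index-N:index,"Lead_Lag"].notnull().any():
        Df2.loc[index, 'Position'] = "Y"
    else:
        Df2.loc[index, 'Position'] = "N"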
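
The same check can also be written without a loop. The following is only a sketch, not the method from the post: it assumes the window is the previous N rows excluding the current one (as in the "row 13 looks at rows 12-3" description), and it uses a small made-up frame with N = 3 so the result is easy to follow.

```python
import numpy as np
import pandas as pd

N = 3  # window length; the question uses 10

# Hypothetical data: a single value in row 1, nulls everywhere else.
df = pd.DataFrame({"Lead_Lag": [np.nan, "Short_Both", np.nan, np.nan, np.nan, np.nan]})

# 1 where Lead_Lag holds a value, 0 where it is null.
has_value = df["Lead_Lag"].notnull().astype(int)

# shift(1) drops the current row from its own window; the rolling max is 1
# when at least one of the previous N rows holds a value.
seen_in_window = (
    has_value.shift(1)
    .rolling(N, min_periods=1)
    .max()
    .fillna(0)
    .astype(bool)
)

df["Position"] = np.where(seen_in_window, "Y", "N")
```

Dropping the shift(1) and using a window of N + 1 would instead match the loop above, which includes the current row.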

How do I get the last 5 rows of a data frame?

Use DataFrame.tail(n) to get the last n rows of a DataFrame. It takes one optional argument, n, the number of rows to return from the end. The default is n = 5, so calling it without an argument returns the last 5 rows.

How do you get the last 5 rows in pandas?

Call pandas.DataFrame.tail(n) with n as an integer to select the last n rows of a DataFrame.
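
A short illustration of both calls, using a made-up eight-row frame:

```python
import pandas as pd

# Hypothetical frame with values 0..7.
df = pd.DataFrame({"value": range(8)})

last_five = df.tail()    # n defaults to 5
last_three = df.tail(3)  # explicit n
```

tail() keeps the original index labels, so last_three is still indexed 5, 6, 7.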

What is the difference between loc and iloc? Explain with an example.

The main distinction between loc and iloc is: loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
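
As a small example, assuming a made-up frame whose index labels (10, 20, 30, 40) differ from the row positions (0 to 3) so the two accessors are easy to tell apart:

```python
import pandas as pd

# Hypothetical frame: labels 10..40, positions 0..3.
df = pd.DataFrame({"price": [2.70, 2.71, 2.72, 2.73]}, index=[10, 20, 30, 40])

by_label = df.loc[20, "price"]  # row labelled 20
by_position = df.iloc[1, 0]     # second row, first column: the same cell

label_slice = df.loc[10:30, "price"]  # label slices include the end label
position_slice = df.iloc[0:2, 0]      # position slices exclude the end position
```

The inclusive end of a loc slice matters for window checks: on a default integer index, df.loc[i-N:i] covers N + 1 rows.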

EDIT:

After the solution was posted in the question, I found that the OP needs something else, testing a window of N rows, so I added another answer.

Old solution:

Use numpy.where with a boolean mask built by chaining two conditions:

m = df["Lead_Lag"].notnull() & df.index.isin(df.index[-10:])

Or select the last rows of the column by position with iloc, then add False for the remaining rows with reindex:

m = df["Lead_Lag"].iloc[-10:].notnull().reindex(df.index, fill_value=False)

df['new'] = np.where(m, 'Y', '')

print (df)
               Date  Close_HG  Close_AUDUSD    Lead_Lag new
0   7/19/2017 12:59      2.70        0.7956         NaN    
1   7/19/2017 13:59      2.70        0.7955         NaN    
2   7/19/2017 14:14      2.70        0.7954         NaN    
3    7/20/2017 3:14      2.70        0.7910         NaN    
4    7/20/2017 5:44      2.70        0.7910         NaN    
5    7/20/2017 7:44      2.71        0.7925         NaN    
6    7/20/2017 7:59      2.70        0.7924         NaN    
7    7/20/2017 8:44      2.70        0.7953  Short_Both   Y
8   7/20/2017 10:44      2.71        0.7964  Short_Both   Y
9   7/20/2017 11:14      2.71        0.7963  Short_Both   Y
10  7/20/2017 11:29      2.71        0.7967  Short_Both   Y
11  7/20/2017 13:14      2.71        0.7960  Short_Both   Y
12  7/20/2017 13:29      2.71        0.7956  Short_Both   Y
13  7/20/2017 14:29      2.71        0.7957  Short_Both   Y
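
To try this on something self-contained, here is a sketch with a made-up 15-row frame. It has a value in row 0, which lies outside the last 10 rows, to show that both masks ignore it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a value in row 0, nulls in rows 1-7, values in rows 8-14.
df = pd.DataFrame({"Lead_Lag": ["Short_Both"] + [np.nan] * 7 + ["Short_Both"] * 7})

# Mask 1: non-null AND within the last 10 index labels.
m1 = df["Lead_Lag"].notnull() & df.index.isin(df.index[-10:])

# Mask 2: test only the last 10 rows, then pad the rest with False.
m2 = df["Lead_Lag"].iloc[-10:].notnull().reindex(df.index, fill_value=False)

df["new"] = np.where(m1, "Y", "")
```

Both masks flag the same rows here; the iloc version only evaluates notnull() on the tail of the column.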

Cole Starbuck

jezrael
