I have a DataFrame, Df2. I'm trying to check each of the last 10 rows for the column Lead_Lag below - if there's any value besides null in any of those rows, then I want a new column Position to equal 'Y':
def run_HG_AUDUSD_15M_Aggregate():
    Df1 = pd.read_csv(max(glob.iglob(r"C:\Users\cost9\OneDrive\Documents\PYTHON\Daily Tasks\Pairs Trading\HG_AUDUSD\CSV\15M\Lead_Lag\*.csv"), key=os.path.getctime))
    Df2 = Df1[['Date', 'Close_HG', 'Close_AUDUSD', 'Lead_Lag']]
    Df2['Position'] = ''
    for index,row in Df2.iterrows():
        if Df2.loc[Df2.index.shift(-10):index,"Lead_Lag"].isnull():
            continue
        else:
            Df2.loc[index, 'Position'] = "Y"
A sample of the data is as follows:
Date Close_HG Close_AUDUSD Lead_Lag
7/19/2017 12:59 2.7 0.7956
7/19/2017 13:59 2.7 0.7955
7/19/2017 14:14 2.7 0.7954
7/20/2017 3:14 2.7 0.791
7/20/2017 5:44 2.7 0.791
7/20/2017 7:44 2.71 0.7925
7/20/2017 7:59 2.7 0.7924
7/20/2017 8:44 2.7 0.7953 Short_Both
7/20/2017 10:44 2.71 0.7964 Short_Both
7/20/2017 11:14 2.71 0.7963 Short_Both
7/20/2017 11:29 2.71 0.7967 Short_Both
7/20/2017 13:14 2.71 0.796 Short_Both
7/20/2017 13:29 2.71 0.7956 Short_Both
7/20/2017 14:29 2.71 0.7957 Short_Both
So in this case I would want the last two values for the new column Position to be 'Y' as there are values in the Lead_Lag column in at least one of the last 10 rows. I want to apply this on a rolling basis - for instance row 13 'Position' value would look at rows 12-3, row 12 'Position' value would look at rows 11-2, etc.
Instead I get the error:
NotImplementedError: Not supported for type RangeIndex
I've tried several variations of the shift method (defining before the loop etc.) and can't get it to work.
edit: Here's the solution:
N = 10
Df2['Position'] = ''
for index,row in Df2.iterrows():
    if (Df2.loc[index-N:index,"Lead_Lag"] != "N").any():
        Df2.loc[index, 'Position'] = "Y"
    else:
        Df2.loc[index, 'Position'] = "N"
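As a possible loop-free alternative, the rolling check described above can be sketched with shift and rolling. This is only a sketch under a few assumptions: the helper name flag_recent_signal is hypothetical, empty Lead_Lag cells are real missing values (so the test is notnull() rather than a comparison with "N"), and the window covers the previous N rows without the current one, following the "row 13 looks at rows 12-3" description. Note that the label slice index-N:index in the loop above is end-inclusive, so it covers N+1 rows including the current one.

```python
import numpy as np
import pandas as pd


def flag_recent_signal(df, column="Lead_Lag", window=10):
    """Return 'Y' where any of the previous `window` rows has a value, else 'N'.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # 1 where the column holds a value, 0 where it is missing.
    has_value = df[column].notnull().astype(int)
    # shift(1) drops the current row from its own window, so each row
    # only looks back at the rows before it.
    recent = has_value.shift(1).rolling(window, min_periods=1).max()
    # The first row has nothing to look back at, so treat it as no signal.
    flags = np.where(recent.fillna(0) > 0, "Y", "N")
    return pd.Series(flags, index=df.index, name="Position")


# Shape of the sample data: 7 empty rows followed by 7 rows with a value.
sample = pd.DataFrame({"Lead_Lag": [None] * 7 + ["Short_Both"] * 7})
sample["Position"] = flag_recent_signal(sample, window=10)
print(sample)
```

If the current row should count as part of the window, dropping the shift(1) call would be the natural change.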
Method 1: Using the tail() method. DataFrame.tail(n) returns the last n rows of a DataFrame. It takes one optional argument n (the number of rows to return from the end). By default n = 5, so it returns the last 5 rows if n is not passed.
Call pandas.DataFrame.tail(n) with n as an integer to select the last n rows of a DataFrame.
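A minimal illustration of tail(), using a made-up single-column frame (the column name value is an assumption for the example only):

```python
import pandas as pd

# Made-up data: eight rows numbered 0..7.
df = pd.DataFrame({"value": range(8)})

last_five = df.tail()    # default n=5 -> rows 3..7
last_three = df.tail(3)  # explicit n  -> rows 5..7

print(last_three)
```

The returned rows keep their original index labels, which matters if the result is later aligned back to the full frame.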
The main distinction between loc and iloc is: loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
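The difference is easiest to see on a frame whose labels differ from its positions. The data below is made up for illustration; the point is that loc slices are end-inclusive on labels while iloc slices are end-exclusive on positions:

```python
import pandas as pd

# Made-up frame with non-default integer labels.
df = pd.DataFrame({"Lead_Lag": ["a", "b", "c", "d"]}, index=[10, 20, 30, 40])

by_label = df.loc[20:30, "Lead_Lag"]  # labels 20 and 30 (end included)
by_position = df.iloc[1:3, 0]         # positions 1 and 2 (end excluded)

# On a default RangeIndex the same slice bounds give different lengths.
default = pd.DataFrame({"x": [1, 2, 3, 4]})
n_loc = len(default.loc[0:2])    # 3 rows: labels 0, 1, 2
n_iloc = len(default.iloc[0:2])  # 2 rows: positions 0, 1
```

This is why a label slice such as index-N:index on a default index spans N+1 rows rather than N.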
EDIT:
After the solution was posted in the question, I realized the OP needs something else - testing a window of N rows - so I added another answer.
Old solution:
Use numpy.where with a boolean mask built by chaining two conditions:
m = df["Lead_Lag"].notnull() & df.index.isin(df.index[-10:])
Or select the last 10 rows of the column by position with iloc and fill the remaining rows with False using reindex:
m = df["Lead_Lag"].iloc[-10:].notnull().reindex(df.index, fill_value=False)
df['new'] = np.where(m, 'Y', '')
print (df)
Date Close_HG Close_AUDUSD Lead_Lag new
0 7/19/2017 12:59 2.70 0.7956 NaN
1 7/19/2017 13:59 2.70 0.7955 NaN
2 7/19/2017 14:14 2.70 0.7954 NaN
3 7/20/2017 3:14 2.70 0.7910 NaN
4 7/20/2017 5:44 2.70 0.7910 NaN
5 7/20/2017 7:44 2.71 0.7925 NaN
6 7/20/2017 7:59 2.70 0.7924 NaN
7 7/20/2017 8:44 2.70 0.7953 Short_Both Y
8 7/20/2017 10:44 2.71 0.7964 Short_Both Y
9 7/20/2017 11:14 2.71 0.7963 Short_Both Y
10 7/20/2017 11:29 2.71 0.7967 Short_Both Y
11 7/20/2017 13:14 2.71 0.7960 Short_Both Y
12 7/20/2017 13:29 2.71 0.7956 Short_Both Y
13 7/20/2017 14:29 2.71 0.7957 Short_Both Y
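For anyone who wants to run the two masks without the original CSV, here is a self-contained sketch. The frame is a stand-in with the same shape as the sample (7 empty rows, then 7 filled rows) and only the Lead_Lag column; both masks should select the same rows:

```python
import numpy as np
import pandas as pd

# Stand-in for the sample data: only the column the masks depend on.
df = pd.DataFrame({"Lead_Lag": [np.nan] * 7 + ["Short_Both"] * 7})

# Mask 1: value present AND row is among the last 10 rows.
m1 = df["Lead_Lag"].notnull() & df.index.isin(df.index[-10:])

# Mask 2: test only the last 10 rows, then pad the rest with False.
m2 = df["Lead_Lag"].iloc[-10:].notnull().reindex(df.index, fill_value=False)

df["new"] = np.where(m1, "Y", "")
print(df)
```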