Need to apply different formulas based on the row number in the dataframe

I am working on finding some sort of moving average in a dataframe. The formula will change based on the number of the row it is being computed for. The actual scenario is where I need to compute column Z.

Edit-2:

Below is the actual data I am working with

          Date     Open     High      Low    Close
0   01-01-2018  1763.95  1763.95  1725.00  1731.35
1   02-01-2018  1736.20  1745.80  1725.00  1743.20
2   03-01-2018  1741.10  1780.00  1740.10  1774.60
3   04-01-2018  1779.95  1808.00  1770.00  1801.35
4   05-01-2018  1801.10  1820.40  1795.60  1809.95
5   08-01-2018  1816.00  1827.95  1800.00  1825.00
6   09-01-2018  1823.00  1835.00  1793.90  1812.05
7   10-01-2018  1812.05  1823.00  1801.40  1816.55
8   11-01-2018  1825.00  1825.05  1798.55  1802.10
9   12-01-2018  1805.00  1820.00  1794.00  1804.95
10  15-01-2018  1809.90  1834.45  1792.45  1830.00
11  16-01-2018  1835.00  1857.45  1826.10  1850.25
12  17-01-2018  1850.00  1852.45  1826.20  1840.50
13  18-01-2018  1840.50  1852.00  1823.50  1839.00
14  19-01-2018  1828.25  1836.35  1811.00  1829.50
15  22-01-2018  1816.50  1832.55  1805.50  1827.20
16  23-01-2018  1825.00  1825.00  1782.25  1790.15
17  24-01-2018  1787.80  1792.70  1732.15  1737.50
18  25-01-2018  1739.90  1753.40  1720.00  1726.40
19  29-01-2018  1735.15  1754.95  1729.80  1738.70

The code snippet I am using is as below:

from datetime import date
from nsepy import get_history
import csv
import pandas as pd
import numpy as np
import requests
from datetime import timedelta
import datetime as dt
import pandas_datareader.data as web
import io

df = pd.read_csv('ACC.CSV')

idx = df.reset_index().index

df['Change'] = df['Close'].diff()
df['Advance'] = np.where(df.Change > 0, df.Change,0)
df['Decline'] = np.where(df.Change < 0, df.Change*-1, 0)
conditions = [idx < 14, idx == 14, idx > 14]
values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
df['Avg_Gain'] = np.select(conditions, values)
df['Avg_Loss'] = (df.Decline.rolling(14).sum())/14
df['RS'] = df.Avg_Gain / df.Avg_Loss
df['RSI'] = np.where(df['Avg_Loss'] == 0, 100, 100-(100/(1+df.RS)))
df.drop(['Change', 'Advance', 'Decline', 'Avg_Gain', 'Avg_Loss', 'RS'], axis=1)

print(df.head(20))

Below is the error I am getting:

Traceback (most recent call last):
  File "C:/Users/Lenovo/Desktop/Python/0.Chart Patterns/Z.Sample Code.py", line 20, in <module>
    values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Avg_Gain'

Edit 3: Below is the expected output and then I will also write down the formula. Original DF consists of columns Date Open High Low & Close.

(image of the expected output; not reproduced here)

Advance and Decline:
•   If the difference between current & previous Close is +ve then Advance = difference and Decline = 0
•   If the difference between current & previous Close is –ve then Advance = 0 and Decline = -1 * difference
Avg_Gain:
•   If index < 13 then Avg_Gain = 0
•   If index = 13 then Avg_Gain = Average of Advance over 14 periods
•   If index > 13 then Avg_Gain = (Avg_Gain(previous-row) * 13 + Advance(current-row)) / 14
Avg_Loss:
•   If index < 13 then Avg_Loss = 0
•   If index = 13 then Avg_Loss = Average of Decline over 14 periods
•   If index > 13 then Avg_Loss = (Avg_Loss(previous-row) * 13 + Decline(current-row)) / 14
RS:
•   If index < 13 then RS = 0
•   If index >= 13 then RS = Avg_Gain / Avg_Loss
RSI = 100 - (100 / (1 + RS))

I hope this helps.

Chirayu05 asked Jul 16 '18 06:07


2 Answers

You get this error because you reference df.Avg_Gain while creating df.Avg_Gain; the column does not exist yet when the values list is built:

values = [0, (df.Advance.rolling(14).sum())/14, (df.Avg_Gain.shift(1) * 13 + df.Advance)/14]
df['Avg_Gain'] = np.select(conditions, values)

I changed that part of the code to the following:

up = df.Advance.rolling(14).sum()/14
values = [0, up, (up.shift(1) * 13 + df.Advance)/14]

Output (idx>=14):

    Date        Open    High    Low     Close   RSI
14  2018-01-19  1828.25 1836.35 1811.00 1829.50 75.237850
15  2018-01-22  1816.50 1832.55 1805.50 1827.20 72.920021
16  2018-01-23  1825.00 1825.00 1782.25 1790.15 58.793750
17  2018-01-24  1787.80 1792.70 1732.15 1737.50 40.573938
18  2018-01-25  1739.90 1753.40 1720.00 1726.40 31.900045
19  2018-01-29  1735.15 1754.95 1729.80 1738.70 33.197678

There should be a better way of doing this, though. I'll update the answer if I find one. Let me know if this data is correct.

UPDATE: You also need to correct your calculation for 'Avg_Loss':

down = df.Decline.rolling(14).sum()/14
down_values = [0, down, (down.shift(1) * 13 + df.Decline)/14]
df['Avg_Loss'] = np.select(conditions, down_values)

http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:relative_strength_index_rsi#calculation

UPDATE 2: This was written after the expected data was provided. The only way I could reproduce it is by looping. There may be some pandas functionality I'm unaware of that avoids the loop.

First set Avg_Gain and Avg_Loss as before, changing the values slightly so that only the seed at index 13 is filled in:

conditions = [idx<13, idx==13, idx>13]
up = df.Advance.rolling(14).sum()/14
values = [0, up, 0]
df['Avg_Gain'] = np.select(conditions, values)

down = df.Decline.rolling(14).sum()/14
down_values = [0, down, 0]
df['Avg_Loss'] = np.select(conditions, down_values)

I have changed your conditions to split on index 13, since this is what I see in the expected output.

After running this code, populate Avg_Gain and Avg_Loss from index 14 onward, using the previous row's Avg_Gain and Avg_Loss:

p=14
for i in range(p, len(df)):
    df.at[i, 'Avg_Gain'] = ((df.loc[i-1, 'Avg_Gain'] * (p-1)) + df.loc[i, 'Advance']) / p
    df.at[i, 'Avg_Loss'] = ((df.loc[i-1, 'Avg_Loss'] * (p-1)) + df.loc[i, 'Decline']) / p

Output:

df[13:][['Date','Avg_Gain', 'Avg_Loss', 'RS', 'RSI']]

    Date        Avg_Gain    Avg_Loss    RS          RSI
13  2018-01-18  10.450000   2.760714    3.785252    79.102460
14  2018-01-19  9.703571    3.242092    2.992997    74.956155
15  2018-01-22  9.010459    3.174800    2.838119    73.945571
16  2018-01-23  8.366855    5.594457    1.495562    59.928860
17  2018-01-24  7.769222    8.955567    0.867530    46.453335
18  2018-01-25  7.214278    9.108741    0.792017    44.196960
19  2018-01-29  7.577544    8.458116    0.895890    47.254330
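
For reference, the steps above can be put together into one self-contained sketch. It assumes only the Close column from the question's sample data (typed in by hand here instead of read from ACC.CSV) and seeds the averages with a plain mean of the first 14 rows, which should be equivalent to the rolling-sum seed at index 13. The RS and RSI lines follow the question's formulas; treat this as an illustration, not a tuned implementation.

```python
import numpy as np
import pandas as pd

# Close prices copied by hand from the question's sample data (20 rows).
close = [1731.35, 1743.20, 1774.60, 1801.35, 1809.95, 1825.00, 1812.05,
         1816.55, 1802.10, 1804.95, 1830.00, 1850.25, 1840.50, 1839.00,
         1829.50, 1827.20, 1790.15, 1737.50, 1726.40, 1738.70]
df = pd.DataFrame({'Close': close})

p = 14  # look-back period

# Split the day-to-day change into gains and losses (row 0 has no change).
change = df['Close'].diff()
df['Advance'] = np.where(change > 0, change, 0.0)
df['Decline'] = np.where(change < 0, -change, 0.0)

# Seed: 0 before row p-1, simple average of the first p rows at row p-1.
df['Avg_Gain'] = 0.0
df['Avg_Loss'] = 0.0
df.loc[p - 1, 'Avg_Gain'] = df['Advance'].iloc[:p].mean()
df.loc[p - 1, 'Avg_Loss'] = df['Decline'].iloc[:p].mean()

# Recursive smoothing: each row depends on the previous row's average.
for i in range(p, len(df)):
    df.loc[i, 'Avg_Gain'] = (df.loc[i - 1, 'Avg_Gain'] * (p - 1) + df.loc[i, 'Advance']) / p
    df.loc[i, 'Avg_Loss'] = (df.loc[i - 1, 'Avg_Loss'] * (p - 1) + df.loc[i, 'Decline']) / p

# RS is 0 before the seed row, Avg_Gain / Avg_Loss from the seed row onward.
df['RS'] = np.where(df.index >= p - 1, df['Avg_Gain'] / df['Avg_Loss'], 0.0)
df['RSI'] = 100 - 100 / (1 + df['RS'])

print(df.loc[p - 1:, ['Avg_Gain', 'Avg_Loss', 'RS', 'RSI']])
```

The loop is needed because each Avg_Gain value depends on the previous Avg_Gain, which a single vectorised np.select call cannot express.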

gyx-hh answered Sep 28 '22 19:09


You can do this with ewm; the pandas documentation on exponentially weighted windows has a bit more explanation. With adjust=False, ewm computes y(row) = α*x(row) + (1−α)*y(row−1), where, for example, y is the column Avg_Gain, x is the column Advance, and α is the weight given to x(row). With α = 1/14 this is the same as (y(row−1)*13 + x(row))/14.
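
As a quick check of that recurrence, here is a small sketch comparing ewm(adjust=False) with an explicit loop. The series x is an assumption made for illustration: its first value is the seed average (10.45, the mean of Advance over the first 14 rows of the sample data) and the rest are the Advance values for rows 14 to 19.

```python
import numpy as np
import pandas as pd

# Illustrative input: seed average first, then Advance for rows 14-19.
x = pd.Series([10.45, 0.0, 0.0, 0.0, 0.0, 0.0, 12.30])
alpha = 1.0 / 14

# With adjust=False, pandas uses y[0] = x[0] and
# y[t] = alpha * x[t] + (1 - alpha) * y[t-1].
y_ewm = x.ewm(alpha=alpha, adjust=False).mean()

# The same recurrence written as an explicit loop.
y_loop = [x.iloc[0]]
for value in x.iloc[1:]:
    y_loop.append(alpha * value + (1 - alpha) * y_loop[-1])

print(np.allclose(y_ewm, y_loop))
```

Because the first output equals the first input when adjust=False, putting the 14-period mean in the first position is what seeds the smoothing.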

# define the number for the window
win_n = 14
# create a dataframe df_avg with the non null value of the two columns 
# Advance and Decline such as the first is the mean of the 14 first values and the rest as normal
df_avg = (pd.DataFrame({'Avg_Gain': np.append(df.Advance[:win_n].mean(), df.Advance[win_n:]), 
                        'Avg_Loss': np.append(df.Decline[:win_n].mean(), df.Decline[win_n:])},   
                        df.index[win_n-1:])
               .ewm(adjust=False, alpha=1./win_n).mean()) # what you need to calculate with your formula

# create the two other columns RS and RSI
df_avg['RS'] = df_avg.Avg_Gain / df_avg.Avg_Loss
df_avg['RSI'] = 100.-(100./(1. + df_avg['RS']))

and df_avg looks like:

     Avg_Gain  Avg_Loss        RS        RSI
13  10.450000  2.760714  3.785252  79.102460
14   9.703571  3.242092  2.992997  74.956155
15   9.010459  3.174800  2.838119  73.945571
16   8.366855  5.594457  1.495562  59.928860
17   7.769222  8.955567  0.867530  46.453335
18   7.214278  9.108741  0.792017  44.196960
19   7.577544  8.458116  0.895890  47.254330

You can join it to the original data and fillna with 0:

df = df.join(df_avg).fillna(0)

Ben.T answered Sep 28 '22 17:09