I am stuck in the following lines
import quandl,math
import pandas as pd
import numpy as np
from sklearn import preprocessing ,cross_validation , svm
from sklearn.linear_model import LinearRegression
df = quandl.get('WIKI/GOOGL')
df = df[['Adj. Open','Adj. High','Adj. Low','Adj. Close','Adj. Volume']]
df['HL_PCT'] = (df["Adj. High"] - df['Adj. Close'])/df['Adj. Close'] * 100
df['PCT_CHANGE'] = (df["Adj. Close"] - df['Adj. Open'])/df['Adj. Open'] * 100
df = df[['Adj. Close','HL_PCT','PCT_CHANGE','Adj. Open']]
forecast_col = 'Adj. Close'
df.fillna(-99999,inplace = True)
forecast_out = int(math.ceil(.1*len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)
print df.head()
I couldn't understand what is meant by df[forecast_col].shift(-forecast_out)
Please explain the command and what is does??
shift() function Shift index by desired number of periods with an optional time freq. This function takes a scalar parameter called the period, which represents the number of shifts to be made over the desired axis. This function is very helpful when dealing with time-series data.
(1) shift() Pandas Shift() Function, shifts index by the desired number of periods. This function takes a scalar parameter called a period, which represents the number of shifts for the desired axis. This function is beneficial when dealing with time-series data. We can use fill_value to fill beyond boundary values.
shift() If you want to shift your column or subtract the column value with the previous row value from the DataFrame, you can do it by using the shift() function. It consists of a scalar parameter called period, which is responsible for showing the number of shifts to be made over the desired axis.
The shift function can help us understand and quantify how the two distributions differ. The shift function describes how one distribution should be re-arranged to match the other one: it estimates how and by how much one distribution must be shifted.
Shift function of pandas.Dataframe shifts index by desired number of periods with an optional time freq. For further information on shift function please refer this link.
Here is the small example of column values being shifted:
import pandas as pd
import numpy as np
df = pd.DataFrame({"date": ["2000-01-03", "2000-01-03", "2000-03-05", "2000-01-03", "2000-03-05",
"2000-03-05", "2000-07-03", "2000-01-03", "2000-07-03", "2000-07-03"],
"variable": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D"],
"no": [1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3],
"value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
0.119209, -1.044236, -0.861849, None]})
Below is the column value before it is shifted
df['value']
output
0 0.469112
1 -0.282863
2 -1.509059
3 -1.135632
4 1.212112
5 -0.173215
6 0.119209
7 -1.044236
8 -0.861849
9 NaN
Using shift function values are shifted depending on period given
for example using shift with positive integer shifts rows value downwards:
df['value'].shift(1)
output
0 NaN
1 0.469112
2 -0.282863
3 -1.509059
4 -1.135632
5 1.212112
6 -0.173215
7 0.119209
8 -1.044236
9 -0.861849
Name: value, dtype: float64
using shift with negative integer shifts rows value upwards:
df['value'].shift(-1)
output
0 -0.282863
1 -1.509059
2 -1.135632
3 1.212112
4 -0.173215
5 0.119209
6 -1.044236
7 -0.861849
8 NaN
9 NaN
Name: value, dtype: float64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With