
What is meant by shift in dataframe?

I am stuck on the following lines:

import math
import quandl
import pandas as pd
import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn import preprocessing, model_selection, svm
from sklearn.linear_model import LinearRegression

df = quandl.get('WIKI/GOOGL')

df = df[['Adj. Open','Adj. High','Adj. Low','Adj. Close','Adj. Volume']]

df['HL_PCT'] = (df["Adj. High"] - df['Adj. Close'])/df['Adj. Close'] * 100
df['PCT_CHANGE'] = (df["Adj. Close"] - df['Adj. Open'])/df['Adj. Open'] * 100

df = df[['Adj. Close','HL_PCT','PCT_CHANGE','Adj. Open']]

forecast_col = 'Adj. Close'

df.fillna(-99999, inplace=True)

# Number of rows to look ahead: 10% of the dataset, rounded up
forecast_out = int(math.ceil(0.1 * len(df)))

df['label'] = df[forecast_col].shift(-forecast_out)
print(df.head())

I couldn't understand what is meant by df[forecast_col].shift(-forecast_out).

Please explain the command and what it does.

rithwik kukunuri asked Jun 21 '17 12:06



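1 Answer

The shift function of pandas.DataFrame shifts the index by the desired number of periods, with an optional time freq. When freq is not given, the index stays in place and the values move relative to it, so the vacated rows are filled with NaN. For further information on the shift function, refer to the pandas documentation for DataFrame.shift.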
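To make the difference between shifting with and without freq concrete, here is a minimal sketch. The three-day series and the names moved_values and moved_index are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical three-day series, used only for illustration.
s = pd.Series([10.0, 11.0, 12.0],
              index=pd.date_range("2000-01-03", periods=3, freq="D"))

# Without freq: the values move down one row, the index stays in place,
# and the first row becomes NaN.
moved_values = s.shift(1)

# With freq: the index itself moves forward by one day,
# and every value stays attached to its row.
moved_index = s.shift(1, freq="D")

print(moved_values)
print(moved_index)
```

The question's code uses shift without freq, so it is the first behaviour that applies there.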

Here is a small example of column values being shifted:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-03", "2000-03-05", "2000-01-03", "2000-03-05",
             "2000-03-05", "2000-07-03", "2000-01-03", "2000-07-03", "2000-07-03"],
    "variable": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D"],
    "no": [1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3],
    # The trailing None is stored as NaN in this float column
    "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
              0.119209, -1.044236, -0.861849, None],
})

Below is the column before it is shifted:

df['value']

output

0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
5   -0.173215
6    0.119209
7   -1.044236
8   -0.861849
9         NaN
Name: value, dtype: float64

The shift function moves the values by the given number of periods.

For example, shift with a positive integer moves the values downwards and leaves NaN in the vacated first row:

df['value'].shift(1)

output

0         NaN
1    0.469112
2   -0.282863
3   -1.509059
4   -1.135632
5    1.212112
6   -0.173215
7    0.119209
8   -1.044236
9   -0.861849
Name: value, dtype: float64

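shift with a negative integer moves the values upwards. The last row is vacated and becomes NaN; row 8 is also NaN here because it receives the original row 9, which was already NaN: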

df['value'].shift(-1)

output

0   -0.282863
1   -1.509059
2   -1.135632
3    1.212112
4   -0.173215
5    0.119209
6   -1.044236
7   -0.861849
8         NaN
9         NaN
Name: value, dtype: float64
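Applied to the code in the question, df[forecast_col].shift(-forecast_out) pairs each row with the 'Adj. Close' value found forecast_out rows later, so 'label' holds the future price to predict. Below is a minimal sketch of that idea. The 20 made-up prices stand in for the Quandl data, and the name train is hypothetical.

```python
import math
import pandas as pd

# Hypothetical prices standing in for the real 'Adj. Close' column.
df = pd.DataFrame({"Adj. Close": [100.0 + i for i in range(20)]})

forecast_col = "Adj. Close"

# Same formula as in the question: 10% of 20 rows gives 2.
forecast_out = int(math.ceil(0.1 * len(df)))

# Each row's label is the price forecast_out rows into the future.
df["label"] = df[forecast_col].shift(-forecast_out)

# The last forecast_out rows have no future price, so their label is NaN.
print(df.tail(4))

# Rows without a label cannot be used for training (train is a hypothetical name).
train = df.dropna(subset=["label"])
print(len(train))
```

In this sketch the first row has price 100.0 and label 102.0, the price two rows later. The last two rows have NaN labels because there is no data that far ahead.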
Akshay Kandul answered Sep 19 '22 19:09