
Extrapolate Pandas DataFrame

It is easy to interpolate values in a pandas DataFrame using Series.interpolate, but how can extrapolation be done?

For example, given a DataFrame as shown, how can we extrapolate it 14 more months to 2014-12-31? Linear extrapolation is fine.

import pandas as pd

X1 = list(range(10))
X2 = [x**2 for x in X1]
# 'M' is the month-end alias; pandas >= 2.2 spells it 'ME'
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101', periods=10, freq='M'))

I am thinking that a new DataFrame must first be created, with the DatetimeIndex starting from 2013-11-30 and extending for 14 more M periods. Beyond that I'm stuck.

asked Dec 08 '15 15:12 by Nyxynyx



1 Answer

Extrapolating a DataFrame with a DatetimeIndex index

This can be done in two steps:

  1. Extend the DatetimeIndex
  2. Extrapolate the data

Extend the Index

Overwrite df with a new DataFrame in which the data is reindexed onto an extended index built from the original index's start, length, and frequency. Because only the existing index is used, the original df can come from anywhere, such as a CSV file. The rows added by the longer index are conveniently filled with NaNs.

import pandas as pd

# Fake DataFrame for example (could come from anywhere)
X1 = list(range(10))
X2 = [x**2 for x in X1]
# 'M' is the month-end alias; pandas >= 2.2 spells it 'ME'
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101', periods=10, freq='M'))

# Number of months to extend
extend = 5

# Extrapolate the index first based on original index
df = pd.DataFrame(
    data=df,
    index=pd.date_range(
        start=df.index[0],
        periods=len(df.index) + extend,
        freq=df.index.freq
    )
)

# Display
print(df)

            x1  x2
2013-01-31   0   0
2013-02-28   1   1
2013-03-31   2   4
2013-04-30   3   9
2013-05-31   4  16
2013-06-30   5  25
2013-07-31   6  36
2013-08-31   7  49
2013-09-30   8  64
2013-10-31   9  81
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 NaN NaN
2014-02-28 NaN NaN
2014-03-31 NaN NaN
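
The question asks for data up to 2014-12-31 rather than a fixed number of periods. As a minimal sketch of the same index-extension step, assuming DataFrame.reindex is an acceptable substitute for rebuilding the frame, the end date can be passed directly. The names month_end, target and extended are illustrative and not part of the original answer.

```python
import pandas as pd

# Same toy data as above; MonthEnd() is the offset behind the month-end alias
month_end = pd.offsets.MonthEnd()
df = pd.DataFrame(
    {'x1': range(10), 'x2': [x**2 for x in range(10)]},
    index=pd.date_range('2013-01-31', periods=10, freq=month_end),
)

# Extend up to the date asked for in the question instead of counting periods
target = '2014-12-31'  # assumed end date, taken from the question
extended = df.reindex(pd.date_range(df.index[0], target, freq=df.index.freq))

# The appended month-end rows hold NaN until they are extrapolated
print(extended.tail())
```

Using an offset object instead of the 'M' string avoids depending on which frequency alias the installed pandas version accepts.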

Extrapolate the data

Most extrapolators require numeric inputs rather than dates. The dates can be set aside temporarily with:

# Temporarily remove dates and make index numeric
di = df.index
df = df.reset_index(drop=True)

See this answer for how to extrapolate the values of each column of a DataFrame with a 3rd order polynomial.

Snippet from answer

# func, guess, fit_df and col_params are defined in the linked answer;
# fit_df is assumed here to be df with its NaN rows dropped
from scipy.optimize import curve_fit

# Curve fit each column
for col in fit_df.columns:
    # Get x & y
    x = fit_df.index.astype(float).values
    y = fit_df[col].values
    # Curve fit column and get curve parameters
    params = curve_fit(func, x, y, guess)
    # Store optimized parameters
    col_params[col] = params[0]

# Extrapolate each column
for col in df.columns:
    # Get the index labels of the NaNs in the column
    nan_idx = df.index[df[col].isna()]
    # Extrapolate those points with the fitted function
    df.loc[nan_idx, col] = func(nan_idx.astype(float).values, *col_params[col])
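
The snippet relies on names that live in the linked answer and are not shown here. The following is a self-contained sketch under stated assumptions: func is taken to be a 3rd-order polynomial (as the text describes), guess is an arbitrary starting point, and fit_df is the frame with its NaN rows dropped. These definitions are illustrative and may differ from the linked answer.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Assumed model: a 3rd-order polynomial (hypothetical definition of func)
def func(x, a, b, c, d):
    return a * x**3 + b * x**2 + c * x + d

# Assumed starting values for the optimiser
guess = (0.5, 0.5, 0.5, 0.5)

# Toy frame with a numeric index: 10 known rows followed by 5 NaN rows
df = pd.DataFrame({'x1': np.arange(10.0), 'x2': np.arange(10.0) ** 2}).reindex(range(15))

# Fit only on the rows that have data
fit_df = df.dropna()
col_params = {}
for col in fit_df.columns:
    x = fit_df.index.astype(float).values
    y = fit_df[col].values
    col_params[col] = curve_fit(func, x, y, guess)[0]

# Fill the NaN rows of each column with the fitted function
for col in df.columns:
    nan_idx = df.index[df[col].isna()]
    df.loc[nan_idx, col] = func(nan_idx.astype(float).values, *col_params[col])

print(df.round(2))
```

Assigning through df.loc rather than df[col][x] writes to the frame itself, which matters on pandas versions where chained assignment no longer modifies the original.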

Once the columns are extrapolated, put the dates back:

# Put date index back
df.index = di

# Display
print(df)

            x1   x2
2013-01-31   0    0
2013-02-28   1    1
2013-03-31   2    4
2013-04-30   3    9
2013-05-31   4   16
2013-06-30   5   25
2013-07-31   6   36
2013-08-31   7   49
2013-09-30   8   64
2013-10-31   9   81
2013-11-30  10  100
2013-12-31  11  121
2014-01-31  12  144
2014-02-28  13  169
2014-03-31  14  196
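
Since the question says linear extrapolation is fine, a simpler variant is possible. The sketch below assumes a degree-1 least-squares fit with numpy.polyfit over the known rows and uses row positions as x, which treats the months as equally spaced. The names month_end, pos and known are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

month_end = pd.offsets.MonthEnd()
df = pd.DataFrame(
    {'x1': np.arange(10.0), 'x2': np.arange(10.0) ** 2},
    index=pd.date_range('2013-01-31', periods=10, freq=month_end),
)
# Extend the index by 14 month ends; the new rows are NaN
df = df.reindex(pd.date_range(df.index[0], periods=len(df) + 14, freq=month_end))

# Use row positions as x so the dates never need converting
pos = np.arange(len(df), dtype=float)
for col in df.columns:
    known = df[col].notna().to_numpy()
    # Degree-1 least-squares fit over the known rows (assumed choice)
    slope, intercept = np.polyfit(pos[known], df[col].to_numpy()[known], 1)
    df.loc[~known, col] = slope * pos[~known] + intercept

print(df.tail(3))
```

Because the DatetimeIndex is never removed, there is no need to save and restore it. A straight line tracks x1 exactly but undershoots the quadratic x2, so the polynomial fit is the better choice when the data is clearly curved.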
answered Sep 29 '22 20:09 by tmthydvnprt