Add trend line to pandas

Q: Can pandas generate plots?

Yes. Pandas creates plots with the plot() method, which draws through Matplotlib. Pyplot, a submodule of the Matplotlib library, can then be used to display the plot on screen.
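A minimal sketch of that workflow, using the first four emplvl values from the table below (rounded). The matplotlib.use("Agg") line is an assumption added only so the sketch runs without a display. Drop it and call plt.show() in an interactive session to get an on-screen window.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First four quarterly values from the question's table, rounded
s = pd.Series(
    [10955.0, 11090.3, 11157.0, 11335.7],
    index=pd.to_datetime(["2003-01-01", "2003-04-01", "2003-07-01", "2003-10-01"]),
)

# Series.plot draws with Matplotlib and returns the Axes it drew on
ax = s.plot(title="emplvl")

# With a GUI backend, plt.show() would open the figure in a window here
fig = ax.get_figure()
```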

Tags: python, pandas, matplotlib, machine-learning, statsmodels

I have time-series data, as follows:

                  emplvl
date                    
2003-01-01  10955.000000
2003-04-01  11090.333333
2003-07-01  11157.000000
2003-10-01  11335.666667
2004-01-01  11045.000000
2004-04-01  11175.666667
2004-07-01  11135.666667
2004-10-01  11480.333333
2005-01-01  11441.000000
2005-04-01  11531.000000
2005-07-01  11320.000000
2005-10-01  11516.666667
2006-01-01  11291.000000
2006-04-01  11223.000000
2006-07-01  11230.000000
2006-10-01  11293.000000
2007-01-01  11126.666667
2007-04-01  11383.666667
2007-07-01  11535.666667
2007-10-01  11567.333333
2008-01-01  11226.666667
2008-04-01  11342.000000
2008-07-01  11201.666667
2008-10-01  11321.000000
2009-01-01  11082.333333
2009-04-01  11099.000000
2009-07-01  10905.666667

[Image: time series graph]

I would like to add a linear trend (with intercept) to this graph in the simplest possible way. I would also like to compute this trend using only the data before, say, 2006.

I've found some answers here, but they all use statsmodels. First, these answers might not be up to date: pandas has improved and now includes an OLS component itself. Second, statsmodels appears to estimate an individual fixed effect for each time period instead of a linear trend. I suppose I could compute a running-quarter variable, but there must be a more convenient way of doing this?

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 emplvl   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                     0.000
Date:                tor, 14 apr 2016   Prob (F-statistic):                nan
Time:                        17:17:43   Log-Likelihood:                 929.85
No. Observations:                  40   AIC:                            -1780.
Df Residuals:                       0   BIC:                            -1712.
Df Model:                          39                                         
Covariance Type:            nonrobust                                         
============================================================================================================
                                               coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------------------------------
Intercept                                 1.095e+04        inf          0        nan           nan       nan
date[T.Timestamp('2003-04-01 00:00:00')]   135.3333        inf          0        nan           nan       nan
date[T.Timestamp('2003-07-01 00:00:00')]   202.0000        inf          0        nan           nan       nan
date[T.Timestamp('2003-10-01 00:00:00')]   380.6667        inf          0        nan           nan       nan
date[T.Timestamp('2004-01-01 00:00:00')]    90.0000        inf          0        nan           nan       nan
date[T.Timestamp('2004-04-01 00:00:00')]   220.6667        inf          0        nan           nan       nan
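The date[T.Timestamp(...)] rows suggest the formula treated the date column as categorical, so every quarter got its own dummy. A minimal sketch of the running-quarter idea, assuming statsmodels' formula API (statsmodels.formula.api.ols). The column names t and trend are illustrative choices, not taken from the question. The trend is estimated on pre-2006 quarters only and then evaluated for every row.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# emplvl values from the question, one per quarter starting 2003-01-01
values = [
    10955.000000, 11090.333333, 11157.000000, 11335.666667,
    11045.000000, 11175.666667, 11135.666667, 11480.333333,
    11441.000000, 11531.000000, 11320.000000, 11516.666667,
    11291.000000, 11223.000000, 11230.000000, 11293.000000,
    11126.666667, 11383.666667, 11535.666667, 11567.333333,
    11226.666667, 11342.000000, 11201.666667, 11321.000000,
    11082.333333, 11099.000000, 10905.666667,
]
idx = pd.date_range("2003-01-01", periods=len(values), freq="QS", name="date")
df = pd.DataFrame({"emplvl": values}, index=idx)

# Numeric running-quarter counter (0, 1, 2, ...) so the formula fits a single slope
df["t"] = np.arange(len(df))

# Estimate intercept + slope on the quarters before 2006 only
res = smf.ols("emplvl ~ t", data=df[df.index < pd.Timestamp("2006-01-01")]).fit()

# Evaluate the fitted line over the whole sample and store it as a column
df["trend"] = res.predict(df)
```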

How do I, in the simplest way possible, estimate this trend and add the predicted values as a column to my data frame?
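If statsmodels is not wanted at all, a plain least-squares line from numpy.polyfit with degree 1 (slope and intercept) may be enough. This is a swapped-in technique, not something the question or answers use. A minimal sketch, assuming the data sits in a DataFrame with a DatetimeIndex as printed above. The names t, pre and trend are hypothetical.

```python
import numpy as np
import pandas as pd

# emplvl values from the question, one per quarter starting 2003-01-01
values = [
    10955.000000, 11090.333333, 11157.000000, 11335.666667,
    11045.000000, 11175.666667, 11135.666667, 11480.333333,
    11441.000000, 11531.000000, 11320.000000, 11516.666667,
    11291.000000, 11223.000000, 11230.000000, 11293.000000,
    11126.666667, 11383.666667, 11535.666667, 11567.333333,
    11226.666667, 11342.000000, 11201.666667, 11321.000000,
    11082.333333, 11099.000000, 10905.666667,
]
df = pd.DataFrame(
    {"emplvl": values},
    index=pd.date_range("2003-01-01", periods=len(values), freq="QS", name="date"),
)

t = np.arange(len(df))  # running-quarter counter used as the regressor
pre = df.index < pd.Timestamp("2006-01-01")  # boolean mask for the estimation window

# Degree-1 least-squares fit; polyfit returns coefficients highest power first
slope, intercept = np.polyfit(t[pre], df["emplvl"].to_numpy()[pre], deg=1)

# Evaluate the line on every quarter, including those after the window
df["trend"] = intercept + slope * t

# df.plot() would now draw emplvl and its trend on the same axes
```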

asked Apr 14 '16 15:04

FooBar

2 Answers

Here's a quick example of how to do this using pandas.ols (available only in older pandas releases; it has since been removed):

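import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = pd.Series(np.arange(50))
y = pd.Series(10 + 2 * x + np.random.randint(-5, 5, 50))  # slope 2, intercept 10, plus integer noise
regression = pd.ols(y=y, x=x)  # pd.ols adds the intercept term automatically
regression.summary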

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         50
Number of Degrees of Freedom:   2

R-squared:         0.9913
Adj R-squared:     0.9911

Rmse:              2.7625

F-stat (1, 48):  5465.1446, p-value:     0.0000

Degrees of Freedom: model 1, resid 48

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     2.0013     0.0271      73.93     0.0000     1.9483     2.0544
     intercept     9.5271     0.7698      12.38     0.0000     8.0183    11.0358
---------------------------------End of Summary---------------------------------

trend = regression.predict(beta=regression.beta, x=x[20:])  # trend values only for the last 30 points (the fit used all 50)
data = pd.DataFrame(index=x, data={'y': y, 'trend': trend})
data.plot() # add kwargs for title and other layout/design aspects
plt.show() # or plt.gcf().savefig(path)

answered Sep 20 '22 10:09

Stefan

In general, you should create your matplotlib figure and axes objects ahead of time and explicitly plot the dataframe on those axes:

from matplotlib import pyplot
import pandas
import statsmodels.api as sm

df = pandas.read_csv(...)

fig, ax = pyplot.subplots()
df.plot(x='xcol', y='ycol', ax=ax)

You then still have the axes object available and can plot your trend line on it directly:

model = sm.formula.ols(formula='ycol ~ xcol', data=df)
res = model.fit()
df.assign(fit=res.fittedvalues).plot(x='xcol', y='fit', ax=ax)
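A self-contained sketch of the same steps, assuming synthetic data in place of the elided read_csv(...) call and importing ols from statsmodels.formula.api directly. xcol has to be numeric here: a datetime column would be dummy-coded by the formula, which is what produced the per-date coefficients in the question. The Agg backend line is only there so the sketch runs headless.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import numpy as np
import pandas
import statsmodels.formula.api as smf
from matplotlib import pyplot

# Hypothetical stand-in for pandas.read_csv(...): numeric x column and a noisy y
rng = np.random.default_rng(0)
df = pandas.DataFrame({"xcol": np.arange(20, dtype=float)})
df["ycol"] = 3 + 0.5 * df["xcol"] + rng.normal(0, 1, len(df))

# Create the figure and axes first, then draw the raw data onto them
fig, ax = pyplot.subplots()
df.plot(x="xcol", y="ycol", ax=ax)

# Fit ycol on xcol (intercept included by default) and overlay the fitted line
res = smf.ols(formula="ycol ~ xcol", data=df).fit()
df.assign(fit=res.fittedvalues).plot(x="xcol", y="fit", ax=ax)
```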

answered Sep 19 '22 10:09

Paul H
