Rolling Mean on pandas on a specific column

Tags: python, python-3.x, pandas, dataframe

I have a data frame like this which is imported from a CSV.

              stock  pop
Date
2016-01-04  325.316   82
2016-01-11  320.036   83
2016-01-18  299.169   79
2016-01-25  296.579   84
2016-02-01  295.334   82
2016-02-08  309.777   81
2016-02-15  317.397   75
2016-02-22  328.005   80
2016-02-29  315.504   81
2016-03-07  328.802   81
2016-03-14  339.559   86
2016-03-21  352.160   82
2016-03-28  348.773   84
2016-04-04  346.482   83
2016-04-11  346.980   80
2016-04-18  357.140   75
2016-04-25  357.439   77
2016-05-02  356.443   78
2016-05-09  365.158   78
2016-05-16  352.160   72
2016-05-23  344.540   74
2016-05-30  354.998   81
2016-06-06  347.428   77
2016-06-13  341.053   78
2016-06-20  363.515   80
2016-06-27  349.669   80
2016-07-04  371.583   82
2016-07-11  358.335   81
2016-07-18  362.021   79
2016-07-25  368.844   77
...             ...  ...

I want to add a new column MA that holds the rolling mean of the column pop. I tried the following:

df['MA']=data.rolling(5,on='pop').mean()

I get an error

ValueError: Wrong number of items passed 2, placement implies 1

So I checked whether the call works on its own, without assigning the result to a column. I used

data.rolling(5,on='pop').mean()

I got the output

               stock  pop
Date
2016-01-04       NaN   82
2016-01-11       NaN   83
2016-01-18       NaN   79
2016-01-25       NaN   84
2016-02-01  307.2868   82
2016-02-08  304.1790   81
2016-02-15  303.6512   75
2016-02-22  309.4184   80
2016-02-29  313.2034   81
2016-03-07  319.8970   81
2016-03-14  325.8534   86
2016-03-21  332.8060   82
2016-03-28  336.9596   84
2016-04-04  343.1552   83
2016-04-11  346.7908   80
2016-04-18  350.3070   75
2016-04-25  351.3628   77
2016-05-02  352.8968   78
2016-05-09  356.6320   78
2016-05-16  357.6680   72
2016-05-23  355.1480   74
2016-05-30  354.6598   81
2016-06-06  352.8568   77
2016-06-13  348.0358   78
2016-06-20  350.3068   80
2016-06-27  351.3326   80
2016-07-04  354.6496   82
2016-07-11  356.8310   81
2016-07-18  361.0246   79
2016-07-25  362.0904   77
...              ...  ...

I can't seem to apply Rolling mean on the column pop. What am I doing wrong?


asked Apr 16 '17 13:04

Anti21

2 Answers

To assign a column, you can create a rolling object based on your Series:

df['new_col'] = data['column'].rolling(5).mean()
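The error in the question is consistent with data.rolling(5, on='pop').mean() returning a whole DataFrame (the output shown there has both the stock and pop columns), which does not fit into the single column df['MA']. Below is a minimal sketch of the column-first approach. It assumes the first ten rows of the question's data, typed in by hand, and a weekly Date index rebuilt with pd.date_range; neither the construction nor the variable name rolling_pop comes from the original post.

```python
import pandas as pd

# First ten rows of the question's data, typed in by hand for illustration.
df = pd.DataFrame(
    {
        "stock": [325.316, 320.036, 299.169, 296.579, 295.334,
                  309.777, 317.397, 328.005, 315.504, 328.802],
        "pop": [82, 83, 79, 84, 82, 81, 75, 80, 81, 81],
    },
    # Assumption: the dates are exactly 7 days apart, as in the question's sample.
    index=pd.date_range("2016-01-04", periods=10, freq="7D", name="Date"),
)

# Select the column first, so rolling() works on a Series and returns a Series.
rolling_pop = df["pop"].rolling(5).mean()

# A Series aligns on the index and fits into a single new column.
df["MA"] = rolling_pop

print(df)
```

The first four rows of MA are NaN because a window of 5 needs five observations before it can produce a value.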

The answer posted by ac2001 is not the most performant way of doing this. It calculates a rolling mean on every column in the dataframe and then assigns the "ma" column from the "pop" column of that result. The first of the following two methods is much more efficient:

%timeit df['ma'] = data['pop'].rolling(5).mean()
%timeit df['ma_2'] = data.rolling(5).mean()['pop']

1000 loops, best of 3: 497 µs per loop
100 loops, best of 3: 2.6 ms per loop

I would not recommend using the second method unless you need to store computed rolling means on all other columns.
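For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The data is synthetic (random values, 1,000 rows) and the helper names column_first and frame_first are made up for this example, so the absolute timings will not match the numbers above; the point is only that both routes return the same Series while the second one does extra work.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data; sizes and values are arbitrary.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    {
        "stock": rng.normal(330.0, 20.0, size=1_000),
        "pop": rng.integers(70, 90, size=1_000),
    }
)


def column_first():
    # Rolling mean computed for "pop" only.
    return data["pop"].rolling(5).mean()


def frame_first():
    # Rolling mean computed for every column, then "pop" is picked out.
    return data.rolling(5).mean()["pop"]


# Both routes yield the same Series.
pd.testing.assert_series_equal(column_first(), frame_first())

t_column = timeit.timeit(column_first, number=200)
t_frame = timeit.timeit(frame_first, number=200)
print(f"column first: {t_column:.4f} s, frame first: {t_frame:.4f} s")
```

With only two columns the gap may be modest; it should grow with the number of columns that the frame-wide call has to process.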


answered Oct 11 '22 04:10

Andrew L

Edit: pd.rolling_mean is deprecated and has since been removed from pandas. Instead, call .rolling() on the column:

df['MA'] = df['pop'].rolling(window=5,center=False).mean()

for a dataframe df:

          Date    stock  pop
0   2016-01-04  325.316   82
1   2016-01-11  320.036   83
2   2016-01-18  299.169   79
3   2016-01-25  296.579   84
4   2016-02-01  295.334   82
5   2016-02-08  309.777   81
6   2016-02-15  317.397   75
7   2016-02-22  328.005   80
8   2016-02-29  315.504   81
9   2016-03-07  328.802   81

To get:

          Date    stock  pop    MA
0   2016-01-04  325.316   82   NaN
1   2016-01-11  320.036   83   NaN
2   2016-01-18  299.169   79   NaN
3   2016-01-25  296.579   84   NaN
4   2016-02-01  295.334   82  82.0
5   2016-02-08  309.777   81  81.8
6   2016-02-15  317.397   75  80.2
7   2016-02-22  328.005   80  80.4
8   2016-02-29  315.504   81  79.8
9   2016-03-07  328.802   81  79.6

Documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html

Old: On older pandas versions that still provide pd.rolling_mean, you can use the deprecated function:

df['MA']=pd.rolling_mean(df['pop'], window=5)

to get:

          Date    stock  pop    MA
0   2016-01-04  325.316   82   NaN
1   2016-01-11  320.036   83   NaN
2   2016-01-18  299.169   79   NaN
3   2016-01-25  296.579   84   NaN
4   2016-02-01  295.334   82  82.0
5   2016-02-08  309.777   81  81.8
6   2016-02-15  317.397   75  80.2
7   2016-02-22  328.005   80  80.4
8   2016-02-29  315.504   81  79.8
9   2016-03-07  328.802   81  79.6

Documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.rolling_mean.html

answered Oct 11 '22 04:10

Chuck
