Monthly Averages Using Daily Data Using Python Pandas

Tags:

I have a text file with four columns: year, month, day and snow depth. This is daily data for a 30-year period, 1979-2009.

I would like to calculate 360 (30yrs X 12 months) individual monthly averages using pandas (i.e. isolating all the values for Jan-1979, Feb-1979,... Dec-2009 and averaging each). Can anyone help me out with some example code?

Click to copy

1979    1   1   3
1979    1   2   3
1979    1   3   3
1979    1   4   3
1979    1   5   3
1979    1   6   3
1979    1   7   4
1979    1   8   5
1979    1   9   7
1979    1   10  8
1979    1   11  16
1979    1   12  16
1979    1   13  16
1979    1   14  18
1979    1   15  18
1979    1   16  18
1979    1   17  18
1979    1   18  20
1979    1   19  20
1979    1   20  20
1979    1   21  20
1979    1   22  20
1979    1   23  18
1979    1   24  18
1979    1   25  18
1979    1   26  18
1979    1   27  18
1979    1   28  18
1979    1   29  18
1979    1   30  18
1979    1   31  19
1979    2   1   19
1979    2   2   19
1979    2   3   19
1979    2   4   19
1979    2   5   19
1979    2   6   22
1979    2   7   24
1979    2   8   27
1979    2   9   29
1979    2   10  32
1979    2   11  32
1979    2   12  32
1979    2   13  32
1979    2   14  33
1979    2   15  33
1979    2   16  33
1979    2   17  34
1979    2   18  36
1979    2   19  36
1979    2   20  36
1979    2   21  36
1979    2   22  36
1979    2   23  36
1979    2   24  31
1979    2   25  29
1979    2   26  27
1979    2   27  27
1979    2   28  27

805

asked Apr 21 '15 03:04

DJV

1 Answers

You'll want to group your data by year and month, and then calculate the mean of each group. Pseudo-code:

Click to copy

import numpy as np
import pandas as pd

# Read in your file as a pandas.DataFrame
# using 'any number of whitespace' as the seperator
df = pd.read_csv("snow.txt", sep='\s*', names=["year", "month", "day", "snow_depth"])

# Show the first 5 rows of the DataFrame
print df.head()

# Group data first by year, then by month
g = df.groupby(["year", "month"])

# For each group, calculate the average of only the snow_depth column
monthly_averages = g.aggregate({"snow_depth":np.mean})

For more, about the split-apply-combine approach in Pandas, read here.

A DataFrame is a:

"Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)."

For your purposes, the difference between a numpy ndarray and a DataFrame are not too significant, but DataFrames have a bunch of functions that will make your life easier, so I'd suggest doing some reading on them.

answered Oct 07 '22 12:10

Zachary Cross

Related questions
                            
                                App Engine Unit Testing: ImportError: Start directory is not importable
                            
                                Marshal unserialization - not secure
                            
                                PyQt5 and Matplotlib 1.4.2 - installing one breaks the other
                            
                                The nature of pandas DataFrame
                            
                                Add form fields to django form dynamically
                            
                                NLTK convert tokenized sentence to synset format
                            
                                Python input single character without enter
                            
                                Append pandas DataFrame column to CSV
                            
                                What port to use on heroku python app
                            
                                Convert a specific column into row names in Pandas
                            
                                Can I use += on multiple variables on one line?
                            
                                RAII in Python: What's the point of __del__?
                            
                                Why would django fail with server 500 only when Debug=False AND db is set to production database on Heroku?
                            
                                Launching nested processes in multiprocessing
                            
                                Returns True if the two strings only differ by one character [closed]
                            
                                Python Random Function without using random module
                            
                                Creating random pairs from lists
                            
                                Save/load sympy lambdifed expressions
                            
                                How to access popup login form with Selenium in Python
                            
                                error while following Tumblelog Application with Flask and MongoEngine

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Monthly Averages Using Daily Data Using Python Pandas

Tags:

python

pandas

time-series

DJV

People also ask

1 Answers

Zachary Cross

Recent Activity

Donate For Us