Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Monthly Averages Using Daily Data Using Python Pandas

I have a text file with four columns: year, month, day and snow depth. This is daily data for a 30-year period, 1979-2009.

I would like to calculate 360 (30yrs X 12 months) individual monthly averages using pandas (i.e. isolating all the values for Jan-1979, Feb-1979,... Dec-2009 and averaging each). Can anyone help me out with some example code?

1979    1   1   3
1979    1   2   3
1979    1   3   3
1979    1   4   3
1979    1   5   3
1979    1   6   3
1979    1   7   4
1979    1   8   5
1979    1   9   7
1979    1   10  8
1979    1   11  16
1979    1   12  16
1979    1   13  16
1979    1   14  18
1979    1   15  18
1979    1   16  18
1979    1   17  18
1979    1   18  20
1979    1   19  20
1979    1   20  20
1979    1   21  20
1979    1   22  20
1979    1   23  18
1979    1   24  18
1979    1   25  18
1979    1   26  18
1979    1   27  18
1979    1   28  18
1979    1   29  18
1979    1   30  18
1979    1   31  19
1979    2   1   19
1979    2   2   19
1979    2   3   19
1979    2   4   19
1979    2   5   19
1979    2   6   22
1979    2   7   24
1979    2   8   27
1979    2   9   29
1979    2   10  32
1979    2   11  32
1979    2   12  32
1979    2   13  32
1979    2   14  33
1979    2   15  33
1979    2   16  33
1979    2   17  34
1979    2   18  36
1979    2   19  36
1979    2   20  36
1979    2   21  36
1979    2   22  36
1979    2   23  36
1979    2   24  31
1979    2   25  29
1979    2   26  27
1979    2   27  27
1979    2   28  27
like image 805
DJV Avatar asked Apr 21 '15 03:04

DJV


People also ask

How does Python calculate average in pandas?

To get column average or mean from pandas DataFrame use either mean() and describe() method. The DataFrame. mean() method is used to return the mean of the values for the requested axis.

How do you make a monthly average?

Once you have all the numbers for each month, add all the numbers together for each month, and then divide them by the total amount of months.

How do I convert daily data to weekly in pandas?

Method 1: using Python for-loops. Function new_case_count() takes in DataFrame object, iterates over it and converts indexes, which are dates in string format, to Pandas Datetime format. Based on the date's day of the week, each week's new cases count is calculated and stored in a list.

Is pandas good for data cleaning?

Pandas offer a diverse range of built-in functions that can be used to clean and manipulate datasets prior to analysis. It can allow you to drop incomplete rows and columns, fill missing values and improve the readability of the dataset through category renaming.


1 Answers

You'll want to group your data by year and month, and then calculate the mean of each group. Pseudo-code:

import numpy as np
import pandas as pd

# Read in your file as a pandas.DataFrame
# using 'any number of whitespace' as the seperator
df = pd.read_csv("snow.txt", sep='\s*', names=["year", "month", "day", "snow_depth"])

# Show the first 5 rows of the DataFrame
print df.head()

# Group data first by year, then by month
g = df.groupby(["year", "month"])

# For each group, calculate the average of only the snow_depth column
monthly_averages = g.aggregate({"snow_depth":np.mean})

For more, about the split-apply-combine approach in Pandas, read here.

A DataFrame is a:

"Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)."

For your purposes, the difference between a numpy ndarray and a DataFrame are not too significant, but DataFrames have a bunch of functions that will make your life easier, so I'd suggest doing some reading on them.

like image 95
Zachary Cross Avatar answered Oct 07 '22 12:10

Zachary Cross