I have a text file with four columns: year, month, day and snow depth. This is daily data for a 30-year period, 1979-2009.
I would like to calculate 360 (30yrs X 12 months) individual monthly averages using pandas (i.e. isolating all the values for Jan-1979, Feb-1979,... Dec-2009 and averaging each). Can anyone help me out with some example code?
1979 1 1 3
1979 1 2 3
1979 1 3 3
1979 1 4 3
1979 1 5 3
1979 1 6 3
1979 1 7 4
1979 1 8 5
1979 1 9 7
1979 1 10 8
1979 1 11 16
1979 1 12 16
1979 1 13 16
1979 1 14 18
1979 1 15 18
1979 1 16 18
1979 1 17 18
1979 1 18 20
1979 1 19 20
1979 1 20 20
1979 1 21 20
1979 1 22 20
1979 1 23 18
1979 1 24 18
1979 1 25 18
1979 1 26 18
1979 1 27 18
1979 1 28 18
1979 1 29 18
1979 1 30 18
1979 1 31 19
1979 2 1 19
1979 2 2 19
1979 2 3 19
1979 2 4 19
1979 2 5 19
1979 2 6 22
1979 2 7 24
1979 2 8 27
1979 2 9 29
1979 2 10 32
1979 2 11 32
1979 2 12 32
1979 2 13 32
1979 2 14 33
1979 2 15 33
1979 2 16 33
1979 2 17 34
1979 2 18 36
1979 2 19 36
1979 2 20 36
1979 2 21 36
1979 2 22 36
1979 2 23 36
1979 2 24 31
1979 2 25 29
1979 2 26 27
1979 2 27 27
1979 2 28 27
To get the average (mean) of a column in a pandas DataFrame, use either the mean() or the describe() method. DataFrame.mean() returns the mean of the values along the requested axis.
To compute a monthly average by hand, add together all the daily values for that month, and then divide the sum by the number of daily values in that month.
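As a minimal sketch of the points above, assuming a tiny hand-made DataFrame (the column names and the four rows are illustrative, copied from the sample data in the question), the built-in mean() and the sum-divided-by-count calculation give the same result:

```python
import pandas as pd

# Illustrative rows: 9-10 Jan 1979 and 1-2 Feb 1979 from the question's sample
df = pd.DataFrame({
    "year": [1979, 1979, 1979, 1979],
    "month": [1, 1, 2, 2],
    "day": [9, 10, 1, 2],
    "snow_depth": [7, 8, 19, 19],
})

# Mean of one column over all rows
overall_mean = df["snow_depth"].mean()

# describe() reports count, mean, std, min, quartiles and max in one call
summary = df["snow_depth"].describe()

# Monthly mean "by hand": sum of the daily values divided by the number of days
jan = df.loc[(df["year"] == 1979) & (df["month"] == 1), "snow_depth"]
jan_mean = jan.sum() / jan.count()

print(overall_mean, summary["mean"], jan_mean)
```

Here jan_mean is the same value that jan.mean() would return; the manual version is only shown to make the arithmetic explicit.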
pandas offers a diverse range of built-in functions for cleaning and manipulating datasets prior to analysis. They let you drop incomplete rows and columns, fill missing values, and improve the readability of the dataset by renaming categories.
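If the snow depth file has gaps, the cleaning step matters before averaging. The following is only a sketch under the assumption that missing readings show up as NaN; the three rows are hypothetical, not taken from the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with one missing value (NaN)
df = pd.DataFrame({
    "year": [1979, 1979, 1979],
    "month": [1, 1, 1],
    "day": [1, 2, 3],
    "snow_depth": [3.0, np.nan, 6.0],
})

# Option 1: drop rows whose depth is missing
dropped = df.dropna(subset=["snow_depth"])

# Option 2: fill a missing depth with the previous day's value (forward fill)
filled = df.assign(snow_depth=df["snow_depth"].ffill())

# mean() skips NaN by default, so the raw mean matches the "dropped" mean
raw_mean = df["snow_depth"].mean()
dropped_mean = dropped["snow_depth"].mean()
filled_mean = filled["snow_depth"].mean()

print(raw_mean, dropped_mean, filled_mean)
```

Which option is appropriate depends on the data; forward filling changes the monthly average, while dropping rows does not change what mean() already computes.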
You'll want to group your data by year and month, and then calculate the mean of each group. Example code:
import pandas as pd
# Read in your file as a pandas.DataFrame
# using 'one or more whitespace characters' as the separator
df = pd.read_csv("snow.txt", sep=r'\s+', names=["year", "month", "day", "snow_depth"])
# Show the first 5 rows of the DataFrame
print(df.head())
# Group data first by year, then by month
g = df.groupby(["year", "month"])
# For each group, calculate the average of only the snow_depth column
monthly_averages = g.aggregate({"snow_depth": "mean"})
For more about the split-apply-combine approach in pandas, see the "Group by: split-apply-combine" section of the pandas user guide.
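To try the grouping without the file on disk, here is a self-contained sketch that reads a few rows from an in-memory string instead of "snow.txt" (the rows are copied from the question's sample). It also shows one possible alternative, not part of the original answer: assembling real dates with pd.to_datetime and grouping by calendar month.

```python
import io
import pandas as pd

# A few rows in the same layout as the question's file (stand-in for "snow.txt")
raw = io.StringIO(
    "1979 1 30 18\n"
    "1979 1 31 19\n"
    "1979 2 5 19\n"
    "1979 2 6 22\n"
)
df = pd.read_csv(raw, sep=r"\s+", names=["year", "month", "day", "snow_depth"])

# Approach 1: group by (year, month) and average the snow_depth column
by_group = df.groupby(["year", "month"])["snow_depth"].mean()

# Approach 2: build real dates from the three columns, then group by monthly period
df["date"] = pd.to_datetime(df[["year", "month", "day"]])
by_period = df.groupby(df["date"].dt.to_period("M"))["snow_depth"].mean()

print(by_group)
print(by_period)
```

Both approaches give one value per year-month pair. The first result is indexed by a (year, month) pair, the second by a monthly Period, which can be handier for plotting or later date arithmetic.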
A DataFrame is a:
"Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)."
For your purposes, the differences between a NumPy ndarray and a DataFrame are not too significant, but DataFrames have a bunch of functions that will make your life easier, so I'd suggest doing some reading on them.
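A small sketch of that difference, using two illustrative monthly values: the same numbers can live in either structure, but the DataFrame keeps column labels while the ndarray is addressed by position.

```python
import numpy as np
import pandas as pd

# Illustrative monthly averages (values are examples only)
df = pd.DataFrame(
    {"year": [1979, 1979], "month": [1, 2], "snow_depth": [18.5, 20.5]}
)

# DataFrame -> ndarray: the column labels are dropped, only the values remain
arr = df.to_numpy()

# With an ndarray you select a column by position...
depth_from_array = np.mean(arr[:, 2])

# ...with a DataFrame you select it by name
depth_from_frame = df["snow_depth"].mean()

print(arr.shape, depth_from_array, depth_from_frame)
```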