Python - Aggregate by month and calculate average

Tags: python, date, pandas, csv, aggregate

I have a csv which looks like this:

Date,Sentiment
2014-01-03,0.4
2014-01-04,-0.03
2014-01-09,0.0
2014-01-10,0.07
2014-01-12,0.0
2014-02-24,0.0
2014-02-25,0.0
2014-02-25,0.0
2014-02-26,0.0
2014-02-28,0.0
2014-03-01,0.1
2014-03-02,-0.5
2014-03-03,0.0
2014-03-08,-0.06
2014-03-11,-0.13
2014-03-22,0.0
2014-03-23,0.33
2014-03-23,0.3
2014-03-25,-0.14
2014-03-28,-0.25
etc

My goal is to aggregate the dates by month and calculate the average for each month. The dates might not start on the 1st of a month or in January. The problem is that I have a lot of data spanning several years. So I would like to find the earliest date (month) and start counting months and their averages from there. For example:

Month count, average
1, 0.4 (<= the earliest month)
2, -0.3
3, 0.0
...
12, 0.1
13, -0.4 (<= new year but counting of month is continuing)
14, 0.3

I'm using pandas to open the CSV:

data = pd.read_csv("pks.csv", sep=",")

so in data['Date'] I have dates and in data['Sentiment'] I have values. Any idea how to do it?


asked May 25 '14 20:05

Jaroslav Klimčík

2 Answers

Probably the simplest approach is to use the resample method. First, when you read in your data, make sure you parse the dates and set the date column as your index (ignore the StringIO part; I am reading your sample data from a multi-line string named data):

>>> import pandas as pd
>>> from io import StringIO
>>> df = pd.read_csv(StringIO(data), parse_dates=['Date'],
                     index_col='Date')
>>> df

            Sentiment
Date
2014-01-03       0.40
2014-01-04      -0.03
2014-01-09       0.00
2014-01-10       0.07
2014-01-12       0.00
2014-02-24       0.00 
2014-02-25       0.00
2014-02-25       0.00
2014-02-26       0.00
2014-02-28       0.00
2014-03-01       0.10
2014-03-02      -0.50
2014-03-03       0.00
2014-03-08      -0.06
2014-03-11      -0.13
2014-03-22       0.00
2014-03-23       0.33
2014-03-23       0.30
2014-03-25      -0.14
2014-03-28      -0.25


>>> df.resample('M').mean()

            Sentiment
2014-01-31      0.088
2014-02-28      0.000
2014-03-31     -0.035

And if you want a month counter, you can add it after your resample:

>>> agg = df.resample('M').mean()
>>> agg['cnt'] = range(len(agg))
>>> agg

            Sentiment  cnt
2014-01-31      0.088    0
2014-02-28      0.000    1
2014-03-31     -0.035    2
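The counter above starts at 0, while the question asks for a count that starts at 1 and keeps running across years. A minimal sketch of one way to do that, assuming you group by calendar month with `to_period('M')` so that months without data are skipped but still advance the count (the sample values and the `month_count` column name are made up for illustration):

```python
import pandas as pd

# Hypothetical sample spanning a year boundary, with no data in January 2015.
df = pd.DataFrame(
    {"Sentiment": [0.4, 0.2, -0.3, 0.1, 0.5]},
    index=pd.to_datetime(
        ["2014-11-03", "2014-11-20", "2014-12-05", "2015-02-10", "2015-02-11"]
    ),
)

# Group by calendar month (as a Period); only months that have data appear.
monthly = df.groupby(df.index.to_period("M"))["Sentiment"].mean().to_frame()

# Month number relative to the earliest month, starting at 1.
first = monthly.index[0]
years = monthly.index.year.to_numpy()
months = monthly.index.month.to_numpy()
monthly["month_count"] = (years - first.year) * 12 + (months - first.month) + 1

print(monthly)
```

Here February 2015 gets the count 4 rather than 3, because the empty January 2015 still counts as a month.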

You can also do this with the groupby method and pd.Grouper (group by month and then call the mean convenience method that is available with groupby). Older versions of this answer used pd.TimeGrouper, which has since been removed from pandas in favor of pd.Grouper.

>>> df.groupby(pd.Grouper(freq='M')).mean()

            Sentiment
2014-01-31      0.088
2014-02-28      0.000
2014-03-31     -0.035
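Depending on your pandas version, the 'M' alias may emit a deprecation warning (newer releases prefer 'ME' for month end). A hedged sketch that sidesteps the question by using the month-start alias 'MS' instead, so each row is labeled with the first day of its month; the CSV text is a shortened copy of the question's sample, and `csv_text` is just an illustrative name:

```python
import io

import pandas as pd

# Shortened copy of the sample data from the question.
csv_text = """Date,Sentiment
2014-01-03,0.4
2014-01-04,-0.03
2014-01-09,0.0
2014-01-10,0.07
2014-01-12,0.0
2014-02-24,0.0
2014-02-25,0.0
2014-03-01,0.1
2014-03-02,-0.5
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Both calls bin rows by calendar month and label each bin with the month start.
by_resample = df.resample("MS").mean()
by_grouper = df.groupby(pd.Grouper(freq="MS")).mean()

print(by_resample)
```

Both approaches include months that have no rows at all (their mean is NaN), so you may want to call `dropna()` on the result if you only care about months with data.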


answered Sep 27 '22 22:09

Karl D.

To get the monthly average of a DataFrame whose rows hold daily 'Sentiment' values, I would:

Set the column with the dates, df['date'], as the index of the DataFrame df: df.set_index('date', inplace=True)
Then take the month number of each date in the index: df.index.month
Finally, calculate the mean of the DataFrame grouped by month: df.groupby(df.index.month).Sentiment.mean()

I go slowly through each step here:

Generating a DataFrame with dates and values

First you need to import pandas and NumPy, as well as the datetime module:
```
import numpy as np
import pandas as pd
from datetime import datetime
```

Generate a column 'date' between 1/1/2018 and 3/05/2018, at weekly ('W') intervals, and a column 'Sentiment' with random integer values from 0 to 99:

date_rng = pd.date_range(start='1/1/2018', end='3/05/2018', freq='W')
df = pd.DataFrame(date_rng, columns=['date'])
df['Sentiment']=np.random.randint(0,100,size=(len(date_rng)))

The df has two columns, 'date' and 'Sentiment':

        date  Sentiment
0 2018-01-07         34
1 2018-01-14         32
2 2018-01-21         15
3 2018-01-28          0
4 2018-02-04         95
5 2018-02-11         53
6 2018-02-18          7
7 2018-02-25         35
8 2018-03-04         17
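The values above come from an unseeded random call, so rerunning the code gives different numbers. If you want a reproducible sample, a minimal sketch using NumPy's seeded generator could look like this (the seed value 0 is an arbitrary choice, not something from the original answer):

```python
import numpy as np
import pandas as pd

# Seeded generator so the random 'Sentiment' values are the same on every run.
rng = np.random.default_rng(seed=0)

# Weekly dates; the plain 'W' alias anchors each date on a Sunday.
date_rng = pd.date_range(start="2018-01-01", end="2018-03-05", freq="W")

df = pd.DataFrame(
    {
        "date": date_rng,
        # Integers from 0 to 99 (the upper bound 100 is exclusive).
        "Sentiment": rng.integers(0, 100, size=len(date_rng)),
    }
)
print(df)
```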

Set the `'date'` column as the index of the DataFrame:

df.set_index('date',inplace=True)

df has one column 'Sentiment' and the index is 'date':

            Sentiment
date                 
2018-01-07         34
2018-01-14         32
2018-01-21         15
2018-01-28          0
2018-02-04         95
2018-02-11         53
2018-02-18          7
2018-02-25         35
2018-03-04         17

Capture the month number from the index:

    months=df.index.month

Obtain the mean value of each month grouping by month:

    monthly_avg=df.groupby(months).Sentiment.mean()

The mean of the dataset by month, `'monthly_avg'`, for the sample data shown above is:

date
1    20.25
2    47.50
3    17.00
Name: Sentiment, dtype: float64
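One caveat for data that spans several years, as in the question: grouping by month number alone puts every January into the same group, regardless of year. A small sketch of the difference, using made-up values and grouping by year and month together (the level names "year" and "month" are only illustrative):

```python
import pandas as pd

# Hypothetical data: a January reading in each of two different years.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2018-01-07", "2018-02-04", "2019-01-06"]),
        "Sentiment": [10, 20, 30],
    }
).set_index("date")

# Month number only: January 2018 and January 2019 are averaged together.
by_month = df.groupby(df.index.month)["Sentiment"].mean()

# Year and month together: each calendar month gets its own group.
by_year_month = df.groupby(
    [df.index.year.rename("year"), df.index.month.rename("month")]
)["Sentiment"].mean()

print(by_month)
print(by_year_month)
```

With month-only grouping the two Januaries collapse into a single mean of 20.0, while the year and month grouping keeps them as 10.0 and 30.0.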

answered Sep 27 '22 20:09

pink.slash
