 

python pandas resample count and sum

I have data by date and want to create a new DataFrame aggregated by week, with the sum of sales and the count of categories for each week.

#standard packages
import numpy as np
import pandas as pd

#visualization
%matplotlib inline
import matplotlib.pyplot as plt

#load the raw data and keep only the columns needed
edf = pd.read_csv(r'C:\Users\j~\raw.csv', parse_dates=[6])
edf2 = edf[['DATESENT','Sales','Category']].copy()
edf2

#output

DATESENT    |  SALES  | CATEGORY
2014-01-04      100        A
2014-01-05      150        B
2014-01-07      150        C
2014-01-10      175        D
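Because raw.csv is not available, here is a minimal sketch that rebuilds the four sample rows in memory so the snippets can be run directly. It assumes the upper-case column names from the printed output (the code above selects 'Sales' and 'Category', so adjust the names to match the real file).

```python
import pandas as pd

# Rebuild the four sample rows shown above (column names assumed from the printed output)
edf2 = pd.DataFrame({
    "DATESENT": pd.to_datetime(["2014-01-04", "2014-01-05", "2014-01-07", "2014-01-10"]),
    "SALES": [100, 150, 150, 175],
    "CATEGORY": ["A", "B", "C", "D"],
})
print(edf2)
```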

#convert DATESENT to datetime, move it into the index, and sum per week
edf2['DATESENT'] = pd.to_datetime(edf2['DATESENT'], format='%m/%d/%Y')
edf2 = edf2.set_index('DATESENT')
weekly = edf2.resample('W').sum()
weekly

#output

            SALES CATEGORY 
DATESENT     
2014-01-05  250      AB
2014-01-12  325      CD
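The AB and CD values come from sum() being applied to every column: numbers are added, while the object (string) CATEGORY column is combined with +, which concatenates the strings. A small sketch reproducing this on the sample rows, assuming a recent pandas where sum() does not silently drop non-numeric columns:

```python
import pandas as pd

# Sample rows with DATESENT already used as the DatetimeIndex
edf2 = pd.DataFrame(
    {"SALES": [100, 150, 150, 175], "CATEGORY": ["A", "B", "C", "D"]},
    index=pd.DatetimeIndex(
        ["2014-01-04", "2014-01-05", "2014-01-07", "2014-01-10"], name="DATESENT"
    ),
)

# 'W' bins end on Sundays; 2014-01-05 and 2014-01-12 are Sundays
weekly = edf2.resample("W").sum()
print(weekly)
# SALES is added numerically; the CATEGORY strings are joined, giving "AB" and "CD"
```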

But I am looking for the number of rows per week in CATEGORY:

           SALES CATEGORY 
DATESENT     
2014-01-05  250      2
2014-01-12  325      2

This didn't work ...

edf2 = e2.resample('W').agg("Category":len,"Sales":np.sum)

Thank you

asked Mar 21 '17 by jeangelj




2 Answers

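agg takes a dictionary that maps each column name to the aggregation to apply (a string such as 'sum', or a function). The attempt in the question is a syntax error because the key: value pairs are not wrapped in {}.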

edf2 = edf2.resample('W').agg({"Category": 'size', "Sales": 'sum'})
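To illustrate the "various formats" mentioned above, here is a sketch of two dictionary shapes on the sample rows, assuming the Category/Sales column names used in this answer: one function name per column gives plain column labels, while a list of names per column gives (column, function) MultiIndex columns. The 'mean' aggregation is added here only for illustration.

```python
import pandas as pd

# Sample rows, indexed by DATESENT (column names as in this answer)
edf2 = pd.DataFrame(
    {"Sales": [100, 150, 150, 175], "Category": ["A", "B", "C", "D"]},
    index=pd.DatetimeIndex(
        ["2014-01-04", "2014-01-05", "2014-01-07", "2014-01-10"], name="DATESENT"
    ),
)

# One function name per column: plain column labels
weekly = edf2.resample("W").agg({"Category": "size", "Sales": "sum"})

# A list of function names per column: MultiIndex columns of (column, function)
detail = edf2.resample("W").agg({"Category": ["count"], "Sales": ["sum", "mean"]})

print(weekly)
print(detail)
```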
answered Sep 19 '22 by Scott Boston


using pd.Grouper + agg (older pandas spelled this pd.TimeGrouper('W'), which was deprecated in 0.21 and removed in 1.0)

f = {'CATEGORY': 'count', 'SALES': 'sum'}
g = pd.Grouper(freq='W')
df.set_index('DATESENT').groupby(g).agg(f)

            CATEGORY  SALES
DATESENT                   
2014-01-05         2    250
2014-01-12         2    325
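Two related points, shown in one sketch: pd.Grouper can take key= to name the datetime column, so set_index is not needed, and the two answers count differently. 'size' counts every row in a week, while 'count' skips missing values; they agree on the question's data only because no CATEGORY is missing. The None below is a hypothetical gap added to show the difference.

```python
import pandas as pd

# Sample rows from the question, with one CATEGORY deliberately missing (hypothetical)
df = pd.DataFrame({
    "DATESENT": pd.to_datetime(["2014-01-04", "2014-01-05", "2014-01-07", "2014-01-10"]),
    "SALES": [100, 150, 150, 175],
    "CATEGORY": ["A", None, "C", "D"],
})

# key= names the datetime column, so the frame does not need a DatetimeIndex
g = pd.Grouper(key="DATESENT", freq="W")

# Lists per column give (column, function) MultiIndex columns
weekly = df.groupby(g).agg({"SALES": ["sum"], "CATEGORY": ["size", "count"]})
print(weekly)
```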
answered Sep 20 '22 by piRSquared