I have data by date and want to create a new DataFrame by week, with the sum of sales and the count of categories.
#standard packages
import numpy as np
import pandas as pd
#visualization
%matplotlib inline
import matplotlib.pylab as plt
#read the raw data, parsing column 6 as dates (raw string so the backslashes are not treated as escapes)
edf = pd.read_csv(r'C:\Users\j~\raw.csv', parse_dates=[6])
edf2 = edf[['DATESENT','Sales','Category']].copy()
edf2
#output
DATESENT | SALES | CATEGORY
2014-01-04 100 A
2014-01-05 150 B
2014-01-07 150 C
2014-01-10 175 D
#create datetime index of week
edf2['DATESENT']=pd.to_datetime(edf2['DATESENT'],format='%m/%d/%Y')
edf2 = edf2.set_index('DATESENT')
edf2 = edf2.resample('W').sum()
edf2
#output
SALES CATEGORY
DATESENT
2014-01-05 250 AB
2014-01-12 325 CD
But I am looking for
SALES CATEGORY
DATESENT
2014-01-05 250 2
2014-01-12 325 2
This didn't work ...
edf2 = e2.resample('W').agg("Category":len,"Sales":np.sum)
Thank you
Pandas Series: resample() function
The resample() function is a convenience method for frequency conversion and resampling of time-series data. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or you must pass datetime-like values to the on or level keyword.
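As a minimal sketch of the `on` keyword mentioned above, assuming the four sample rows from the question are rebuilt inline (the name `weekly_sales` is made up for illustration), the date column can stay an ordinary column instead of being moved into the index:

```python
import pandas as pd

# Sample rows copied from the question; built inline because raw.csv is not available.
df = pd.DataFrame({
    "DATESENT": pd.to_datetime(["2014-01-04", "2014-01-05", "2014-01-07", "2014-01-10"]),
    "Sales": [100, 150, 150, 175],
})

# DATESENT is a regular column here; on= tells resample() which column holds the dates.
weekly_sales = df.resample("W", on="DATESENT")["Sales"].sum()
print(weekly_sales)
```

The weekly bins end on Sunday by default for the "W" frequency, which is why the first two rows fall into the week ending 2014-01-05.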
Since True is treated as 1 and False as 0 in Python, you can get the number of elements that satisfy a condition by calling sum() on the boolean result. By default it counts per column; with axis=1 it counts per row. sum() on a pandas.DataFrame returns a pandas.Series.
Resample Hourly Data to Daily Data
To aggregate or temporally resample the data for a time period, you can take all of the values for each day and summarize them. In this case, you want total daily rainfall, so you use the resample() method together with .sum().
Pandas DataFrame count() Method
The count() method counts the non-missing values in each column by default, or in each row if you specify axis='columns', and returns a Series with one result per column (or per row).
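A small sketch of that behaviour, assuming a toy frame with one missing category (the missing value and the variable names are invented for illustration and are not part of the question's data):

```python
import pandas as pd

df = pd.DataFrame({
    "Sales": [100, 150, 150, 175],
    "Category": ["A", None, "C", "D"],  # one missing value, made up for illustration
})

per_column = df.count()               # non-missing values in each column
per_row = df.count(axis="columns")    # non-missing values in each row
n_rows = len(df)                      # every row, missing or not

print(per_column)
print(per_row)
print(n_rows)
```

This is the practical difference between a 'count' aggregation, which skips missing values, and a length-based one such as len or 'size', which does not.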
agg takes a dictionary that maps each column name to the aggregation to apply to it (a function or a function name given as a string).
edf2 = edf2.resample('W').agg({"Category":'size',"Sales":'sum'})
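For reference, here is a self-contained sketch of the dictionary form, assuming the four sample rows from the question are rebuilt inline and that the columns are named `Sales` and `Category` as in the question's code. It uses 'count' rather than 'size'; on this data both give the same numbers, since no category is missing.

```python
import pandas as pd

# Sample rows copied from the question; built inline because raw.csv is not available.
edf2 = pd.DataFrame({
    "DATESENT": pd.to_datetime(["2014-01-04", "2014-01-05", "2014-01-07", "2014-01-10"]),
    "Sales": [100, 150, 150, 175],
    "Category": ["A", "B", "C", "D"],
})

# resample() needs a datetime-like index, so move DATESENT into the index first,
# then give agg() one aggregation per column.
weekly = (
    edf2.set_index("DATESENT")
        .resample("W")
        .agg({"Sales": "sum", "Category": "count"})
)
print(weekly)
```

The result has one row per week ending on Sunday, with the summed sales and the number of category entries in that week.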
Using pd.Grouper + agg (older pandas versions spelled this pd.TimeGrouper, which has since been removed)
f = {'SALES': 'sum', 'CATEGORY': 'count'}
g = pd.Grouper(freq='W')
df.set_index('DATESENT').groupby(g).agg(f)
CATEGORY SALES
DATESENT
2014-01-05 2 250
2014-01-12 2 325
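A variation on the grouper approach, sketched under the assumption of a reasonably recent pandas (named aggregation needs 0.25 or later) and with the sample rows rebuilt inline: `pd.Grouper` accepts a `key` argument, so the date column does not have to be set as the index, and the keyword form of `agg` fixes the output column names and order.

```python
import pandas as pd

# Sample rows copied from the question, with the upper-case column names used in this answer.
df = pd.DataFrame({
    "DATESENT": pd.to_datetime(["2014-01-04", "2014-01-05", "2014-01-07", "2014-01-10"]),
    "SALES": [100, 150, 150, 175],
    "CATEGORY": ["A", "B", "C", "D"],
})

# key= names the date column to group on; freq="W" gives weekly bins ending on Sunday.
weekly = df.groupby(pd.Grouper(key="DATESENT", freq="W")).agg(
    SALES=("SALES", "sum"),
    CATEGORY=("CATEGORY", "count"),
)
print(weekly)
```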