I have this data:
data id url size domain subdomain
13/Jun/2016:06:27:26 30055 https://api.weather.com/v1/geocode/55.740002/37.610001/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 3929 weather.com api.weather.com
13/Jun/2016:06:27:26 30055 https://api.weather.com/v1/geocode/54.720001/20.469999/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 3845 weather.com api.weather.com
13/Jun/2016:06:27:27 3845 https://api.weather.com/v1/geocode/54.970001/73.370003/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 30055 weather.com api.weather.com
13/Jun/2016:06:27:27 30055 https://api.weather.com/v1/geocode/59.919998/30.219999/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 3914 weather.com api.weather.com
13/Jun/2016:06:27:28 30055 https://facebook.com 4005 facebook.com facebook.com
I need to group it with a 5 minute interval. Desired output:
data id url size domain subdomain
13/Jun/2016:06:27:26 30055 https://api.weather.com/v1/geocode/55.740002/37.610001/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 3929 weather.com api.weather.com
13/Jun/2016:06:27:27 3845 https://api.weather.com/v1/geocode/54.970001/73.370003/aggregate.json?apiKey=e45ff1b7c7bda231216c7ab7c33509b8&products=conditionsshort,fcstdaily10short,fcsthourly24short,nowlinks 30055 weather.com api.weather.com
13/Jun/2016:06:27:28 30055 https://facebook.com 4005 facebook.com facebook.com
I need to group by id and subdomain
and establish a 5 minute interval.
I tried
print df.groupby([df['data'],pd.TimeGrouper(freq='Min')])
to group by minute first, but it returns TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'.
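For reference, here is a minimal sketch of loading the sample above into a DataFrame; the file name access.log and the whitespace-separated layout are assumptions:

import pandas as pd

# Assumes the log lines above are saved, header row included, in a
# whitespace-separated file (hypothetical name: access.log).
df = pd.read_csv('access.log', sep=r'\s+')
print(df.dtypes)  # 'data' is read as plain strings; the index is a RangeIndex, hence the TypeError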
You need to parse the data column using pd.to_datetime() with the appropriate format settings and use the result as the index. Then .groupby() while resampling to 5Min intervals:
# parse the log timestamp and use it as a DatetimeIndex
df.index = pd.to_datetime(df.data, format='%d/%b/%Y:%H:%M:%S')
# bucket into 5-minute intervals, then take the first row per (id, subdomain)
df.groupby(pd.TimeGrouper('5Min')).apply(lambda x: x.groupby(['id', 'subdomain']).first())
data \
data id subdomain
2016-06-13 06:25:00 3845 api.weather.com 13/Jun/2016:06:27:27
30055 api.weather.com 13/Jun/2016:06:27:26
facebook.com 13/Jun/2016:06:27:28
url \
data id subdomain
2016-06-13 06:25:00 3845 api.weather.com https://api.weather.com/v1/geocode/54.970001/7...
30055 api.weather.com https://api.weather.com/v1/geocode/55.740002/3...
facebook.com https://facebook.com
size domain
data id subdomain
2016-06-13 06:25:00 3845 api.weather.com 30055 weather.com
30055 api.weather.com 3929 weather.com
facebook.com 4005 facebook.com
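Note that pd.TimeGrouper was removed in recent pandas versions; pd.Grouper is its replacement, and with key='data' it can group on the column directly without setting the index first. A sketch of the same grouping with it, assuming the frame from the question:

import pandas as pd

# parse the timestamp column in place
df['data'] = pd.to_datetime(df['data'], format='%d/%b/%Y:%H:%M:%S')

# 5-minute bins on 'data', then one row per (id, subdomain) within each bin
out = (df.groupby([pd.Grouper(key='data', freq='5Min'), 'id', 'subdomain'])
         .first()
         .reset_index())
print(out)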
Note: to convert to datetime you can pass the following format:
df['data'] = pd.to_datetime(df['data'], format="%d/%b/%Y:%H:%M:%S")
Now you can use the groupby:
In [11]: df1 = df.set_index("data")
In [12]: df1.groupby(pd.TimeGrouper("5Min")).sum()
Out[12]:
id size
data
2016-06-13 06:25:00 124065 45748
This is better written as a resample:
In [13]: df1.resample("5Min").sum()
Out[13]:
id size
data
2016-06-13 06:25:00 124065 45748
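If you also want the per-id/subdomain split from the question while aggregating, the 5-minute resample can be combined with an ordinary groupby via pd.Grouper. A sketch, assuming pandas imported as pd and the df1 built above (with the parsed datetimes as its index):

# sum the transferred size per (id, subdomain) within each 5-minute bin
summary = df1.groupby([pd.Grouper(freq='5Min'), 'id', 'subdomain'])['size'].sum()
print(summary)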