Pandas groupby hour of the day to dictionary

I have a pandas data frame as follows:

date                | Item   | count
------------------------------------
2016-12-06 10:45:08 |  Item1 |  60
2016-12-06 10:45:08 |  Item2 |  145
2016-12-06 09:45:00 |  Item1 |  60
2016-12-06 09:44:54 |  Item3 |  600
2016-12-06 09:44:48 |  Item4 |  15
2016-12-06 11:45:08 |  Item1 |  60
2016-12-06 10:45:08 |  Item2 |  14
2016-11-06 09:45:00 |  Item1 |  62
2016-11-06 09:44:54 |  Item3 |  6
2016-11-06 09:44:48 |  Item4 |  15

I am trying to group the items by, say, hour of the day (or later just by day) to obtain statistics such as the list of items sold in each period. For example:

  • On 2016-12-06, from 09:00:00 to 10:00:00, Item1, Item3 and Item4 were sold; and so on.
  • On 2016-12-06, Item1, Item2, Item3 and Item4 (unique items) were sold.

I am still far from computing these statistics because I am stuck on grouping by time. Initially, print df.dtypes showed:

date    object
Item    object
count   int64
dtype: object

So I used the following line to convert the date column to a pandas datetime type:

df['date'] = pd.to_datetime(df['date'])

and now, print df.dtypes yields:

date    datetime64[ns]
Item    object
count   int64
dtype: object

However, when I try to group by the date column using TimeGrouper with the following code:

from pandas.tseries.resample import TimeGrouper 
print df.groupby([df['date'],pd.TimeGrouper(freq='Min')])

I get the following TypeError. According to suggestions in other posts, converting the column with pd.to_datetime should have resolved this issue.

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

I do not know how to get past this error and on to the statistics I am looking for. Any tips on resolving it and on using TimeGrouper to produce the statistics, preferably as a dictionary (or whatever format makes more sense), would be much appreciated.

kingmakerking asked Feb 02 '17 09:02


1 Answer

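# Index by timestamp, resample the Item column into hourly bins, and collect the unique items per bin
sold = df.set_index('date').Item.resample('H').agg({'Sold': 'unique'})
# Drop the hours in which nothing was sold
sold[sold.Sold.str.len() > 0]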

                                      Sold
date                                      
2016-11-06 09:00:00  [Item4, Item3, Item1]
2016-12-06 09:00:00  [Item4, Item3, Item1]
2016-12-06 10:00:00         [Item1, Item2]
2016-12-06 11:00:00                [Item1]
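
The TypeError in the question arises because a TimeGrouper created without a key groups on the frame's index, which is still a RangeIndex after the column is converted; set_index('date') above is what supplies the DatetimeIndex. If a plain dictionary is wanted, the following is a minimal sketch rather than the resample approach above: it groups on string keys derived with dt.strftime, an assumption chosen so the example does not depend on frequency aliases or on dict-style renaming in agg, which recent pandas releases may reject. The names day, hour, sold_by_hour and sold_by_day are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "date": [
        "2016-12-06 10:45:08", "2016-12-06 10:45:08", "2016-12-06 09:45:00",
        "2016-12-06 09:44:54", "2016-12-06 09:44:48", "2016-12-06 11:45:08",
        "2016-12-06 10:45:08", "2016-11-06 09:45:00", "2016-11-06 09:44:54",
        "2016-11-06 09:44:48",
    ],
    "Item": ["Item1", "Item2", "Item1", "Item3", "Item4",
             "Item1", "Item2", "Item1", "Item3", "Item4"],
    "count": [60, 145, 60, 600, 15, 60, 14, 62, 6, 15],
})
df["date"] = pd.to_datetime(df["date"])

# Illustrative grouping keys: plain strings, so no DatetimeIndex is required.
day = df["date"].dt.strftime("%Y-%m-%d")
hour = df["date"].dt.strftime("%Y-%m-%d %H:00")

# Unique items per hour and per day, sorted so the order is stable.
sold_by_hour = {key: sorted(set(items)) for key, items in df.groupby(hour)["Item"]}
sold_by_day = {key: sorted(set(items)) for key, items in df.groupby(day)["Item"]}
```

Here sold_by_hour["2016-12-06 09:00"] holds the items sold between 09:00 and 10:00 on that day, and sold_by_day["2016-12-06"] holds every distinct item sold on that day. Only hours that actually contain rows appear as keys, so no filtering of empty bins is needed.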

piRSquared answered Sep 20 '22 22:09