What's the equivalent of cut/qcut for pandas date fields?

Tags: pandas

Update: as of version 0.20.0, pandas cut/qcut DOES handle date fields. From the What's New notes for that release:

pd.cut and pd.qcut now support datetime64 and timedelta64 dtypes (GH14714, GH14798)
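A minimal sketch of what that support looks like, assuming pandas 0.20.0 or later. The sample dates and the month-start bin edges are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data: four timestamps spread over early 2013.
dates = pd.Series(pd.to_datetime(
    ["2013-01-05", "2013-01-20", "2013-02-10", "2013-03-15"]))

# Month-start bin edges; right=False makes each bin [start, next_start).
edges = pd.date_range("2013-01-01", "2013-04-01", freq="MS")
monthly = pd.cut(dates, bins=edges, right=False)

# Count how many dates fall into each monthly bin.
print(monthly.value_counts(sort=False))

# qcut accepts dates too: split them into two equal-sized groups.
halves = pd.qcut(dates, q=2)
print(halves.value_counts(sort=False))
```

The result of each call is a categorical Series of intervals, so it can be passed to groupby() just like the numeric cut/qcut results further down.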

Original question: The pandas cut and qcut functions are great for 'bucketing' continuous data for use in pivot tables and so forth, but I can't see an easy way to get datetime axes in the mix. That's frustrating, since pandas is so good at everything else time-related!

Here's a simple example:

import numpy as np
import pandas as pd

def randomDates(size, start=134e7, end=137e7):
    # random Unix timestamps (in seconds) between start and end, as datetime64 values
    return np.array(np.random.randint(int(start), int(end), size), dtype='datetime64[s]')

df = pd.DataFrame({'ship' : randomDates(10), 'recd' : randomDates(10), 
                   'qty' : np.random.randint(0,10,10), 'price' : 100*np.random.random(10)})
df

     price      qty recd                ship
0    14.723510   3  2012-11-30 19:32:27 2013-03-08 23:10:12
1    53.535143   2  2012-07-25 14:26:45 2012-10-01 11:06:39
2    85.278743   7  2012-12-07 22:24:20 2013-02-26 10:23:20
3    35.940935   8  2013-04-18 13:49:43 2013-03-29 21:19:26
4    54.218896   8  2013-01-03 09:00:15 2012-08-08 12:50:41
5    61.404931   9  2013-02-10 19:36:54 2013-02-23 13:14:42
6    28.917693   1  2012-12-13 02:56:40 2012-09-08 21:14:45
7    88.440408   8  2013-04-04 22:54:55 2012-07-31 18:11:35
8    77.329931   7  2012-11-23 00:49:26 2012-12-09 19:27:40
9    46.540859   5  2013-03-13 11:37:59 2013-03-17 20:09:09

To bin by groups of price or quantity, I can use cut/qcut to bucket them:

df.groupby([pd.cut(df['qty'], bins=[0,1,5,10]), pd.qcut(df['price'],q=3)]).count()

                       price  qty recd ship
qty     price               
(0, 1]  [14.724, 46.541]   1   1   1   1
(1, 5]  [14.724, 46.541]   2   2   2   2
        (46.541, 61.405]   1   1   1   1
(5, 10] [14.724, 46.541]   1   1   1   1
        (46.541, 61.405]   2   2   2   2
         (61.405, 88.44]   3   3   3   3

But I can't see an easy way of doing the same thing with my 'recd' or 'ship' date fields, for example to generate a similar table of counts broken down by (say) monthly buckets of recd and ship. It seems like resample() has all of the machinery to bucket into periods, but I can't figure out how to apply it here. The buckets (or levels) in the 'date cut' would be equivalent to a pandas.PeriodIndex, and I'd then want to label each value of df['recd'] with the period it falls into.

So the kind of output I'm looking for would be something like:

ship    recd     count
2011-01 2011-01  1
        2011-02  3
        ...      ...
2011-02 2011-01  2
        2011-02  6
...     ...      ...

More generally, I'd like to be able to mix and match continuous or categorical variables in the output. Imagine df also contains a 'status' column with red/yellow/green values, then maybe I want to summarize counts by status, price bucket, ship and recd buckets, so:

ship    recd     price   status count
2011-01 2011-01  [0-10)   green     1
                            red     4
                 [10-20) yellow     2
                  ...      ...    ...
        2011-02  [0-10)  yellow     3
        ...      ...       ...    ...

As a bonus question, what's the simplest way to modify the groupby() result above to just contain a single output column called 'count'?


asked May 01 '13 13:05

patricksurry

1 Answer

Here's a solution using pandas.PeriodIndex (caveat: at the time of writing, PeriodIndex doesn't seem to support time rules with a multiple > 1, such as '4M'). I think the answer to your bonus question is to use .size() instead of .count().

In [49]: df.groupby([pd.PeriodIndex(df.recd, freq='Q'),
   ....:             pd.PeriodIndex(df.ship, freq='Q'),
   ....:             pd.cut(df['qty'], bins=[0,5,10]),
   ....:             pd.qcut(df['price'],q=2),
   ....:            ]).size()
Out[49]: 
                qty      price 
2012Q2  2013Q1  (0, 5]   [2, 5]    1
2012Q3  2013Q1  (5, 10]  [2, 5]    1
2012Q4  2012Q3  (5, 10]  [2, 5]    1
        2013Q1  (0, 5]   [2, 5]    1
                (5, 10]  [2, 5]    1
2013Q1  2012Q3  (0, 5]   (5, 8]    1
        2013Q1  (5, 10]  (5, 8]    2
2013Q2  2012Q4  (0, 5]   (5, 8]    1
        2013Q2  (0, 5]   [2, 5]    1
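For the monthly buckets and the single 'count' column asked about in the question, the same idea can be sketched as below. This is a variation, not the answer's exact code: it uses Series.dt.to_period in place of constructing a PeriodIndex directly, and reset_index(name='count') to name the column. The seeded data and the random_dates helper are hypothetical stand-ins for the question's df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df, seeded for repeatability.
rng = np.random.RandomState(0)

def random_dates(size, start=1340000000, end=1370000000):
    # Random Unix timestamps (in seconds) converted to datetime values.
    return pd.to_datetime(rng.randint(start, end, size), unit="s")

df = pd.DataFrame({
    "ship": random_dates(10),
    "recd": random_dates(10),
    "qty": rng.randint(0, 10, 10),
    "price": 100 * rng.random_sample(10),
})

# Label each date with the calendar month it falls into.
ship_month = df["ship"].dt.to_period("M").rename("ship")
recd_month = df["recd"].dt.to_period("M").rename("recd")

# size() counts rows per group; reset_index(name=...) names that column.
counts = (
    df.groupby([ship_month, recd_month])
      .size()
      .reset_index(name="count")
)
print(counts)
```

Other groupers, such as pd.cut(df['qty'], ...) or a categorical column, can be appended to the list passed to groupby() to mix date buckets with continuous or categorical ones.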


answered Oct 27 '22 04:10

Garrett
