Creating percentile buckets in pandas

Tags:

python

pandas

I am trying to classify my data into percentile buckets based on their values. My data looks like this:

import numpy as np
import pandas as pnd

# Ten random values labelled 'a' through 'j'
a = pnd.DataFrame(index=['a','b','c','d','e','f','g','h','i','j'], columns=['data'])
a.data = np.random.randn(10)
print(a)
print('\nthese are ranked as shown')
print(a.rank())

       data
a -0.310188
b -0.191582
c  0.860467
d -0.458017
e  0.858653
f -1.640166
g -1.969908
h  0.649781
i  0.218000
j  1.887577

these are ranked as shown
   data
a     4
b     5
c     9
d     3
e     8
f     2
g     1
h     7
i     6
j    10

To rank this data, I am using the rank function. However, I am interested in creating a bucket of the top 20%. In the example shown above, this would be a list containing the labels ['c', 'j'].

desired result : ['c','j']

How do I get the desired result?

465

asked Jun 24 '13 23:06

nitin

1 Answer

In [13]: df[df > df.quantile(0.8)].dropna()
Out[13]: 
       data
c  0.860467
j  1.887577

In [14]: list(df[df > df.quantile(0.8)].dropna().index)
Out[14]: ['c', 'j']
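
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. It assumes `df` stands for the question's DataFrame `a`, and it hardcodes the values printed in the question so the result is reproducible. The names `threshold`, `top_bucket`, and `buckets` are illustrative only. The second half uses `pd.qcut`, which is a different technique from the one in the transcript above: it assigns every row to a quintile rather than filtering on a single cutoff.

```python
import pandas as pd

# Values copied from the sample output printed in the question.
df = pd.DataFrame(
    {"data": [-0.310188, -0.191582, 0.860467, -0.458017, 0.858653,
              -1.640166, -1.969908, 0.649781, 0.218000, 1.887577]},
    index=list("abcdefghij"),
)

# Approach 1: compare against the 80th percentile of the column.
# Rows strictly above the threshold form the top 20%.
threshold = df["data"].quantile(0.8)
top_bucket = df.index[df["data"] > threshold].tolist()
print(top_bucket)  # ['c', 'j']

# Approach 2 (alternative): label every row with a quintile number 0..4.
# Bucket 4 is the top 20%.
buckets = pd.qcut(df["data"], q=5, labels=False)
print(buckets[buckets == 4].index.tolist())  # ['c', 'j']
```

Selecting the column with `df["data"]` before calling `quantile` keeps the comparison one-dimensional, so no `dropna()` step is needed. `pd.qcut` is worth considering when all the buckets are needed rather than only the top one.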

143

answered Nov 15 '22 19:11

Dan Allan
