
Creating percentile buckets in pandas

Tags: python, pandas

I am trying to classify my data into percentile buckets based on their values. My data looks like this:

import numpy as np
import pandas as pnd

# ten random values, labelled 'a' through 'j'
a = pnd.DataFrame(index=['a','b','c','d','e','f','g','h','i','j'], columns=['data'])
a.data = np.random.randn(10)
print(a)
print('\nthese are ranked as shown')
print(a.rank())

       data
a -0.310188
b -0.191582
c  0.860467
d -0.458017
e  0.858653
f -1.640166
g -1.969908
h  0.649781
i  0.218000
j  1.887577

these are ranked as shown
   data
a     4
b     5
c     9
d     3
e     8
f     2
g     1
h     7
i     6
j    10

To rank this data, I am using the rank function. However, I am interested in creating a bucket of the top 20%. In the example shown above, this would be a list containing the labels ['c', 'j'].

desired result : ['c','j']

How do I get the desired result?

nitin asked Jun 24 '13 23:06


People also ask

How do you split data into bins in Python?

Use pd.cut() for binning data based on the range of possible values. Use pd.qcut() for binning data based on the actual distribution of values.

What is QCUT in pandas?

The qcut() function discretizes a variable into equal-sized buckets based on rank or on sample quantiles. For example, 1000 values split into 10 quantiles would produce a Categorical object indicating quantile membership for each data point.
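
To make the difference between the two functions concrete, here is a minimal sketch that applies both to the ten values from the question. The variable names (s, quintile, width_bin and so on) are made up for illustration, and the values are hard-coded so the result is reproducible.

```python
import pandas as pd

# The ten values printed in the question, keyed by their labels.
s = pd.Series(
    [-0.310188, -0.191582, 0.860467, -0.458017, 0.858653,
     -1.640166, -1.969908, 0.649781, 0.218000, 1.887577],
    index=list("abcdefghij"),
)

# qcut: five quantile-based bins, coded 0 (lowest 20%) to 4 (highest 20%).
# Each bin receives the same number of points.
quintile = pd.qcut(s, 5, labels=False)
top_bucket = s[quintile == 4].index.tolist()

# cut: five equal-width bins over the value range.
# Bin sizes depend on where the values happen to fall.
width_bin = pd.cut(s, 5, labels=False)
top_width_bin = s[width_bin == 4].index.tolist()

print(top_bucket)     # labels in the top quantile bucket
print(top_width_bin)  # labels in the top equal-width bin
```

With this data, qcut places two labels in every bucket, so the highest bucket is the top 20% the question asks for, while the highest equal-width bin from cut holds only the single largest value.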


1 Answer

In [13]: df[df > df.quantile(0.8)].dropna()
Out[13]: 
       data
c  0.860467
j  1.887577

In [14]: list(df[df > df.quantile(0.8)].dropna().index)
Out[14]: ['c', 'j']
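
The session above assumes df already holds the question's frame. As a self-contained sketch of the same idea, the script below rebuilds the frame from the printed values and filters on the 80th percentile. The names threshold, top_20 and top_20_by_rank are illustrative, and the rank(pct=True) variant is an alternative added here, not part of the original answer.

```python
import pandas as pd

# Rebuild the question's frame from the values it printed.
df = pd.DataFrame(
    {"data": [-0.310188, -0.191582, 0.860467, -0.458017, 0.858653,
              -1.640166, -1.969908, 0.649781, 0.218000, 1.887577]},
    index=list("abcdefghij"),
)

# Same idea as the answer: keep rows strictly above the 80th percentile.
threshold = df["data"].quantile(0.8)
top_20 = df.loc[df["data"] > threshold].index.tolist()

# Alternative that builds on rank(): percentile ranks run from 0.1 to 1.0 here.
top_20_by_rank = df.loc[df["data"].rank(pct=True) > 0.8].index.tolist()

print(top_20)
print(top_20_by_rank)
```

Working on the single column avoids the dropna() step, because the boolean mask selects rows directly instead of blanking out non-matching cells.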
Dan Allan answered Nov 15 '22 19:11