Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

displaying the percentile distribution as a dataframe in python

I am trying to display the output of percentile distribution for each column as a dataframe as I want to export it to csv later.

I have simply looped all the columns like this :

for column in data:
    print(data[column].describe([.01,.1,.2,.3,.4,.5,.6,.7,.8,.9,.99]))

However, I couldn't figure out how to do rest. Any help is hugely appreciated!

+Editing the main question with an additional query :

I also want to group my output by a column such as data.groupby(data['MARKET']).describe([.01,.1,.2,.3,.4,.5,.6‌​,.7,.8,.9,.99]) However, I receive the error like "describe() takes 1 positional argument but 2 were given". How can I deal with this problem?

Sample data set:

d = {'col1': [1, 2, 3, 2, 1],
     'col2': [3, 4, 5, 6, 7], 
     'country': ['TR', 'UK', 'UK' , 'TR', 'TR']};

df = pd.DataFrame(data=d)
like image 783
Cagdas Kanar Avatar asked Nov 27 '17 18:11

Cagdas Kanar


1 Answers

Is that what you want?

In [19]: df = pd.DataFrame(np.arange(15).reshape(5,3)).add_prefix('col')

In [20]: df
Out[20]:
   col0  col1  col2
0     0     1     2
1     3     4     5
2     6     7     8
3     9    10    11
4    12    13    14

In [21]: df.describe([.01,.1,.2,.3,.4,.5,.6,.7,.8,.9,.99])
Out[21]:
            col0       col1       col2
count   5.000000   5.000000   5.000000
mean    6.000000   7.000000   8.000000
std     4.743416   4.743416   4.743416
min     0.000000   1.000000   2.000000
1%      0.120000   1.120000   2.120000
10%     1.200000   2.200000   3.200000
20%     2.400000   3.400000   4.400000
30%     3.600000   4.600000   5.600000
40%     4.800000   5.800000   6.800000
50%     6.000000   7.000000   8.000000
60%     7.200000   8.200000   9.200000
70%     8.400000   9.400000  10.400000
80%     9.600000  10.600000  11.600000
90%    10.800000  11.800000  12.800000
99%    11.880000  12.880000  13.880000
max    12.000000  13.000000  14.000000

UPDATE:

d = {'col1': [1, 2, 3, 2, 1], 'col2': [3, 4, 5, 6, 7], 'country': ['TR', 'UK', 'UK' , 'TR', 'TR']};

df = pd.DataFrame(data=d)

In [29]: df.groupby('country').apply(lambda x: x.describe([.01,.1,.2,.3,.4,.5,.6,.7,.8,.9,.99]))
Out[29]:
                   col1      col2
country
TR      count  3.000000  3.000000
        mean   1.333333  5.333333
        std    0.577350  2.081666
        min    1.000000  3.000000
        1%     1.000000  3.060000
        10%    1.000000  3.600000
        20%    1.000000  4.200000
        30%    1.000000  4.800000
        40%    1.000000  5.400000
        50%    1.000000  6.000000
        60%    1.200000  6.200000
        70%    1.400000  6.400000
        80%    1.600000  6.600000
        90%    1.800000  6.800000
        99%    1.980000  6.980000
        max    2.000000  7.000000
UK      count  2.000000  2.000000
        mean   2.500000  4.500000
        std    0.707107  0.707107
        min    2.000000  4.000000
        1%     2.010000  4.010000
        10%    2.100000  4.100000
        20%    2.200000  4.200000
        30%    2.300000  4.300000
        40%    2.400000  4.400000
        50%    2.500000  4.500000
        60%    2.600000  4.600000
        70%    2.700000  4.700000
        80%    2.800000  4.800000
        90%    2.900000  4.900000
        99%    2.990000  4.990000
        max    3.000000  5.000000
like image 185
MaxU - stop WAR against UA Avatar answered Nov 09 '22 17:11

MaxU - stop WAR against UA