Pandas: change order of crosstab result

Tags:

python

pandas

How can I change the order of the rows and columns in the result of pd.crosstab?

pd.crosstab(df['col1'], df['col2'])

I would like to be able to sort by:

unique values of either df['col1'] or df['col2'] (cols/rows of the crosstab result)
by marginal values (e.g. showing higher-count values of df['col1'] closer to the top)

asked Mar 13 '17 17:03

Denis Kulagin

1 Answer

It would be easier to suggest a solution if you provided an example of your data, since the right approach depends on it. I will build an example scenario and a possible solution below.

Take the following example data and crosstab:

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)

c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)

# Count how often each (a, c) pair occurs
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

CT

We have the following output:

[Image: the resulting crosstab table, with rows bar/foo and columns dull/shiny/weird]

That's a regular DataFrame object; it has just been cross-tabulated (or, put another way, pivoted).
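Because the result is an ordinary DataFrame, the usual label-based methods should apply to it. The sketch below rebuilds the example and reorders rows and columns by their labels with `sort_index` and `reindex`; the variable names (`rows_desc`, `cols_desc`, `custom`) and the chosen custom order are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Rows sorted by their labels, in descending order
rows_desc = CT.sort_index(ascending=False)

# Columns sorted by their labels, in descending order
cols_desc = CT.sort_index(axis=1, ascending=False)

# An explicit, hand-picked order for both axes (illustrative choice)
custom = CT.reindex(index=['foo', 'bar'],
                    columns=['weird', 'shiny', 'dull'])

print(rows_desc)
print(cols_desc)
print(custom)
```

Each call returns a new DataFrame and leaves `CT` unchanged, so the results can be compared side by side.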

You would like to show:

1. unique values of either df['col1'] or df['col2'] (cols/rows of the crosstab result)
2. by marginal values (e.g. showing higher-count values of df['col1'] closer to the top)

So let's start with item 1:

There are different ways to do that. A simple one is to show the same DataFrame with boolean values marking the cells whose count is exactly 1:

CT == 1

[Image: the same table with True where the count is 1 and False elsewhere]

However, that format may not be what you want for large DataFrames.

You could instead print only the positive cases, or collect them in a list. A simple example:

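# Walk every cell and report the (row, column) pairs whose count is exactly 1
for col in CT.columns:
    for index in CT.index:
        if CT.loc[index, col] == 1:
            # Print a tuple so the output is the same on Python 2 and 3
            print((index, col, 'singular'))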

Output:

('bar', 'shiny', 'singular')
('bar', 'weird', 'singular')
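For larger tables, the same pairs can probably be collected without explicit Python loops. The following is a minimal sketch, assuming the same example data; `mask`, `row_pos`, `col_pos` and `pairs` are illustrative names. Note that `np.where` walks the table row by row, so for other data the order of the pairs may differ from the column-by-column loop above.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Boolean table: True where the count is exactly 1
mask = CT.eq(1)

# Integer positions of the True cells, in row-major order
row_pos, col_pos = np.where(mask.to_numpy())

# Translate positions back into (row label, column label) pairs
pairs = [(CT.index[r], CT.columns[k]) for r, k in zip(row_pos, col_pos)]

print(pairs)
```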

The second item is more complicated. You want to order by the highest values, but the orderings can conflict: sorting the rows by the counts in one column will most likely give a different row order than sorting by another column, since both columns share the same index.

Hence, you can choose to order by one specific column:

CT.sort_values('column_name', ascending=False)
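To make this concrete with the example data, the sketch below sorts the rows by two different columns and shows that the resulting row orders disagree. The names `by_dull` and `by_weird` are illustrative.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Rows ordered by the 'dull' counts, highest first
by_dull = CT.sort_values('dull', ascending=False)

# Rows ordered by the 'weird' counts, highest first
by_weird = CT.sort_values('weird', ascending=False)

print(list(by_dull.index))
print(list(by_weird.index))
```

Here 'foo' leads when sorting by 'dull' and 'bar' leads when sorting by 'weird', which is the conflict described above.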

Or, you can define a metric by which to order (for example, the row mean) and sort accordingly.
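One possible metric is the marginal total, which matches the "higher-count values closer to the top" request in the question. This is a sketch under that assumption rather than the only way to do it; `row_totals`, `col_totals`, `by_rows` and `by_both` are illustrative names, and swapping `sum` for `mean` gives the row-mean variant.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Marginal totals: one number per row and one per column
row_totals = CT.sum(axis=1)
col_totals = CT.sum(axis=0)

# Reorder rows so the highest row total comes first
by_rows = CT.loc[row_totals.sort_values(ascending=False).index]

# Then reorder columns so the highest column total comes first
# (columns with equal totals may appear in either order)
by_both = by_rows[col_totals.sort_values(ascending=False).index]

print(by_both)
```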

Hope that helps!

answered Oct 05 '22 07:10

epattaro
