Pandas: change order of crosstab result

Tags: python, pandas

How can I change the order of the rows and columns in the result of pd.crosstab?

pd.crosstab(df['col1'], df['col2'])

I would like to be able to sort by:

  • unique values of either df['col1'] or df['col2'] (cols/rows of the crosstab result)
  • marginal values (e.g. showing higher-count values of df['col1'] closer to the top)
asked Mar 13 '17 17:03 by Denis Kulagin



1 Answer

It would be easier to give you a solution if you had provided an example of your data, since the right approach depends on what the data looks like. I will build an example scenario and a possible solution below.

If we take the example data and crosstab:

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
       'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)

c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
       'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)

CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

CT

We have the following output:

c    dull  shiny  weird
a
bar     2      1      1
foo     3      4      0

That's a regular DataFrame object; it has just been cross-tabulated (or, put another way, pivot-tabled).

You would like to show:

  1. unique values of either df['col1'] or df['col2'] (cols/rows of the crosstab result)
  2. by marginal values (e.g. showing higher-count values of df['col1'] closer to the top)

So let's start with "1":

There are different ways to do that. A simple solution is to show the same DataFrame with boolean values that mark the singular cases (cells whose count is exactly 1):

CT == 1

c     dull  shiny  weird
a
bar  False   True   True
foo  False  False  False

However, that format might not be what you want for large DataFrames.

You could instead print only the positive cases, or collect them in a list. A simple example would be:

# Walk every cell of the crosstab and report the ones with a count of 1.
for col in CT.columns:
    for index in CT.index:
        if CT.loc[index, col] == 1:
            print(index, col, 'singular')

Output:

bar shiny singular
bar weird singular
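The nested loop can also be replaced by a vectorized lookup. The following is only a sketch built on the same example data; the names row_pos, col_pos and singular are made up for illustration.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Integer positions (row, column) of every cell whose count is exactly 1.
row_pos, col_pos = np.nonzero((CT == 1).to_numpy())

# Translate the positions back into (row label, column label) pairs.
singular = [(CT.index[r], CT.columns[k]) for r, k in zip(row_pos, col_pos)]
print(singular)
```

This collects the pairs in a list instead of printing them one by one, which tends to be more convenient when the result feeds into further processing.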

The second item is more complicated. You want to order by the highest values, but the orderings can conflict: sorting the rows by the counts in one column will most likely give a different order than sorting the same rows by the counts in another column.

Hence, you can choose to order by one specific column:

CT.sort_values('column_name', ascending=False)
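As a concrete illustration with the example crosstab above, where 'column_name' is replaced by the actual column 'shiny' (the variable names by_shiny and by_foo are only illustrative), a sketch might look like this:

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Order the rows by their count in the 'shiny' column, highest first.
by_shiny = CT.sort_values('shiny', ascending=False)

# Order the columns by their count in the 'foo' row, highest first.
by_foo = CT.sort_values('foo', axis=1, ascending=False)

print(by_shiny)
print(by_foo)
```

With axis=1 the same method reorders the columns according to the values in a given row rather than reordering the rows.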

Or, you can define a metric by which you want to order (for example, the row mean or the row total) and sort accordingly.
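One possible metric is the marginal total of each row or column, which seems closest to what the question asks for. This is a sketch on the example data; row_totals, col_totals, by_row_total and by_col_total are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Marginal totals: one value per row and one value per column.
row_totals = CT.sum(axis=1)
col_totals = CT.sum(axis=0)

# Reorder the rows so that the highest row total comes first.
by_row_total = CT.loc[row_totals.sort_values(ascending=False).index]

# Reorder the columns the same way, using the column totals.
by_col_total = CT[col_totals.sort_values(ascending=False).index]

print(by_row_total)
print(by_col_total)
```

Swapping sum for mean gives the row-mean ordering mentioned above. Note that 'dull' and 'shiny' have the same column total in this data, so their relative order is not guaranteed.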

Hope that helps!

answered Oct 05 '22 07:10 by epattaro