
pandas: sort each column individually

My dataframe looks something like this, only much larger.

import pandas as pd

d = {'Col_1' : pd.Series(['A', 'B']),
 'Col_2' : pd.Series(['B', 'A', 'C']),
 'Col_3' : pd.Series(['B', 'A']),
 'Col_4' : pd.Series(['C', 'A', 'B', 'D']),
 'Col_5' : pd.Series(['A', 'C']),}
df = pd.DataFrame(d)

Col_1  Col_2  Col_3  Col_4  Col_5
  A      B      B      C      A
  B      A      A      A      C
  NaN    C      NaN    B      NaN
  NaN    NaN    NaN    D      NaN

First, I'm trying to sort each column individually. I've tried playing around with something like: df.sort([lambda x: x in df.columns], axis=1, ascending=True, inplace=True) but have only ended up with errors. How do I sort each column individually to end up with something like:

Col_1  Col_2  Col_3  Col_4  Col_5
  A      A      A      A      A
  B      B      B      B      C
  NaN    C      NaN    C      NaN
  NaN    NaN    NaN    D      NaN

Second, I'm looking to concatenate the rows within each column:

 df = pd.concat([df,pd.DataFrame(df.sum(axis=0),columns=['Concatenation']).T])

I can combine everything with the line above after replacing np.nan with '', but the result comes out smashed together ('AB') and would require an additional cleanup step to get something like 'A:B'.

DataSwede asked Jun 11 '14 19:06



1 Answer

Here is one way:

>>> import pandas
>>> pandas.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=True)
     0    1    2  3    4
0    A    A    A  A    A
1    B    B    B  B    C
2  NaN    C  NaN  C  NaN
3  NaN  NaN  NaN  D  NaN

[4 rows x 5 columns]
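
To see what the reset_index(drop=True) call contributes, here is a small sketch on a three-column subset of the question's data. It assumes a current pandas where Series.sort_values() is the per-column sort; the names sorted_cols, realigned and by_position are made up for illustration.

```python
import pandas as pd

# Three of the question's columns are enough to show the effect.
df = pd.DataFrame({
    'Col_1': pd.Series(['A', 'B']),
    'Col_2': pd.Series(['B', 'A', 'C']),
    'Col_4': pd.Series(['C', 'A', 'B', 'D']),
})

# Each sorted Series still carries its original index labels.
sorted_cols = [df[col].sort_values() for col in df]

# concat(axis=1) aligns on those labels, so every value goes back to
# the row it came from and the sort is effectively undone.
realigned = pd.concat(sorted_cols, axis=1).sort_index()

# Dropping the old labels first makes the columns line up by position.
by_position = pd.concat(
    [s.reset_index(drop=True) for s in sorted_cols], axis=1
)

print(realigned)
print(by_position)
```

The first frame printed should match the unsorted input, while the second has each column sorted from the top with the NaN padding at the bottom.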

However, what you're doing is somewhat strange. DataFrames aren't just collections of unrelated columns. In a DataFrame, each row represents a record, so the value in one column is semantically linked to the values in other columns in that same row. By sorting the columns independently, you're discarding this information, so the rows are now meaningless. That's why the reset_index is needed in my example. Also, because of this, there's no way to do this in-place, which your example suggests you want.
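
For the second part of the question (joining each column's values with a separator), one possible sketch is below. It is only one way to do it and assumes every non-missing value is a string; the names sorted_df, joined and result are hypothetical, and the 'Concatenation' label is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Col_1': pd.Series(['A', 'B']),
    'Col_2': pd.Series(['B', 'A', 'C']),
    'Col_3': pd.Series(['B', 'A']),
    'Col_4': pd.Series(['C', 'A', 'B', 'D']),
    'Col_5': pd.Series(['A', 'C']),
})

# Sort every column independently, keeping the column names this time.
sorted_df = pd.concat(
    [df[col].sort_values().reset_index(drop=True) for col in df], axis=1
)

# dropna() skips the NaN padding, so no np.nan -> '' replacement is needed
# and the separator only appears between real values.
joined = sorted_df.apply(lambda col: ':'.join(col.dropna()))

# Append the joined strings as an extra row labelled 'Concatenation'.
result = pd.concat([sorted_df, joined.to_frame('Concatenation').T])
print(result)
```

Joining after dropna() avoids the separate cleanup step, because the ':' is inserted while the strings are built rather than patched in afterwards.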

BrenBarn answered Sep 29 '22 22:09