My dataframe looks something like this, only much larger.
import pandas as pd

d = {'Col_1': pd.Series(['A', 'B']),
     'Col_2': pd.Series(['B', 'A', 'C']),
     'Col_3': pd.Series(['B', 'A']),
     'Col_4': pd.Series(['C', 'A', 'B', 'D']),
     'Col_5': pd.Series(['A', 'C'])}
df = pd.DataFrame(d)
Col_1  Col_2  Col_3  Col_4  Col_5
A      B      B      C      A
B      A      A      A      C
NaN    C      NaN    B      NaN
NaN    NaN    NaN    D      NaN
First, I'm trying to sort each column individually. I've tried something like:
df.sort([lambda x: x in df.columns], axis=1, ascending=True, inplace=True)
but have only ended up with errors. How do I sort each column individually to end up with something like:
Col_1  Col_2  Col_3  Col_4  Col_5
A      A      A      A      A
B      B      B      B      C
NaN    C      NaN    C      NaN
NaN    NaN    NaN    D      NaN
Second, I'm looking to concatenate the rows within each column:
df = pd.concat([df,pd.DataFrame(df.sum(axis=0),columns=['Concatenation']).T])
I can combine everything with the line above after replacing np.nan with '', but the values come out smashed together ('AB') and would need an additional cleaning step to get something like 'A:B'.
You can sort a pandas DataFrame by one or more columns with the sort_values() method. The ascending parameter sets the order: True for ascending (the default), False for descending. By default, sort_values() returns a new, sorted DataFrame rather than modifying the original.
Series.sort_index() sorts a Series by its index labels. It returns a new Series if the inplace argument is False; otherwise it updates the original Series and returns None.
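As a rough illustration of the calls described above, here is a minimal sketch. The small frame with the columns "name" and "score" is made up for the example and is not the data from the question.

```python
import pandas as pd

# Hypothetical example data, not taken from the question.
df = pd.DataFrame({"name": ["b", "a", "c"], "score": [2, 3, 1]})

# Sort by a single column; ascending order is the default.
by_score = df.sort_values("score")

# Sort the same column in descending order.
by_score_desc = df.sort_values("score", ascending=False)

# Sort by several columns, with one ascending flag per column.
by_two = df.sort_values(["name", "score"], ascending=[True, False])

# Sort a Series by its index labels; a new Series is returned.
s = pd.Series([10, 20, 30], index=["c", "a", "b"])
s_sorted = s.sort_index()
```

Note that each call returns a new object, so df and s keep their original order here.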
Here is one way:
>>> # Series.order() has been removed from pandas; sort_values() replaces it.
>>> pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=True)
     0    1    2    3    4
0    A    A    A    A    A
1    B    B    B    B    C
2  NaN    C  NaN    C  NaN
3  NaN  NaN  NaN    D  NaN
[4 rows x 5 columns]
However, what you're doing is somewhat unusual. A DataFrame isn't just a collection of unrelated columns: each row represents a record, so the value in one column is semantically linked to the values in the other columns of that same row. By sorting the columns independently, you discard this information, and the rows become meaningless. That's why reset_index is needed in my example. It is also why there's no way to do this in-place, which your example suggests you want.
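For the second part of the question (joining each column's values with a separator), one possible approach is sketched below. It is only a sketch under a few assumptions: the names sorted_df, joined and result are made up for illustration, the separator ':' is taken from the 'A:B' example in the question, and every non-missing value is assumed to be a string.

```python
import pandas as pd

# Same data as in the question.
d = {'Col_1': pd.Series(['A', 'B']),
     'Col_2': pd.Series(['B', 'A', 'C']),
     'Col_3': pd.Series(['B', 'A']),
     'Col_4': pd.Series(['C', 'A', 'B', 'D']),
     'Col_5': pd.Series(['A', 'C'])}
df = pd.DataFrame(d)

# Sort every column on its own (NaN goes last by default) and rebuild
# the frame; using a dict keeps the original column names.
sorted_df = pd.DataFrame(
    {col: df[col].sort_values().reset_index(drop=True) for col in df}
)

# Join the non-missing values of each column with ':'.
# dropna() removes the NaN padding, so no replacement with '' is needed.
joined = sorted_df.apply(lambda col: ":".join(col.dropna()))

# Optionally append the joined strings as an extra row labelled
# 'Concatenation', mirroring the pd.concat call from the question.
result = pd.concat([sorted_df, joined.to_frame("Concatenation").T])
```

Here joined is a Series indexed by column name, so joined['Col_4'] holds 'A:B:C:D'. Dropping the missing values before joining avoids the clean-up step that summing the strings would require.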