I'm new to pandas and working with tabular data in a programming environment. I have sorted a dataframe by a specific column but the answer that panda spits out is not exactly correct.
Here is the code I have used:
league_dataframe.sort_values('overall_league_position')
The result that the sort method yields values in column 'overall league position' are not sorted in ascending or order which is the default for the method.
What am I doing wrong? Thanks for your patience!
For whatever reason, you seem to be working with a column of strings, and sort_values
is returning you a lexsorted result.
Here's an example.
df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})
df
Col
0 1
1 2
2 3
3 10
4 20
5 19
df.sort_values('Col')
Col
0 1
3 10
5 19
1 2
4 20
2 3
The remedy is to convert it to numeric, either using .astype
or pd.to_numeric
.
df.Col = df.Col.astype(float)
Or,
df.Col = pd.to_numeric(df.Col, errors='coerce')
df.sort_values('Col')
Col
0 1
1 2
2 3
3 10
5 19
4 20
The only difference b/w astype
and pd.to_numeric
is that the latter is more robust at handling non-numeric strings (they're coerced to NaN
), and will attempt to preserve integers if a coercion to float is not necessary (as is seen in this case).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With