I have a Python pandas DataFrame in which a column named status contains three kinds of possible values: ok, must read x more books, does not read any book yet, where x is an integer greater than 0.
I want to sort the status values according to the order above.
Example:
     name                      status
0    Paul                          ok
1    Jean      must read 1 more books
2  Robert      must read 2 more books
3    John  does not read any book yet
I've found some interesting hints using pandas Categorical and map, but I don't know how to handle strings that contain a variable value.
How can I achieve that?
Use:
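# Extract the integer from each status; rows without digits become NaN
a = df['status'].str.extract(r'(\d+)', expand=False).astype(float)
# Rank the fixed statuses outside the numeric range: 'ok' highest, 'does not read any book yet' lowest
d = {'ok': a.max() + 1, 'does not read any book yet': -1}
# Map the fixed statuses, fill the rest with the extracted numbers, sort descending
df1 = df.iloc[(-df['status'].map(d).fillna(a)).argsort()]
print(df1)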
     name                      status
0    Paul                          ok
2  Robert      must read 2 more books
1    Jean      must read 1 more books
3    John  does not read any book yet
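The one-liner above packs several steps into a single expression. As a rough sketch, it can be unpacked so each intermediate value is visible. The names rank and order are introduced here only for illustration, and the DataFrame is rebuilt from the example in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Paul', 'Jean', 'Robert', 'John'],
    'status': ['ok', 'must read 1 more books',
               'must read 2 more books', 'does not read any book yet'],
})

# Pull the integer out of each status; rows without digits become NaN.
a = df['status'].str.extract(r'(\d+)', expand=False).astype(float)

# Give the two fixed statuses ranks outside the numeric range.
d = {'ok': a.max() + 1, 'does not read any book yet': -1}

# Map the fixed statuses (everything else becomes NaN), then fill
# those NaNs with the extracted numbers. "rank" is an illustrative name.
rank = df['status'].map(d).fillna(a)

# argsort on the negated ranks yields row positions from highest rank to lowest.
order = (-rank).argsort().to_numpy()
df1 = df.iloc[order]
print(df1)
```

Note that this ranks the "must read" rows from the largest x to the smallest; dropping the negation and swapping the two dictionary values would reverse that.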
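Explanation:
- extract the integers with the regex \d+
- build a dictionary to map the non-numeric values
- fill the NaNs with fillna from the numeric Series
- get the sorted positions with argsort
- use iloc to select the rows in sorted order

You can use sorted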
with a custom function to compute the indices that would sort the array (much like numpy.argsort), then pass them to pd.DataFrame.iloc:
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Paul', 'Jean', 'Robert', 'John'],
                   'status': ['ok', 'must read 20 more books',
                              'must read 3 more books', 'does not read any book yet']})

def sort_key(x):
    # x is an (index, status) pair produced by enumerate
    if x[1] == 'ok':
        return -1
    elif x[1] == 'does not read any book yet':
        return np.inf
    else:
        # 'must read N more books' -> N
        return int(x[1].split()[2])

# Row positions in sorted order, analogous to numpy.argsort
idx = [idx for idx, _ in sorted(enumerate(df['status']), key=sort_key)]
df = df.iloc[idx, :]
print(df)
     name                      status
0    Paul                          ok
2  Robert      must read 3 more books
1    Jean     must read 20 more books
3    John  does not read any book yet
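A possible alternative, not taken from the answers above: since pandas 1.1, sort_values accepts a key argument, so the same ranking idea can be applied without building the index list by hand. The sketch below assumes pandas 1.1 or later; status_rank is a hypothetical helper name, and the statuses are assumed to follow the three forms described in the question.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Paul', 'Jean', 'Robert', 'John'],
    'status': ['ok', 'must read 20 more books',
               'must read 3 more books', 'does not read any book yet'],
})

def status_rank(status):
    """Hypothetical helper: a smaller rank sorts first."""
    if status == 'ok':
        return -1.0
    if status == 'does not read any book yet':
        return np.inf
    match = re.search(r'\d+', status)
    # Statuses with no number become NaN, which sort_values places last by default.
    return float(match.group()) if match else np.nan

# key receives the whole column as a Series and must return a Series of the same length.
df_sorted = df.sort_values('status', key=lambda s: s.map(status_rank), kind='stable')
print(df_sorted)
```

Here the "must read" rows are ordered from the smallest x to the largest, matching the sorted-based answer; negating the numeric rank inside status_rank would flip that.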