My dataframe round_data looks like this:
error username task_path
0 0.02 n49vq14uhvy93i5uw33tf7s1ei07vngozrzlsr6q6cnh8w... 39.png
1 0.10 n49vq14uhvy93i5uw33tf7s1ei07vngozrzlsr6q6cnh8w... 45.png
2 0.15 n49vq14uhvy93i5uw33tf7s1ei07vngozrzlsr6q6cnh8w... 44.png
3 0.25 xdoaztndsxoxk3wycpxxkhaiew3lrsou3eafx3em58uqth... 43.png
... ... ... ...
1170 -0.11 9qrz4829q27cu3pskups0vir0ftepql7ynpn6in9hxx3ux... 33.png
1171 0.15 9qrz4829q27cu3pskups0vir0ftepql7ynpn6in9hxx3ux... 34.png
[1198 rows x 3 columns]
I want to have a boxplot showing the error of each user sorted by their average performance. What I have is:
ax = sns.boxplot(
    x='username',
    y='error',
    data=round_data,
    whis=np.inf,
    color='c',
    ax=ax
)
which results in this plot:
How can I sort the x-axis (i.e., users) by mean error?
Seaborn's boxplot() function lets us choose the order of the boxes with the order argument, which takes a list of category names in the order we want. For example, the order of the boxes can be specified manually as order=["Professional", "Less than bachelor's", "Bachelor's", "Master's", "PhD"].
The same order argument works in Seaborn's barplot() function to sort the bars. It takes the values of the x-axis variable in the order we want them plotted, and that order can be found with the sort_values() function in Pandas.
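As a minimal sketch of the sort_values() approach for a bar plot: the dataframe, its column names (education, salary) and the values below are hypothetical and only serve as an illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: one row per category with the value to plot.
df = pd.DataFrame({
    "education": ["Bachelor's", "Master's", "PhD"],
    "salary": [60, 75, 70],
})

# Sort the rows by the plotted value and keep the category names in that order.
bar_order = df.sort_values("salary", ascending=False)["education"].tolist()

fig, ax = plt.subplots()
sns.barplot(x="education", y="salary", data=df, order=bar_order, ax=ax)
```

Here bar_order lists the categories from highest to lowest salary, so the bars are drawn in that order.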
To change the position of a legend in a seaborn plot, you can use the plt.legend() command. The default location is "best", which lets Matplotlib choose a location for the legend that avoids covering data points.
In seaborn, the hue parameter determines which column in the data frame is used for colour encoding. The official documentation for lmplot provides an example:
import seaborn as sns
sns.set(color_codes=True)
tips = sns.load_dataset("tips")
g = sns.lmplot(x="total_bill", y="tip", data=tips)
I figured out the answer:
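# batch, i and absolute_error come from my full code and are not shown in the excerpt above
grouped = round_data[round_data.batch == i].groupby('username')
# One column per user, then the mean of each column, sorted from lowest to highest
users_sorted_average = (
    pd.DataFrame({col: vals['absolute_error'] for col, vals in grouped})
    .mean()
    .sort_values(ascending=True)
)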
Passing users_sorted_average.index as the "order" parameter of the seaborn plot function gives the desired behavior:
ax = sns.boxplot(
    x='username',
    y='error',
    data=round_data,
    whis=np.inf,
    ax=ax,
    color='c',
    order=users_sorted_average.index,
)
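For reference, here is a self-contained sketch of the same idea that sorts by the mean of the error column directly, using groupby().mean() instead of building an intermediate DataFrame. The usernames and values are made up for illustration, and this is only one possible way to compute the order, not necessarily equivalent to the batch-filtered version above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for round_data with the three columns from the question.
round_data = pd.DataFrame({
    "error": [0.30, 0.10, -0.20, 0.00, 0.05, 0.15],
    "username": ["alice", "alice", "bob", "bob", "carol", "carol"],
    "task_path": ["39.png", "45.png", "44.png", "43.png", "33.png", "34.png"],
})

# Mean error per user, sorted ascending; the index holds the usernames in plot order.
order = (
    round_data.groupby("username")["error"]
    .mean()
    .sort_values()
    .index
    .tolist()
)

fig, ax = plt.subplots()
sns.boxplot(
    x="username",
    y="error",
    data=round_data,
    whis=np.inf,
    color="c",
    order=order,
    ax=ax,
)
```

To sort by the mean absolute error instead, one option is to take .abs() of the error column before grouping.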