
How to plot scatter graph with markers based on column value [duplicate]

I am trying to plot a scatter graph on some data with grouping. They are grouped by the column group and I want them to have different marker styles based on the group.

Minimal working code

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

colors = ['r','g','b','y']
markers = ['o', '^', 's', 'P']

df = pd.DataFrame()
df["index"] = list(range(100))
df["data"] = np.random.randint(100, size=100)
df["group"] = np.random.randint(4, size=100)
df["color"] = df.apply(lambda x: colors[x["group"]], axis=1)
df["marker"] = df.apply(lambda x: markers[x["group"]], axis=1)

plt.scatter(x=df["index"], y=df["data"], c=df["color"])
# What I thought would have worked
# plt.scatter(x=df["index"], y=df["data"], c=df["color"], marker=df["marker"])
plt.show()

[image: example output]

What I want

I want the groups to have different marker styles as well. For example the red entries will have marker "o" (big dot), green entries with marker "^" (upward triangle) and so on.

What I tried

I thought

plt.scatter(x=df["index"], y=df["data"], c=df["color"], marker=df["marker"])

would have worked but nope...

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I can loop over the DataFrame and group the entries by their group, then plot each group with the marker argument taken from the list defined above (like plt.scatter(..., marker=markers[group])). That would result in 4 plt.scatter(...) calls, as there are 4 groups in total. But it is ugly IMO to loop through a DataFrame row by row, and I strongly believe there is a better way.

Thanks in advance!

Henry Fung asked Sep 13 '25 02:09



1 Answer

matplotlib

"that is ugly IMO to loop through a DataFrame row by row and I strongly believe there is a better way"

With matplotlib, I don't think there is a better way than to loop, because plt.scatter accepts only one marker per call. Note that if you use groupby on the marker column, the loop does not run row by row, just group by group (so 4 times in this case).

This will call plt.scatter 4 times (once per marker):

for marker, d in df.groupby('marker'):
    plt.scatter(x=d['index'], y=d['data'], c=d['color'], marker=marker, label=marker)
plt.legend()
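
For reference, here is a self-contained sketch of the same idea that groups on the integer group column and looks up the color and marker from the question's lists directly, so the helper color and marker columns are not needed. The seeded random data and the "group N" legend labels are my own assumptions for illustration, not part of the original question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

colors = ['r', 'g', 'b', 'y']
markers = ['o', '^', 's', 'P']

# Seeded random data (an assumption, only to make the sketch reproducible)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "index": np.arange(100),
    "data": rng.integers(100, size=100),
    "group": rng.integers(4, size=100),
})

fig, ax = plt.subplots()

# One scatter call per group: each call gets a single color and a single marker
for group, d in df.groupby("group"):
    ax.scatter(d["index"], d["data"],
               c=colors[group], marker=markers[group],
               label=f"group {group}")

ax.legend()
```

Call plt.show() afterwards to display the figure. The loop body runs once per distinct group value, not once per row.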


seaborn

As r-beginners commented, sns.scatterplot supports multiple markers via style:

import seaborn as sns

sns.scatterplot(x=df['index'], y=df['data'], c=df['color'], style=df['marker'])
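
One caveat, as far as I can tell: style only tells seaborn which rows share a marker, and seaborn then picks its own default marker for each level. If you want the exact markers and colors from the question's lists, a sketch like the following should work, passing dictionaries to the markers and palette parameters. The seeded data and the choice to use the group column for both hue and style are my assumptions.

```python
import numpy as np
import pandas as pd
import seaborn as sns

colors = ['r', 'g', 'b', 'y']
markers = ['o', '^', 's', 'P']

# Seeded random data (an assumption, only to make the sketch reproducible)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "index": np.arange(100),
    "data": rng.integers(100, size=100),
    "group": rng.integers(4, size=100),
})

ax = sns.scatterplot(
    data=df, x="index", y="data",
    hue="group", style="group",
    palette=dict(enumerate(colors)),    # {0: 'r', 1: 'g', 2: 'b', 3: 'y'}
    markers=dict(enumerate(markers)),   # {0: 'o', 1: '^', 2: 's', 3: 'P'}
)
```

This keeps everything in a single call, and seaborn builds the legend from the group values.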

tdy answered Sep 14 '25 15:09
