Python scatter plot from Pandas dataframe with many columns

I have a data frame that looks like the following: [image: Dataframe Snapshot]

I would like to make a scatter plot with just points on the graph, and I want the points to line up in columns, where each column is a month (Jan, Feb, Mar, etc.) on the x-axis. The actual data values will be plotted on the y-axis.

When I do

df.plot.scatter()

it of course wants me to declare an x and a y value. I can't really do this, as you can see from the dataframe picture I attached, because each month is its own column. How can I plot the data so that all the points for each month are lined up vertically above that month's label on the x-axis? I have also tried:

df.plot.box()

This basically gives me the layout I want, but it also draws the boxes and whiskers. I just want the points.

JMP0629 asked Feb 25 '26 12:02


1 Answer

I don't believe you can use pandas to make a scatter plot against a categorical variable directly. You could assign a numeric value to each month that you are trying to plot, or you could just use matplotlib directly.

Create a test data set:

import numpy as np
import pandas as pd
data = np.random.randn(4, 3)  # 4 observations for each of 3 months
df = pd.DataFrame(data, columns=['Jan', 'Feb', 'Mar'])

Convert this to long form:

df = df.melt()
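
To make the long form concrete, here is a small sketch that inspects what melt produces. The names wide and long_df are only illustrative, and the data is random test data like the set above; with no arguments, melt stacks the columns into a 'variable' column (the month names) and a 'value' column (the observations).

```python
import numpy as np
import pandas as pd

# Illustrative test data: 4 observations for each of 3 months.
wide = pd.DataFrame(np.random.randn(4, 3), columns=["Jan", "Feb", "Mar"])

# melt() with no arguments stacks every column into two columns:
# "variable" holds the old column name, "value" holds the observation.
long_df = wide.melt()

print(long_df.columns.tolist())               # ['variable', 'value']
print(long_df.shape)                          # (12, 2)
print(long_df["variable"].unique().tolist())  # ['Jan', 'Feb', 'Mar']
```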

When you plot, you need to specify the x location of each category. I use enumerate, although you could create a new column with numeric values as well.

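import matplotlib.pyplot as plt

# sort=False keeps the months in their original column order;
# the default would sort the group keys alphabetically (Feb, Jan, Mar).
groups = df.groupby('variable', sort=False)
fig, ax = plt.subplots()
x_ticks = []
x_ticklabels = []
for i, (name, group) in enumerate(groups):
    y = group['value']
    x = [i] * len(y)  # every point for this month shares one x position
    ax.scatter(x, y)
    x_ticks.append(i)
    x_ticklabels.append(name)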

Then you can set your tick labels to match your x-values:

ax.set_xticks(x_ticks)   
ax.set_xticklabels(x_ticklabels);
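
The other option mentioned above, a new column with numeric values, might look roughly like the sketch below. The column name month_pos and the variable names wide and long_df are made up for illustration. Once x is numeric, pandas' own plot.scatter can be used, and the ticks are relabelled with the month names afterwards.

```python
import numpy as np
import pandas as pd

# Illustrative test data: 4 observations for each of 3 months.
wide = pd.DataFrame(np.random.randn(4, 3), columns=["Jan", "Feb", "Mar"])
long_df = wide.melt()  # columns: "variable" (month name), "value"

# Hypothetical helper column: map each month name to its column position,
# so that the x values are numeric and keep the original month order.
month_pos = {name: i for i, name in enumerate(wide.columns)}
long_df["month_pos"] = long_df["variable"].map(month_pos)

# pandas can draw the scatter now that both x and y are numeric.
ax = long_df.plot.scatter(x="month_pos", y="value")
ax.set_xticks(list(month_pos.values()))
ax.set_xticklabels(list(month_pos.keys()))
ax.set_xlabel("")  # the month labels speak for themselves
```

Call plt.show() from matplotlib.pyplot (or run this in a notebook) to display the figure.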


Update: I like to work with data in long form, since each row becomes a single observation. However, I realize it would be more concise to loop through the columns of the original wide dataframe (before melt) without transforming the data:

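# df here is the original wide dataframe, one column per month (not melted)
fig, ax = plt.subplots()
# items() yields (column name, Series) pairs; iteritems() was removed in pandas 2.0
for i, (name, value) in enumerate(df.items()):
    ax.scatter([i] * len(value), value)
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(df.columns)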
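
If you need this for more than one dataframe, the column loop can be wrapped in a small function. This is only a sketch: scatter_columns is a made-up name, not a pandas or matplotlib function, and it assumes every column is numeric.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_columns(df, ax=None):
    """Scatter each column of a wide DataFrame above its own x tick."""
    if ax is None:
        _, ax = plt.subplots()
    # One scatter call per column; column i is drawn at x == i.
    for i, (name, values) in enumerate(df.items()):
        ax.scatter([i] * len(values), values, label=name)
    # Label each x position with the column (month) name.
    ax.set_xticks(range(len(df.columns)))
    ax.set_xticklabels(df.columns)
    return ax


# Illustrative test data: 4 observations for each of 3 months.
wide = pd.DataFrame(np.random.randn(4, 3), columns=["Jan", "Feb", "Mar"])
ax = scatter_columns(wide)
```

Returning the Axes lets the caller add a title or axis labels, or pass in an existing ax to draw on a subplot.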

johnchase answered Feb 28 '26 07:02


