
How do I shift categorical scatter markers to left and right above xticks (multiple data sets per category)?

I have a simple pandas dataframe that I want to plot with matplotlib:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel('SAT_data.xlsx', index_col = 'State')

plt.figure()
plt.scatter(df['Year'], df['Reading'], c = 'blue', s = 25)
plt.scatter(df['Year'], df['Math'], c = 'orange', s = 25)
plt.scatter(df['Year'], df['Writing'], c = 'red', s = 25)

Here is what my plot looks like:

[Image: plot of my data]

I'd like to shift the blue data points a bit to the left, and the red ones a bit to the right, so each year on the x-axis has three mini-columns of scatter data above it instead of all three datasets overlapping. I tried and failed to use the 'verts' argument properly. Is there a better way to do this?

Tara S asked Mar 30 '17 18:03



1 Answer

Using an offset transform lets you shift the scatter points by some amount in units of points instead of data units. The advantage is that the markers then always sit tight against each other, independent of the figure size, zoom level, etc.

import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
import numpy as np

np.random.seed(0)

# Dummy data: 300 points spread over the years 2006-2016, three value columns
year = np.random.choice(np.arange(2006, 2017), size=300)
values = np.random.rand(300, 3)

fig = plt.figure()
ax = plt.gca()

def offset(p):
    # Horizontal translation by p points (1 point = 1/72 inch)
    return transforms.ScaledTranslation(p / 72., 0, fig.dpi_scale_trans)

trans = ax.transData

# Blue moves 5 points to the left, red 5 points to the right, orange stays centered
plt.scatter(year, values[:, 0], c='blue', s=25, transform=trans + offset(-5))
plt.scatter(year, values[:, 1], c='orange', s=25)
plt.scatter(year, values[:, 2], c='red', s=25, transform=trans + offset(5))

plt.show()

[Images: the resulting plot as a broad figure, as a normal figure, and zoomed in]
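The same idea can be wrapped in a small helper that spreads any number of columns evenly around each x tick. This is only a sketch: the helper name scatter_side_by_side and its step parameter are made up for illustration, and a dict of random NumPy arrays stands in for the DataFrame from the question (indexing a DataFrame by column name would work the same way here).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
import numpy as np


def scatter_side_by_side(ax, data, x, columns, colors, step=5, s=25):
    """Scatter each column against data[x], shifted sideways in steps of `step` points.

    Hypothetical helper, not part of matplotlib.
    """
    fig = ax.figure
    n = len(columns)
    artists = []
    for i, (col, color) in enumerate(zip(columns, colors)):
        # Center the group on the tick: -5, 0, +5 points for three columns
        p = (i - (n - 1) / 2) * step
        shift = transforms.ScaledTranslation(p / 72., 0, fig.dpi_scale_trans)
        artists.append(ax.scatter(data[x], data[col], c=color, s=s,
                                  transform=ax.transData + shift, label=col))
    return artists


# Made-up stand-in data; replace with the real DataFrame
rng = np.random.default_rng(0)
data = {
    "Year": rng.choice(np.arange(2006, 2017), size=300),
    "Reading": rng.random(300),
    "Math": rng.random(300),
    "Writing": rng.random(300),
}

fig, ax = plt.subplots()
artists = scatter_side_by_side(ax, data, "Year",
                               ["Reading", "Math", "Writing"],
                               ["blue", "orange", "red"])
ax.legend()
fig.savefig("side_by_side.png")
```

With three columns and the default step, this reproduces the -5, 0, +5 point offsets used in the answer's code.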

Some explanation:

The problem is that we want to add an offset in points to data given in data coordinates. Data coordinates are automatically transformed to display coordinates by transData (which we normally never see on the surface), so adding an offset requires changing the transform.
We do this by appending a translation to transData. While we could add the offset in pixels (display coordinates), it is more convenient to specify it in points, the same unit in which the size of the scatter markers is given (their size is actually in points squared). So how many pixels are p points? Dividing p by the ppi (points per inch) gives inches, and multiplying by the dpi (dots per inch) gives display pixels. This calculation is done by the ScaledTranslation. While the dots per inch are in principle variable (and taken care of by the dpi_scale_trans transform), the points per inch are fixed. Matplotlib uses 72 ppi, which is something of a typesetting standard.
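The points-to-pixels arithmetic can be checked directly. The following is a minimal sketch, assuming a figure created with dpi=100 and the non-interactive Agg backend; the helper name offset_in_pixels is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms

fig = plt.figure(dpi=100)


def offset_in_pixels(p, fig):
    """Return the horizontal shift, in display pixels, of a p-point offset."""
    shift = transforms.ScaledTranslation(p / 72., 0, fig.dpi_scale_trans)
    # Applying a pure translation to the origin yields the translation itself
    x, _ = shift.transform((0, 0))
    return x


by_transform = offset_in_pixels(5, fig)
by_hand = 5 / 72. * fig.dpi  # points -> inches -> pixels

print(by_transform, by_hand)
```

Both values should agree, since ScaledTranslation scales the (p/72, 0) offset in inches by the figure's dpi.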

ImportanceOfBeingErnest answered Sep 21 '22 00:09