
How do I shift categorical scatter markers to left and right above xticks (multiple data sets per category)?

Tags: python, pandas, matplotlib

I have a simple pandas dataframe that I want to plot with matplotlib:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel('SAT_data.xlsx', index_col = 'State')

plt.figure()
plt.scatter(df['Year'], df['Reading'], c = 'blue', s = 25)
plt.scatter(df['Year'], df['Math'], c = 'orange', s = 25)
plt.scatter(df['Year'], df['Writing'], c = 'red', s = 25)

Here is what my plot looks like:

[Image: plot of my data]

I'd like to shift the blue data points a bit to the left, and the red ones a bit to the right, so each year on the x-axis has three mini-columns of scatter data above it instead of all three datasets overlapping. I tried and failed to use the 'verts' argument properly. Is there a better way to do this?


asked Mar 30 '17 18:03

Tara S

1 Answer

Using an offset transform lets you shift the scatter points by a fixed amount in units of points instead of data units. The advantage is that the columns always sit tight against each other, independent of figure size, zoom level, etc.

import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
import numpy as np

np.random.seed(0)

# Random stand-in data: 300 points spread over the years 2006-2016
year = np.random.choice(np.arange(2006, 2017), size=300)
values = np.random.rand(300, 3)

plt.figure()

# Horizontal shift by p points (72 points per inch), scaled by the figure dpi
offset = lambda p: transforms.ScaledTranslation(p/72., 0, plt.gcf().dpi_scale_trans)
trans = plt.gca().transData

# Blue is shifted 5 points left, orange stays centered, red is shifted 5 points right
plt.scatter(year, values[:,0], c='blue', s=25, transform=trans+offset(-5))
plt.scatter(year, values[:,1], c='orange', s=25)
plt.scatter(year, values[:,2], c='red', s=25, transform=trans+offset(5))

plt.show()

[Images: the same plot shown as a broad figure, as a normal figure, and zoomed in]
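Applied to a DataFrame shaped like the one in the question, the same approach might look like the sketch below. The data here is randomly generated as a stand-in for SAT_data.xlsx; only the column names (Year, Reading, Math, Writing) come from the question, and points_offset is a hypothetical helper name, not part of Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
import numpy as np
import pandas as pd

# Made-up stand-in data; column names are taken from the question
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Year": rng.choice(np.arange(2006, 2017), size=300),
    "Reading": rng.random(300),
    "Math": rng.random(300),
    "Writing": rng.random(300),
})


def points_offset(fig, dx_points):
    """Hypothetical helper: translation of dx_points points along x."""
    return transforms.ScaledTranslation(dx_points / 72, 0, fig.dpi_scale_trans)


fig, ax = plt.subplots()

# (column, color, horizontal shift in points)
series = [("Reading", "blue", -5), ("Math", "orange", 0), ("Writing", "red", 5)]
collections = {}
for column, color, dx in series:
    collections[column] = ax.scatter(
        df["Year"], df[column], c=color, s=25, label=column,
        transform=ax.transData + points_offset(fig, dx),
    )

ax.legend()
fig.canvas.draw()  # render once; use plt.show() in an interactive session
```

Looping over (column, color, shift) tuples keeps the three calls consistent and makes it easy to change the spacing or add a fourth series.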

Some explanation:

The problem is that we want to add an offset in points to data given in data coordinates. Data coordinates are normally transformed to display coordinates by transData, which happens behind the scenes, so adding an offset requires changing the transform.
We do this by appending a translation to transData. We could add the offset in pixels (display coordinates), but it is more convenient to specify it in points, the same unit used for the scatter marker size (strictly, s is given in points squared). So how many pixels are p points? Divide p by the ppi (points per inch) to obtain inches, then multiply by the dpi (dots per inch) to obtain display pixels. This calculation is done by the ScaledTranslation. While the dots per inch are in principle variable (and are taken care of by the dpi_scale_trans transform), the points per inch are fixed: Matplotlib uses 72 ppi, a common typesetting convention.
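As a quick sanity check of the points-to-pixels arithmetic, the following sketch compares where one data point lands in display coordinates with and without the offset. The dpi of 100 and the sample point (2010, 0.5) are arbitrary choices made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms

fig, ax = plt.subplots(dpi=100)  # dpi chosen arbitrarily for this check

p = 5  # offset in points
shift = transforms.ScaledTranslation(p / 72, 0, fig.dpi_scale_trans)

# Display (pixel) position of one data point, without and with the shift
plain = ax.transData.transform((2010, 0.5))
shifted = (ax.transData + shift).transform((2010, 0.5))

dx_pixels = shifted[0] - plain[0]
expected = p / 72 * fig.dpi  # points -> inches -> pixels

print(dx_pixels, expected)
```

The two printed numbers should agree up to floating-point rounding, and the vertical position should be unchanged because the translation has a zero y component.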


answered Sep 21 '22 00:09

ImportanceOfBeingErnest
