pandas scatter plot colors with three points and seaborn

Tags: python, pandas, seaborn

There is some strange behavior when using pandas and seaborn to draw a scatter plot that has only three points: the points don't all have the same color. The problem disappears when seaborn is not loaded, when there are more than three points, or when plotting with matplotlib's scatter method directly. See the following example:

from pandas import DataFrame     # 0.16.0
import matplotlib.pyplot as plt  # 1.4.3
import seaborn as sns            # 0.5.1 (loading it is enough to trigger the problem)
import numpy as np               # 1.9.2

# Three points: the points come out in different colors
df = DataFrame({'x': np.random.uniform(0, 1, 3), 'y': np.random.uniform(0, 1, 3)})
df.plot(kind='scatter', x='x', y='y')
plt.show()

# Four points: all points share the same color
df = DataFrame({'x': np.random.uniform(0, 1, 4), 'y': np.random.uniform(0, 1, 4)})
df.plot(kind='scatter', x='x', y='y')
plt.show()

asked Mar 25 '15 10:03

David Brochart

1 Answer

I've tracked down the bug. The bug is in pandas technically, not seaborn as I originally thought, though it involves code from pandas, seaborn, and matplotlib...

In pandas.tools.plotting.ScatterPlot._make_plot, the following code chooses the colours to be used in the scatter plot:

if c is None:
    c_values = self.plt.rcParams['patch.facecolor']
elif c_is_column:
    c_values = self.data[c].values
else:
    c_values = c

In your case c will be equal to None, which is the default value, and so c_values will be given by plt.rcParams['patch.facecolor'].

Now, as part of setting itself up, seaborn modifies plt.rcParams['patch.facecolor'] to (0.5725490196078431, 0.7764705882352941, 1.0) which is an RGB tuple. If seaborn is not used then the value is the matplotlib default which is 'b' (a string indicating the colour "blue").
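To check which of the two situations a session is in, the rcParams entry can be inspected directly. The following is only a sketch: it does not import seaborn but imitates its change with matplotlib.rc_context, using the RGB tuple quoted above, and the helper name describe_patch_facecolor is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# RGB tuple quoted above as the value seaborn assigns to patch.facecolor
SEABORN_LIKE_RGB = (0.5725490196078431, 0.7764705882352941, 1.0)


def describe_patch_facecolor():
    """Return the current rcParams value and whether it is a plain string."""
    value = plt.rcParams['patch.facecolor']
    return value, isinstance(value, str)


# Without seaborn the value is a colour string
# (which string it is depends on the matplotlib version)
default_value, default_is_string = describe_patch_facecolor()

# Imitate seaborn's setup step, but only for the duration of the with-block
with matplotlib.rc_context({'patch.facecolor': SEABORN_LIKE_RGB}):
    patched_value, patched_is_string = describe_patch_facecolor()

print(default_value, default_is_string)
print(patched_value, patched_is_string)
```

If the second line printed shows a tuple of three floats rather than a string, c_values ends up as a bare sequence of three numbers, which is the case that matters below.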

c_values is then used later on to actually plot the graph, in the call to ax.scatter:

scatter = ax.scatter(data[x].values, data[y].values, c=c_values,
                     label=label, cmap=cmap, **self.kwds)

The issue arises because the keyword argument c accepts several different kinds of value. It can be:

a string (such as 'b' in the original matplotlib case);
a sequence of color specifications (say a sequence of RGB values);
a sequence of values to map onto the current colormap.
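The three forms can be tried side by side with plain matplotlib. This is a minimal sketch with made-up data (x, y, rgb_per_point and values are illustrative names, not taken from pandas); it uses four points so that none of the inputs can be confused with a single RGB tuple.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [0.1, 0.5, 0.9, 0.3]
y = [0.2, 0.8, 0.4, 0.6]

fig, axes = plt.subplots(1, 3)

# 1. A single colour string: every point is drawn in that colour
by_string = axes[0].scatter(x, y, c='b')

# 2. A sequence of colour specifications: one RGB tuple per point
rgb_per_point = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                 (0.0, 0.0, 1.0), (0.5, 0.5, 0.5)]
by_colors = axes[1].scatter(x, y, c=rgb_per_point)

# 3. A sequence of numbers: each value is mapped through the colormap
values = [0.0, 1.0, 2.0, 3.0]
by_values = axes[2].scatter(x, y, c=values, cmap='Greys')

# Only the colormapped case stores a data array on the returned collection
for name, coll in [('string', by_string),
                   ('colour sequence', by_colors),
                   ('numeric values', by_values)]:
    print(name, 'is colormapped:', coll.get_array() is not None)
```

Checking get_array() on the returned collection is one way to tell which interpretation matplotlib chose for a given c.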

The matplotlib docs specifically state the following (highlighting mine):

c can be a single color format string, or a sequence of color specifications of length N, or a sequence of N numbers to be mapped to colors using the cmap and norm specified via kwargs (see below). Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however.

What basically happens is that matplotlib takes the c_values value (a tuple of three numbers) and, because its length matches the three data points, maps those numbers onto the current colormap (which pandas sets to Greys by default). As a result, you get three scatter points with different shades of grey. When you have more than three scatter points, matplotlib assumes that it must be an RGB tuple because its length doesn't match the length of the data arrays (3 != 4), and so uses it as a constant RGB colour.
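The ambiguity can be reproduced without pandas or seaborn by handing the same RGB tuple to ax.scatter yourself. The sketch below makes a few assumptions: scatter_with_c is a hypothetical helper, the data is arbitrary, and warnings are silenced because newer matplotlib releases may warn when c looks like a bare RGB tuple. The last call wraps the tuple in a list, giving the one-row 2-D form that the quoted docs describe as safe.

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# The RGB tuple quoted above as seaborn's patch.facecolor
rgb = (0.5725490196078431, 0.7764705882352941, 1.0)


def scatter_with_c(n_points, c):
    """Scatter n_points arbitrary points with the given c; return the collection."""
    fig, ax = plt.subplots()
    xs = [i / n_points for i in range(n_points)]
    with warnings.catch_warnings():
        # Newer matplotlib releases may warn about a bare RGB tuple passed as c
        warnings.simplefilter("ignore")
        coll = ax.scatter(xs, xs, c=c, cmap='Greys')
    plt.close(fig)
    return coll


three = scatter_with_c(3, rgb)   # length matches the data: values to colormap
four = scatter_with_c(4, rgb)    # length differs: a single RGB colour
safe = scatter_with_c(3, [rgb])  # one-row 2-D input: a single RGB colour

for name, coll in [('3 points, bare tuple', three),
                   ('4 points, bare tuple', four),
                   ('3 points, wrapped tuple', safe)]:
    print(name, 'is colormapped:', coll.get_array() is not None)
```

Only the first case should report being colormapped. When calling matplotlib directly, passing the colour through the color keyword instead of c is another way to avoid the ambiguity.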

This has been written up as a bug report on the pandas GitHub issue tracker.

answered Oct 29 '22 17:10

Ffisegydd
