For the pyplot.scatter(x, y, s, c, ...) function, the matplotlib docs state that:
c : color, sequence, or sequence of color, optional, default: 'b'
The marker color. Possible values:
- A single color format string.
- A sequence of color specifications of length n.
- A sequence of n numbers to be mapped to colors using cmap and norm.
- A 2-D array in which the rows are RGB or RGBA.
Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. If you want to specify the same RGB or RGBA value for all points, use a 2-D array with a single row.
However, I do not understand how I can change the colors of the data points as I wish.
I have this piece of code:
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model
import matplotlib
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (13.0, 9.0)
# Generate a dataset and plot it
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.55)
print(y)
plt.scatter(X[:,0], X[:,1], c=y)#, cmap=plt.cm.Spectral)
[Figure: the output plot]
How can I change the colors to, say, black and green data points, or to something else? Also, please explain what exactly cmap does.
Why are my plots magenta and blue every time I use plt.cm.Spectral?
There are essentially two options for colorizing scatter points.
You may externally map values to colors and supply a list/array of those colors to the scatter's c argument.
z = np.array([1,0,1,0,1])
colors = np.array(["black", "green"])
plt.scatter(x,y, c=colors[z])
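As a self-contained sketch of this external mapping (the label values are made up for illustration, and point_colors is a name introduced here), indexing the color array with the integer labels yields one color name per point:

```python
import numpy as np

# Hypothetical integer class labels, one per data point
z = np.array([1, 0, 1, 0, 1])

# Position i in this array holds the color used for label i
colors = np.array(["black", "green"])

# Integer-array indexing picks one color name per point
point_colors = colors[z]
print(point_colors)
```

The resulting array has the same length as z, so it can be passed as c to plt.scatter together with x and y arrays of that length.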
Apart from explicit colors, one can also supply a list/array of values which should be mapped to colors according to a normalization and a colormap.
A colormap is a callable that takes float values between 0. and 1. as input and returns an RGB color. Normalize would provide a linear mapping of values between vmin and vmax to the range between 0. and 1. The natural way to obtain a color from some data is hence to chain the two,
cmap = plt.cm.Spectral
norm = plt.Normalize(vmin=4, vmax=5)
z = np.array([4,4,5,4,5])
plt.scatter(x,y, c = cmap(norm(z)))
Here the value of 4 would be mapped to 0 by the normalization, and the value of 5 would be mapped to 1, such that the colormap provides its two outermost colors.
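The two steps of that chain can be inspected separately. The following is a minimal sketch using the same cmap, norm and z as above; the intermediate names scaled and rgba are introduced here for illustration only:

```python
import numpy as np
import matplotlib.pyplot as plt

cmap = plt.cm.Spectral
norm = plt.Normalize(vmin=4, vmax=5)
z = np.array([4, 4, 5, 4, 5])

scaled = norm(z)      # step 1: rescale the data linearly onto [0, 1]
rgba = cmap(scaled)   # step 2: look up one RGBA row per rescaled value

print(np.asarray(scaled))
print(rgba.shape)
```

Each row of rgba is the color of one point, which is why the array can be handed to c directly.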
This process happens internally in scatter if an array of numeric values is provided to c.
A scatter creates a PathCollection, which subclasses ScalarMappable. A ScalarMappable consists of a colormap, a normalization and an array of values. Hence the above is internalized via
plt.scatter(x,y, c=z, norm=norm, cmap=cmap)
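To see that a ScalarMappable performs the same chaining, one can build a bare one and compare its to_rgba output with the manual cmap(norm(z)) result. This is only a sketch: it uses a standalone ScalarMappable rather than the PathCollection returned by scatter, on the assumption that both map values to colors the same way.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable

cmap = plt.cm.Spectral
norm = plt.Normalize(vmin=4, vmax=5)
z = np.array([4, 4, 5, 4, 5])

# A bare ScalarMappable bundles a normalization and a colormap
mappable = ScalarMappable(norm=norm, cmap=cmap)

# to_rgba normalizes the values first, then looks them up in the colormap
internal = mappable.to_rgba(z)
manual = cmap(norm(z))
print(np.allclose(internal, manual))
```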
If the minimum and maximum data are to be used as limits for the normalization, you may leave that argument out.
plt.scatter(x,y, c=z, cmap=cmap)
This is the reason that the output in the question will always be purple and yellow dots, independent of the values provided to c.
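The effect of the automatic limits can be checked without plotting. In this sketch, a Normalize created without vmin and vmax takes its limits from the first data it is called with, which is assumed here to mirror what scatter does when no limits are given; autoscaled is a hypothetical helper name.

```python
import numpy as np
import matplotlib.pyplot as plt

def autoscaled(values):
    """Normalize values with limits taken from the data itself."""
    norm = plt.Normalize()   # no vmin/vmax given
    return np.asarray(norm(np.array(values)))

# Very different two-valued data sets end up at the same positions
print(autoscaled([0, 1, 0, 1]))
print(autoscaled([4, 5, 4, 5]))
print(autoscaled([-30, 70, -30, 70]))
```

Whatever the two values are, the smaller one lands at 0 and the larger one at 1, so the same two ends of the colormap are used.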
Coming back to the requirement of mapping an array of 0 and 1 to black and green color, you may now look at the colormaps provided by matplotlib and look for a colormap which comprises black and green, e.g. the nipy_spectral colormap.
Here black is at the start of the colormap and green somewhere in the middle, say at 0.5. One would hence need to set vmin to 0, and vmax such that vmax*0.5 = 1 (with 1 the value to be mapped to green), i.e. vmax = 1./0.5 == 2.
import matplotlib.pyplot as plt
import numpy as np
x,y = np.random.rand(2,6)
z = np.array([0,0,1,1,0,1])
plt.scatter(x,y, c = z,
            norm = plt.Normalize(vmin=0, vmax=2),
            cmap = "nipy_spectral")
plt.show()
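Rather than reading the color positions off a picture of the colormap, they can be checked numerically. This sketch evaluates nipy_spectral at the positions that vmin=0 and vmax=2 produce for the values 0 and 1; the names positions and colors_at are introduced here for illustration.

```python
import matplotlib.pyplot as plt

cmap = plt.cm.nipy_spectral
norm = plt.Normalize(vmin=0, vmax=2)

# Position of each data value along the colormap, in [0, 1]
positions = {value: float(norm(value)) for value in (0, 1)}

# RGBA color the colormap returns at each of those positions
colors_at = {value: cmap(pos) for value, pos in positions.items()}

for value in (0, 1):
    r, g, b, _ = colors_at[value]
    print(value, positions[value], (round(r, 2), round(g, 2), round(b, 2)))
```

If the printed color for 1 is not green enough, vmax can be adjusted until the position falls on the desired part of the colormap.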
Since there may not always be a colormap with the desired colors available, and since it may not be straightforward to obtain the color positions from existing colormaps, an alternative is to create a new colormap specifically for the desired purpose.
Here we might simply create a colormap of two colors, black and green.
matplotlib.colors.ListedColormap(["black", "green"])
We would not need any normalization here, because we only have two values and can hence rely on automatic normalization.
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np
x,y = np.random.rand(2,6)
z = np.array([0,0,1,1,0,1])
plt.scatter(x,y, c = z, cmap = mcolors.ListedColormap(["black", "green"]))
plt.show()
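What the two-color colormap returns can be verified directly. In this sketch the automatic normalization is imitated with a Normalize created without limits, which is an assumption about what scatter does internally when no norm is passed:

```python
import numpy as np
import matplotlib.colors as mcolors

cmap = mcolors.ListedColormap(["black", "green"])
norm = mcolors.Normalize()   # limits are taken from the data
z = np.array([0, 0, 1, 1, 0, 1])

# One RGBA row per value: 0 maps to the first color, 1 to the second
rgba = cmap(norm(z))
print(cmap.N)
print(rgba)
```

With only two entries in the colormap, the lowest value picks the first listed color and the highest value picks the last one.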
First, to set the colors according to the values in y, you can do this:
color = ['red' if i==0 else 'green' for i in y]
plt.scatter(X[:,0], X[:,1], c=color)
Now talking about scatter() and cmap.
ColorMaps are used to provide colors from float values. See this documentation for reference on colormaps.
For values between 0 to 1, a color is chosen from these colormaps.
For example:
plt.cm.Spectral(0.0)
# (0.6196078431372549, 0.00392156862745098, 0.25882352941176473, 1.0) #<== magenta
plt.cm.Spectral(1.0)
# (0.3686274509803922, 0.30980392156862746, 0.6352941176470588, 1.0) #<== blue
plt.cm.Spectral(1)
# (0.6280661284121491, 0.013302575932333718, 0.26082276047673975, 1.0)
Note that the results of 1.0 and 1 are different in the above code, because ints and floats are handled differently, as mentioned in the documentation of __call__():
For floats, X should be in the interval [0.0, 1.0] to return the RGBA values X*100 percent along the Colormap line.
For integers, X should be in the interval [0, Colormap.N) to return RGBA values indexed from the Colormap with index X.
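The difference between integer and float input can be made explicit with a few comparisons. This is a small sketch using only the colormap's N attribute and its call behavior as described in the quoted documentation:

```python
import matplotlib.pyplot as plt

cmap = plt.cm.Spectral

# Floats are positions along the colormap, integers are lookup-table indices
last_matches = cmap(1.0) == cmap(cmap.N - 1)   # float 1.0 vs. last index
first_matches = cmap(0.0) == cmap(0)           # float 0.0 vs. first index
int_one_matches = cmap(1) == cmap(1.0)         # index 1 vs. float 1.0

print(cmap.N, last_matches, first_matches, int_one_matches)
```

So the integer 1 selects the second entry of the lookup table, right next to the start of the colormap, while the float 1.0 selects its far end.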
Please look at this answer for a better explanation of colormaps.
In your y, you have 0 and 1, so the RGBA values shown in the above code are used (they represent the two ends of the Spectral colormap).
Now here's how the c and cmap parameters in plt.scatter() interact with each other.
_______________________________________________________________________
|No | type of x, y | c type | values in c | result |
|___|______________|__________|_____________|___________________________|
|1 | single | scalar | numbers | cmap(0.0), no matter |
| | point | | | what the value in c |
|___|______________|__________|_____________|___________________________|
|2 | array of | array | numbers | normalize the values in c,|
| | points | | | cmap(normalized val in c) |
|___|______________|__________|_____________|___________________________|
|3 | scalar or | scalar or| RGBA Values,| no use of cmap, |
| | array | array |Color Strings| use colors from c |
|___|______________|__________|_____________|___________________________|
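Once the actual colors are finalized, scatter cycles through them for each point in x, y. If the number of points in x, y is equal to or less than the number of colors in c, you get a one-to-one mapping; otherwise the earlier colors are used again. Note that this cycling describes older Matplotlib versions; recent releases raise a ValueError when the number of colors in c matches neither 1 nor the number of points.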
Here's an example to illustrate this:
# Case 1 from above table
# All three points get the same color = plt.cm.Spectral(0)
plt.scatter(x=0.0, y=0.2, c=0, cmap=plt.cm.Spectral)
plt.scatter(x=0.0, y=0.3, c=1, cmap=plt.cm.Spectral)
plt.scatter(x=0.0, y=0.4, c=1.0, cmap=plt.cm.Spectral)
# Case 2 from above table
# The values in c are normalized
# highest value in c gets plt.cm.Spectral(1.0)
# lowest value in c gets plt.cm.Spectral(0.0)
# Others in between as per normalizing
# Size of arrays in x, y, and c must match here, else error is thrown
plt.scatter([0.1, 0.1, 0.1, 0.1, 0.1], [0.2, 0.3, 0.4, 0.5, 0.6],
            c=[1, 2, 3, 4, 5], cmap=plt.cm.Spectral)
# Case 3 from above table => No use of cmap here,
# blue is assigned to the point
plt.scatter(x=0.2, y=0.3, c='b')
# You can also provide rgba tuple
plt.scatter(x=0.2, y=0.4, c=plt.cm.Spectral(0.0))
# Since a single point is present, the first color (green) is given
plt.scatter(x=0.2, y=0.5, c=['g', 'r'])
# Same color 'cyan' is assigned to all values
plt.scatter([0.3, 0.3, 0.3, 0.3, 0.3], [0.2, 0.3, 0.4, 0.5, 0.6],
            c='c')
# Colors are cycled through points
# 4th point will get again first color
plt.scatter([0.4, 0.4, 0.4, 0.4, 0.4], [0.2, 0.3, 0.4, 0.5, 0.6],
            c=['m', 'y', 'k'])
# Same way for rgba values
# Third point will get first color again
plt.scatter([0.5, 0.5, 0.5, 0.5, 0.5], [0.2, 0.3, 0.4, 0.5, 0.6],
            c=[plt.cm.Spectral(0.0), plt.cm.Spectral(1.0)])
Output:
Go through the comments in the code and location of points along with the colors to understand thoroughly.
You can also replace the param c with color in the code of Case 3 and the results will still be the same.
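The color cycling from Case 3 can also be written out explicitly, so that c always holds exactly one color per point. This is a sketch under the assumption that relying on scatter to repeat a shorter color list may not work in every Matplotlib version; palette and point_colors are names chosen for illustration.

```python
import numpy as np

x = [0.4, 0.4, 0.4, 0.4, 0.4]
y = [0.2, 0.3, 0.4, 0.5, 0.6]
palette = ["m", "y", "k"]

# np.resize repeats the palette from its start until it has len(x) entries
point_colors = np.resize(palette, len(x))
print(point_colors)
```

The resulting array can then be passed as c (or color) to plt.scatter(x, y, ...), with the fourth point getting the first color again.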