I want to draw a scatter plot using pylab; however, some of my data are NaN, like this:
a = [1, 2, 3]
b = [1, 2, None]
pylab.scatter(a,b)
This doesn't work.
Is there some way to draw the points that have real values while not displaying the NaN values?
Things will work perfectly if you use NaNs. None is not the same thing; a NaN is a float.
As an example:
import numpy as np
import matplotlib.pyplot as plt
plt.scatter([1, 2, 3], [1, 2, np.nan])
plt.show()
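If the data already arrives as a list containing None, one option is to convert those entries to NaN before plotting. The following is a minimal sketch using only the standard library; the helper name none_to_nan is made up for illustration and is not part of matplotlib or numpy.

```python
import math

def none_to_nan(values):
    """Return a list of floats, with each None replaced by NaN."""
    return [math.nan if v is None else float(v) for v in values]

b = none_to_nan([1, 2, None])
# b can now be passed to plt.scatter([1, 2, 3], b) in place of the original list.
```

This keeps both lists the same length, so the x and y values stay paired up.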
Have a look at pandas or numpy masked arrays (and numpy.genfromtxt to load your data) if you want to handle missing data. Masked arrays are built into numpy, but pandas is an extremely useful library, and has very nice missing value functionality.
As an example:
import matplotlib.pyplot as plt
import pandas
x = pandas.Series([1, 2, 3])
y = pandas.Series([1, 2, None])
plt.scatter(x, y)
plt.show()
pandas uses NaNs to represent masked data, while masked arrays use a separate mask array. This means that masked arrays can potentially preserve the original data while temporarily flagging it as "missing" or "bad". However, they use more memory, and have some hidden gotchas that can be avoided by using NaNs to represent missing data.
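To make the difference concrete, here is a small sketch (with made-up values and an arbitrary threshold of 2.5) showing that a masked array keeps the flagged value underneath its mask, whereas writing NaN into an array overwrites it:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])

# Masked array: 3.0 is flagged as missing but is still stored underneath.
masked = np.ma.masked_where(y > 2.5, y)
original_values = masked.data   # the underlying data, including 3.0
flags = masked.mask             # the separate boolean mask

# NaN approach: the value is overwritten in the copy and cannot be recovered from it.
with_nan = y.copy()
with_nan[y > 2.5] = np.nan
```

Here original_values still contains 3.0, while with_nan holds NaN in that position.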
As another example, using both masked arrays and NaNs, this time with a line plot:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 6 * np.pi, 300)
y = np.cos(x)
y1 = np.ma.masked_where(y > 0.7, y)
y2 = y.copy()
y2[y > 0.7] = np.nan
fig, axes = plt.subplots(nrows=3, sharex=True, sharey=True)
for ax, ydata in zip(axes, [y, y1, y2]):
    ax.plot(x, ydata)
    ax.axhline(0.7, color='red')
axes[0].set_title('Original')
axes[1].set_title('Masked Arrays')
axes[2].set_title("Using NaN's")
fig.tight_layout()
plt.show()
Because you are drawing in 2D space, your points need to be defined by both an X and a Y value. If one of the values is None, that point cannot exist in 2D space, so it cannot be plotted. Hence you should remove both the None and its corresponding value from the other list.
There are many ways to accomplish this. Here is one:
import pylab

a = [1, 2, 3]
b = [1, None, 2]
# Walk both lists in step, dropping any pair where either value is None.
i = 0
while i < len(a):
    if a[i] is None or b[i] is None:
        a = a[:i] + a[i+1:]
        b = b[:i] + b[i+1:]
    else:
        i += 1
# Now a = [1, 3] and b = [1, 2]
pylab.scatter(a, b)
pylab.show()
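Another way to do the same filtering, sketched here with zip and a list comprehension, avoids rebuilding the lists on every removal. The helper name drop_missing_pairs is hypothetical, and treating NaN as missing in addition to None is an assumption that goes slightly beyond the loop above.

```python
import math

def drop_missing_pairs(xs, ys):
    """Keep only the (x, y) pairs where both values are present."""
    def missing(v):
        # Treat None and float NaN as missing values.
        return v is None or (isinstance(v, float) and math.isnan(v))

    kept = [(x, y) for x, y in zip(xs, ys) if not missing(x) and not missing(y)]
    # Unzip back into two lists; return empty lists if nothing survives.
    if not kept:
        return [], []
    new_xs, new_ys = zip(*kept)
    return list(new_xs), list(new_ys)

a, b = drop_missing_pairs([1, 2, 3], [1, None, 2])
```

The resulting a and b can then be passed to pylab.scatter as before.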