
Draw / Create Scatterplots of datasets with NaN

I want to draw a scatter plot using pylab, however, some of my data are NaN, like this:

a = [1, 2, 3]
b = [1, 2, None]

pylab.scatter(a,b) doesn't work.

Is there some way to draw the points that have real values while skipping the NaN values?

yangsuli asked Apr 02 '13 00:04



2 Answers

Things will work perfectly if you use NaNs. None is not the same thing. A NaN is a float.

As an example:

import numpy as np
import matplotlib.pyplot as plt

plt.scatter([1, 2, 3], [1, 2, np.nan])
plt.show()
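If your data already contains None, as in the question, one way to get NaNs is to convert the list before plotting. This is only a minimal sketch using the standard library; the helper name none_to_nan is made up for illustration and is not part of matplotlib or numpy.

```python
import math

def none_to_nan(values):
    """Return a list of floats with every None replaced by NaN."""
    return [math.nan if v is None else float(v) for v in values]

a = [1, 2, 3]
b = [1, 2, None]

# The None in the last position becomes a float NaN.
b_clean = none_to_nan(b)

# b_clean can now be passed to plt.scatter(a, b_clean) in place of b.
```

The converted list has the same length as the original, so the x and y values stay aligned.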


Have a look at pandas or numpy masked arrays (and numpy.genfromtxt to load your data) if you want to handle missing data. Masked arrays are built into numpy, but pandas is an extremely useful library, and has very nice missing value functionality.

As an example:

import matplotlib.pyplot as plt
import pandas

x = pandas.Series([1, 2, 3])
y = pandas.Series([1, 2, None])
plt.scatter(x, y)
plt.show()
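For the numpy.genfromtxt route mentioned above, a rough sketch might look like the following. The CSV text here is invented for illustration (in practice you would pass a filename), and it assumes the default behaviour where empty float fields are filled with NaN.

```python
import io
import numpy as np

# Hypothetical CSV contents; the last row has no y value.
csv_text = "1,1\n2,2\n3,\n"

# Empty fields in float columns are loaded as NaN by default.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
x, y = data[:, 0], data[:, 1]

# x and y can then be passed to plt.scatter(x, y).
```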

pandas uses NaNs to represent masked data, while masked arrays use a separate mask array. This means that masked arrays can preserve the original data while temporarily flagging it as "missing" or "bad". However, they use more memory and have some hidden gotchas that can be avoided by using NaNs to represent missing data.
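To make the difference concrete, here is a small sketch with made-up values. The masked array keeps the flagged value in its underlying data, while overwriting with NaN discards it.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])

# Masked array: the value 3.0 is flagged but still stored in .data.
y_masked = np.ma.masked_where(y > 2.5, y)
original_value = y_masked.data[2]
mask = y_masked.mask.tolist()
masked_mean = float(y_masked.mean())  # masked entries are ignored

# NaN approach: the value 3.0 is overwritten and cannot be recovered.
y_nan = y.copy()
y_nan[y > 2.5] = np.nan
plain_mean = y_nan.mean()       # NaN propagates through ordinary reductions
nan_mean = np.nanmean(y_nan)    # NaN-aware reductions skip it
```

Note that with NaNs you have to remember to use the NaN-aware functions such as np.nanmean, whereas the masked array skips masked entries on its own.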

As another example, using both masked arrays and NaNs, this time with a line plot:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 6 * np.pi, 300)
y = np.cos(x)

y1 = np.ma.masked_where(y > 0.7, y)

y2 = y.copy()
y2[y > 0.7] = np.nan

fig, axes = plt.subplots(nrows=3, sharex=True, sharey=True)
for ax, ydata in zip(axes, [y, y1, y2]):
    ax.plot(x, ydata)
    ax.axhline(0.7, color='red')

axes[0].set_title('Original')
axes[1].set_title('Masked Arrays')
axes[2].set_title("Using NaN's")

fig.tight_layout()

plt.show()

Joe Kington answered Oct 01 '22 20:10


Because you are drawing in 2D space, each point needs both an X and a Y value. If one of the values is None, that point cannot exist in 2D space and cannot be plotted, so you should remove both the None and its corresponding value from the other list.

There are many ways to accomplish this. Here is one:

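import pylab

a = [1, 2, 3]
b = [1, None, 2]

# Walk both lists together and drop every pair in which either value is None.
i = 0
while i < len(a):
    if a[i] is None or b[i] is None:
        a = a[:i] + a[i+1:]
        b = b[:i] + b[i+1:]
    else:
        i += 1  # only advance when nothing was removed

# Now a = [1, 3] and b = [1, 2]

pylab.scatter(a, b)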
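The same filtering can probably be written more compactly with zip and list comprehensions. This is just an alternative sketch; the names pairs, a_clean and b_clean are chosen here for illustration.

```python
a = [1, 2, 3]
b = [1, None, 2]

# Keep only the (x, y) pairs in which neither value is None.
pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]

# Split the surviving pairs back into separate x and y lists.
a_clean = [x for x, _ in pairs]
b_clean = [y for _, y in pairs]

# a_clean and b_clean can then be passed to pylab.scatter(a_clean, b_clean).
```

This leaves the original lists untouched, which can be handy if you still need the unfiltered data later.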

Ionut Hulub answered Oct 01 '22 21:10