I am graphing several columns of a large array of data (loaded with numpy.genfromtxt) against an equally sized time column. Missing data is often encoded as nan, -999, -9999, etc. However, I can't figure out how to remove multiple values from the array. This is what I currently have:
for cur_col in range(start_col, total_col):
    # Generate what is to be graphed by removing nan values
    data_mask = (file_data[:, cur_col] != nan_values)
    y_data = file_data[:, cur_col][data_mask]
    x_data = file_data[:, time_col][data_mask]
After that, I use matplotlib to create the appropriate figures for each column. This works fine if nan_values is a single integer, but I would like to use a list.
EDIT: Here is a working example.
import numpy as np

file_data = np.arange(12.0).reshape((4, 3))
file_data[1, 1] = np.nan
file_data[2, 2] = -999
nan_values = -999

for cur_col in range(1, 3):
    # Generate what is to be graphed by removing nan values
    data_mask = (file_data[:, cur_col] != nan_values)
    y_data = file_data[:, cur_col][data_mask]
    x_data = file_data[:, 0][data_mask]
    print('y: ' + str(y_data))
    print('x: ' + str(x_data))

print(file_data)
>>> y: [ 1. nan 7. 10.]
x: [ 0. 3. 6. 9.]
y: [ 2. 5. 11.]
x: [ 0. 3. 9.]
[[ 0. 1. 2.]
[ 3. nan 5.]
[ 6. 7. -999.]
[ 9. 10. 11.]]
This will not work if nan_values = ['nan', -999], which is what I am looking to accomplish.
One option is to write a small masking function around numpy.ma.masked_where(): pass it the condition to mask on and the array to be masked.
To mask an array where the data is exactly equal to a given value, use numpy.ma.masked_object(). This function is similar to masked_values, but is only suitable for object arrays; for floating point data, use masked_values instead.
To combine two masks with the logical_or operator, use numpy.ma.mask_or(). If the copy parameter is False (the default) and one of the inputs is nomask, it returns a view of the other input mask. The shrink parameter controls whether the output is shrunk to nomask when all of its values are False.
To create a boolean mask from an array, use numpy.ma.make_mask(). The function accepts any sequence that is convertible to integers, or nomask. The contents do not have to be 0s and 1s: values of 0 are interpreted as False, everything else as True.
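As a rough sketch of how these helpers might fit together for the question above: build one mask per sentinel with masked_values, then merge them with mask_or. The sample column and the names mask_a, mask_b and mask_nan are made up for illustration, and np.ma.masked_invalid (not mentioned above) is assumed here as the way to catch NaN.

```python
import numpy as np

# Hypothetical sample column containing NaN and two sentinel values.
col = np.array([1.0, np.nan, -999.0, 10.0, -9999.0])

# One full boolean mask per kind of missing value.
mask_a = np.ma.getmaskarray(np.ma.masked_values(col, -999.0))
mask_b = np.ma.getmaskarray(np.ma.masked_values(col, -9999.0))
mask_nan = np.ma.getmaskarray(np.ma.masked_invalid(col))

# mask_or takes two masks at a time, so chain it to merge all three.
combined = np.ma.mask_or(np.ma.mask_or(mask_a, mask_b), mask_nan)

# compressed() returns only the unmasked entries as a plain ndarray.
clean = np.ma.array(col, mask=combined).compressed()
print(clean)
```

getmaskarray is used so that each mask is always a full boolean array, even when a sentinel does not occur in the data.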
I would suggest using masked arrays like so:
>>> a = np.arange(12.0).reshape((4,3))
>>> a[1,1] = np.nan
>>> a[2,2] = -999
>>> a
array([[   0.,    1.,    2.],
       [   3.,   nan,    5.],
       [   6.,    7., -999.],
       [   9.,   10.,   11.]])
>>> m = np.ma.array(a,mask=(~np.isfinite(a) | (a == -999)))
>>> m
masked_array(data =
 [[0.0 1.0 2.0]
 [3.0 -- 5.0]
 [6.0 7.0 --]
 [9.0 10.0 11.0]],
             mask =
 [[False False False]
 [False  True False]
 [False False  True]
 [False False False]],
       fill_value = 1e+20)
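To extend this to several sentinel values and get back the per-column x/y pairs from the question, something along these lines might work. It is only a sketch: the sentinels list, time_col = 0 and the results dict are assumptions for illustration, and np.isin needs NumPy 1.13 or later.

```python
import numpy as np

a = np.arange(12.0).reshape((4, 3))
a[1, 1] = np.nan
a[2, 2] = -999

# Assumed list of sentinel values; NaN/inf are caught by isfinite instead.
sentinels = [-999, -9999]
m = np.ma.array(a, mask=(~np.isfinite(a) | np.isin(a, sentinels)))

time_col = 0   # assumed position of the time column
results = {}   # hypothetical container, only here to keep each column's data
for cur_col in range(1, 3):
    # Rows that are NOT masked in this column
    keep = ~np.ma.getmaskarray(m[:, cur_col])
    y_data = m[:, cur_col].compressed()   # unmasked values as a plain ndarray
    x_data = a[keep, time_col]            # matching time values
    results[cur_col] = (x_data, y_data)
```

The time column is indexed with each data column's own mask, so x_data and y_data always have the same length.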
I would try something like (pseudo-code):
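import numpy as np

nan_values = [...]  # sentinel values such as -999; NaN is tested separately
for cur_col in range(start_col, total_col):
    # Generate what is to be graphed by removing nan values.
    # NaN never compares equal to anything, so `in nan_values` cannot catch it.
    keep = [not (np.isnan(v) or v in nan_values) for v in file_data[:, cur_col]]
    y_data = [file_data[i, cur_col] for i in range(len(file_data)) if keep[i]]
    x_data = [file_data[i, time_col] for i in range(len(file_data)) if keep[i]]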
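A vectorized variant of the same idea, which also accepts the nan_values = ['nan', -999] form from the question, could look roughly like this. The helper name valid_mask is made up for this sketch, and it assumes every entry of nan_values can be passed to float().

```python
import numpy as np

def valid_mask(column, nan_values):
    """Return a boolean array that is True where `column` holds real data."""
    column = np.asarray(column, dtype=float)
    # float('nan') gives NaN, so the string 'nan' is accepted as well
    sentinels = np.array([float(v) for v in nan_values])
    is_nan = np.isnan(sentinels)
    # Numeric sentinels are matched by equality
    bad = np.isin(column, sentinels[~is_nan])
    # NaN never equals itself, so it needs its own test
    if is_nan.any():
        bad |= np.isnan(column)
    return ~bad

file_data = np.arange(12.0).reshape((4, 3))
file_data[1, 1] = np.nan
file_data[2, 2] = -999
nan_values = ['nan', -999]

for cur_col in range(1, 3):
    data_mask = valid_mask(file_data[:, cur_col], nan_values)
    y_data = file_data[:, cur_col][data_mask]
    x_data = file_data[:, 0][data_mask]
    print('y: ' + str(y_data))
    print('x: ' + str(x_data))
```

This keeps the loop from the question unchanged apart from how data_mask is built.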