I have a square matrix with more than 1,000 rows and columns. Many of the fields along the "border" are nan, for example:
grid = [[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, 1, nan, nan],
[nan, 2, 3, 2, nan],
[ 1, 2, 2, 1, nan]]
Now I want to eliminate all rows and columns that contain only nan. Here that would be the first and second rows and the last column. But I also want the result to be a square matrix, so the number of eliminated rows must equal the number of eliminated columns. In this example, I want to get this:
grid = [[nan, nan, nan, nan],
[nan, nan, 1, nan],
[nan, 2, 3, 2],
[ 1, 2, 2, 1]]
I'm sure I could solve this with a loop: check every row and column for whether it contains only nan, and at the end use numpy.delete to delete the rows and columns I found (but only the minimal number, so that the result stays square).
But I hope someone can help me with a better solution or a good library.
This works. Zipping the row and column indices is the key: zip stops at the shorter of the two sequences, so the same number of rows and columns is always removed, which preserves the squareness of the matrix.
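# Assumes grid is already a NumPy array (e.g. grid = np.array(grid)).
nans_in_grid = np.isnan(grid)
# axis=0 reduces over the rows, so it flags all-nan columns; axis=1 flags all-nan rows.
nan_cols = np.all(nans_in_grid, axis=0)
nan_rows = np.all(nans_in_grid, axis=1)
# zip truncates to the shorter index list, so equally many rows and columns are dropped.
indices_to_remove = zip(np.nonzero(nan_rows)[0], np.nonzero(nan_cols)[0])
rows_to_remove, cols_to_remove = zip(*indices_to_remove)
tmp = grid[[r for r in range(grid.shape[0]) if r not in rows_to_remove], :]
grid = tmp[:, [c for c in range(grid.shape[1]) if c not in cols_to_remove]]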
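To check the zip idea against the grid from the question, here is a small self-contained sketch. It is not the answer's exact code: it swaps the list comprehensions for np.delete, and the names empty_rows, empty_cols and trimmed are just illustrative. It also guards the case where no pair is found, where unpacking zip(*...) would otherwise fail.

```python
import numpy as np

nan = np.nan
grid = np.array([[nan, nan, nan, nan, nan],
                 [nan, nan, nan, nan, nan],
                 [nan, nan, 1,   nan, nan],
                 [nan, 2,   3,   2,   nan],
                 [1,   2,   2,   1,   nan]])

nans_in_grid = np.isnan(grid)
empty_rows = np.nonzero(nans_in_grid.all(axis=1))[0]  # rows 0 and 1
empty_cols = np.nonzero(nans_in_grid.all(axis=0))[0]  # column 4

# zip stops at the shorter sequence, so only one (row, column) pair survives here.
pairs = list(zip(empty_rows, empty_cols))

# Building the index arrays explicitly also works when pairs is empty.
rows_to_remove = np.array([r for r, _ in pairs], dtype=int)
cols_to_remove = np.array([c for _, c in pairs], dtype=int)

trimmed = np.delete(np.delete(grid, rows_to_remove, axis=0), cols_to_remove, axis=1)
print(trimmed.shape)
```

Note that this always drops the lowest-indexed empty rows and columns first, which happens to match the desired result for this grid.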
Continuing from Mr E's solution and then padding the result also works.
def pad_to_square(a, pad_value=np.nan):
    m = a.reshape((a.shape[0], -1))
    padded = pad_value * np.ones(2 * [max(m.shape)], dtype=m.dtype)
    padded[0:m.shape[0], 0:m.shape[1]] = m
    return padded

g = np.isnan(grid)
grid = pad_to_square(grid[:, ~np.all(g, axis=0)][~np.all(g, axis=1)])
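For what it's worth, here is a runnable sketch of the padding approach on the grid from the question, so the intermediate shapes are visible. The variable names cropped and padded are only for illustration. One thing it shows: the padding is added at the bottom and right, so for this grid the extra nan row ends up below the data rather than above it as in the question's desired output.

```python
import numpy as np

def pad_to_square(a, pad_value=np.nan):
    # Flatten any trailing dimensions, then copy into a square nan-filled array.
    m = a.reshape((a.shape[0], -1))
    padded = pad_value * np.ones(2 * [max(m.shape)], dtype=m.dtype)
    padded[0:m.shape[0], 0:m.shape[1]] = m
    return padded

nan = np.nan
grid = np.array([[nan, nan, nan, nan, nan],
                 [nan, nan, nan, nan, nan],
                 [nan, nan, 1,   nan, nan],
                 [nan, 2,   3,   2,   nan],
                 [1,   2,   2,   1,   nan]])

g = np.isnan(grid)
# Drop every all-nan column, then every all-nan row (including interior ones).
cropped = grid[:, ~np.all(g, axis=0)][~np.all(g, axis=1)]
padded = pad_to_square(cropped)

print(cropped.shape)  # 3 rows, 4 columns
print(padded.shape)   # 4 rows, 4 columns
```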
Another solution, building on the other answer here. It is significantly faster for larger matrices.
shape = grid.shape[0]
# Use the built-in next(); the generator .next() method only exists in Python 2.
first_col = next(i for i, col in enumerate(grid.T) if np.isfinite(col).any())
last_col = next(shape - i - 1 for i, col in enumerate(grid.T[::-1]) if np.isfinite(col).any())
first_row = next(i for i, row in enumerate(grid) if np.isfinite(row).any())
last_row = next(shape - i - 1 for i, row in enumerate(grid[::-1]) if np.isfinite(row).any())
row_len = last_row - first_row
col_len = last_col - first_col
delta_len = row_len - col_len
if delta_len == 0:
    pass
elif delta_len < 0:
    # Fewer rows than columns: extend the window upward, spilling downward at the top edge.
    first_row = first_row - abs(delta_len)
    if first_row < 0:
        delta_len = first_row
        first_row = 0
        last_row += abs(delta_len)
elif delta_len > 0:
    # Fewer columns than rows: extend the window leftward, spilling rightward at the left edge.
    first_col -= abs(delta_len)
    if first_col < 0:
        delta_len = first_col
        first_col = 0
        last_col += abs(delta_len)
grid = grid[first_row:last_row+1, first_col:last_col+1]
import numpy as np
nan = np.nan
grid = [[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, 1, nan, nan],
[nan, 2, 3, 2, nan],
[ 1, 2, 2, 1, nan]]
g = np.array(grid)
cols = np.isnan(g).all(axis=0)  # True for columns that are entirely nan
rows = np.isnan(g).all(axis=1)  # True for rows that are entirely nan
first_col = np.where(~cols)[0][0]
last_col = len(cols) - np.where(~cols[::-1])[0][0] - 1
first_row = np.where(~rows)[0][0]
last_row = len(rows) - np.where(~rows[::-1])[0][0] - 1
row_len = last_row - first_row
col_len = last_col - first_col
delta_len = row_len - col_len
if delta_len == 0:
    pass
elif delta_len < 0:
    first_row = first_row - abs(delta_len)
    if first_row < 0:
        delta_len = first_row
        first_row = 0
        last_row += abs(delta_len)
elif delta_len > 0:
    first_col -= abs(delta_len)
    if first_col < 0:
        delta_len = first_col
        first_col = 0
        last_col += abs(delta_len)
print(g[first_row:last_row+1, first_col:last_col+1])
Output:
[[ nan nan nan nan]
[ nan nan 1. nan]
[ nan 2. 3. 2.]
[ 1. 2. 2. 1.]]
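If you want the same idea as a reusable function, a minimal sketch might look like the following. The name crop_to_square is made up for this example, and it assumes the input is a square 2-D float array, as in the question. Like the code above, it grows the shorter side toward the lower indices first and only spills to the other side when it hits the edge of the array.

```python
import numpy as np

def crop_to_square(g):
    """Hypothetical helper: smallest square window covering all non-nan cells.

    Assumes g is a square 2-D float array.
    """
    has_data = ~np.isnan(g)
    rows = np.flatnonzero(has_data.any(axis=1))
    cols = np.flatnonzero(has_data.any(axis=0))
    if rows.size == 0:
        # Nothing but nan: return an empty square.
        return g[:0, :0]
    last_row, last_col = int(rows[-1]), int(cols[-1])
    # Side length of the square is the longer side of the bounding box.
    side = max(last_row - int(rows[0]), last_col - int(cols[0])) + 1
    # Anchor the window at the last data row/column and clamp at index 0,
    # so any overflow past the top/left edge is pushed to the bottom/right.
    row_start = max(0, last_row - side + 1)
    col_start = max(0, last_col - side + 1)
    return g[row_start:row_start + side, col_start:col_start + side]

nan = np.nan
g = np.array([[nan, nan, nan, nan, nan],
              [nan, nan, nan, nan, nan],
              [nan, nan, 1,   nan, nan],
              [nan, 2,   3,   2,   nan],
              [1,   2,   2,   1,   nan]])
result = crop_to_square(g)
print(result.shape)
```

For the grid from the question this keeps rows 1 to 4 and columns 0 to 3, the same window as above.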