I'm looking for an efficient way to check if an array is jagged, where "jagged" means that an element of the array has a different shape from one of it's neighbors in the same dimension.
e.g. [[1, 2], [3, 4, 5]]
or [[1, 2], [3, 4], [5, 6], [[7], [8]]]
Where I'm using list syntax for convenience, but the arguments may be nested lists or nested numpy arrays. I'm also showing integers for convenience, by the lowest-level components could be anything (e.g. generic objects). Let's say the lowest-level objects are not iterable themselves (e.g. str
or dict
, but definitely bonus points for a solution that can handle those too!).
Attemp:
Recursively flattening an array is pretty easy, though I'm guessing quite inefficient, and then the length of the flattened array can be compared to the numpy.size
of the input array. If they match, then it is not jagged.
def really1d(arr):
# Returns false if the given array is not 1D or is a jagged 1D array.
if np.ndim(arr) != 1:
return False
if len(arr) == 0:
return True
if np.any(np.vectorize(np.ndim)(arr)):
return False
return True
def flatten(arr):
# Convert the given array to 1D (even if jagged)
if (not np.iterable(arr)) or really1d(arr):
return arr
return np.concatenate([flatten(aa) for aa in arr])
def isjagged(arr):
if (np.size(arr) == len(flatten(arr))):
return False
return True
I'm pretty sure the concatenations are copying all of the data, which is a complete waste. Maybe there is an itertools
or numpy.flatiter
method of achieving the same goal? Ultimately the flattened array is only being used to find it's length.
Perhaps not the most efficient but it works nicely in numpy. and
will short circuit as soon as one of the conditions is False
. If the first three conditions are True
, we have no choice but to iterate through the rows.
Thankfull, all
will shortcircuit as soon as one of the iterations is False
so it won't check all the rows if it doesn't have to.
def jagged(x):
x = np.asarray(x)
return (
x.dtype == "object"
and x.ndim == 1
and isinstance(x[0], list)
and not all(len(row) == len(x[0]) for row in x)
)
If you wan't to squeeze more efficieny out, it's actually performing the len(x[0])
every iteration of the all
part but this is probably inconsequential and this is a lot more legible than the alaternative which would have you write out the whole if statement.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With