Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently check if an array is jagged

I'm looking for an efficient way to check if an array is jagged, where "jagged" means that an element of the array has a different shape from one of it's neighbors in the same dimension.

e.g. [[1, 2], [3, 4, 5]] or [[1, 2], [3, 4], [5, 6], [[7], [8]]]

Where I'm using list syntax for convenience, but the arguments may be nested lists or nested numpy arrays. I'm also showing integers for convenience, by the lowest-level components could be anything (e.g. generic objects). Let's say the lowest-level objects are not iterable themselves (e.g. str or dict, but definitely bonus points for a solution that can handle those too!).

Attemp:

Recursively flattening an array is pretty easy, though I'm guessing quite inefficient, and then the length of the flattened array can be compared to the numpy.size of the input array. If they match, then it is not jagged.

def really1d(arr):
    # Returns false if the given array is not 1D or is a jagged 1D array.
    if np.ndim(arr) != 1:
        return False
    if len(arr) == 0:
        return True
    if np.any(np.vectorize(np.ndim)(arr)):
        return False
    return True


def flatten(arr):
    # Convert the given array to 1D (even if jagged)
    if (not np.iterable(arr)) or really1d(arr):
        return arr
    return np.concatenate([flatten(aa) for aa in arr])


def isjagged(arr):
    if (np.size(arr) == len(flatten(arr))):
        return False
    return True

I'm pretty sure the concatenations are copying all of the data, which is a complete waste. Maybe there is an itertools or numpy.flatiter method of achieving the same goal? Ultimately the flattened array is only being used to find it's length.

like image 936
DilithiumMatrix Avatar asked Nov 27 '22 13:11

DilithiumMatrix


1 Answers

Perhaps not the most efficient but it works nicely in numpy. and will short circuit as soon as one of the conditions is False. If the first three conditions are True, we have no choice but to iterate through the rows.

Thankfull, all will shortcircuit as soon as one of the iterations is False so it won't check all the rows if it doesn't have to.

def jagged(x):
    x = np.asarray(x)
    return (
        x.dtype == "object"
        and x.ndim == 1
        and isinstance(x[0], list)
        and not all(len(row) == len(x[0]) for row in x) 
    )

If you wan't to squeeze more efficieny out, it's actually performing the len(x[0]) every iteration of the all part but this is probably inconsequential and this is a lot more legible than the alaternative which would have you write out the whole if statement.

like image 77
Eddie Bergman Avatar answered Nov 29 '22 03:11

Eddie Bergman