Prevent numpy from creating a multidimensional array

Tags: python, arrays, numpy

NumPy tries to be really helpful when creating arrays. If the first argument to numpy.array has __getitem__ and __len__ methods, they are used on the assumption that the object might be a valid sequence.

Unfortunately, I want to create an array with dtype=object that contains my objects themselves, without NumPy being "helpful".

Broken down to a minimal example, the class looks like this:

import numpy as np

class Test(object):
    def __init__(self, iterable):
        self.data = iterable

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self.data)

If the "iterables" have different lengths, everything is fine and I get exactly the result I want:

>>> np.array([Test([1,2,3]), Test([3,2])], dtype=object)
array([Test([1, 2, 3]), Test([3, 2])], dtype=object)

But NumPy creates a multidimensional array if they happen to have the same length:

>>> np.array([Test([1,2,3]), Test([3,2,1])], dtype=object)
array([[1, 2, 3],
       [3, 2, 1]], dtype=object)

Unfortunately there is only an ndmin argument, so I was wondering whether there is a way to enforce an ndmax, or to otherwise prevent NumPy from interpreting the custom classes as another dimension (without deleting __len__ or __getitem__).

asked Aug 04 '16 18:08

MSeifert

1 Answer

This behavior has been discussed a number of times before (e.g. "Override a dict with numpy support"). np.array tries to make as high-dimensional an array as it can. The model case is nested lists: if it can iterate and the sublists are equal in length, it will 'drill' on down.

Here it went down 2 levels before encountering lists of different length:

In [250]: np.array([[[1,2],[3]],[1,2]],dtype=object)
Out[250]: 
array([[[1, 2], [3]],
       [1, 2]], dtype=object)
In [251]: _.shape
Out[251]: (2, 2)

Without a shape or ndmax parameter it has no way of knowing whether I want it to be (2,) or (2,2). Both of those would work with the dtype.

It's compiled code, so it isn't easy to see exactly what tests it uses. It tries to iterate on lists and tuples, but not on sets or dictionaries.
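A quick way to see which containers get drilled into is to compare the resulting shapes. This is only an observational check of the behavior described above, not a description of NumPy's internal tests; the variable names are mine:

```python
import numpy as np

# Tuples (like lists) of equal length are treated as nested sequences.
from_tuples = np.array([(1, 2), (3, 4)], dtype=object)
print(from_tuples.shape)   # (2, 2)

# Sets and dicts are not sequences, so each one stays a single element.
from_sets = np.array([{1, 2}, {3, 4}], dtype=object)
print(from_sets.shape)     # (2,)

from_dicts = np.array([{'a': 1}, {'b': 2}], dtype=object)
print(from_dicts.shape)    # (2,)
```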

The surest way to make an object array with a given dimension is to start with an empty one and fill it:

In [266]: A=np.empty((2,3),object)
In [267]: A.fill([[1,'one']])
In [276]: A[:]={1,2}
In [277]: A[:]=[1,2]   # broadcast error
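Applied to the Test class from the question, the same idea might look like the sketch below. It assigns one slot at a time, on the assumption that a single-element assignment into an object array stores the object itself rather than unpacking it; `items` is just an illustrative name:

```python
import numpy as np

class Test(object):
    def __init__(self, iterable):
        self.data = iterable

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self.data)

items = [Test([1, 2, 3]), Test([3, 2, 1])]

# Allocate a 1-d object array first, so the shape is fixed up front.
arr = np.empty(len(items), dtype=object)
for i, item in enumerate(items):
    arr[i] = item   # integer index: the Test instance is stored as-is

print(arr.shape)              # (2,)
print(type(arr[0]).__name__)  # Test
```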

Another way is to start with at least one different element (e.g. a None), and then replace that.
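A minimal sketch of that placeholder trick, again using the Test class from the question. The None in the first slot makes the input irregular, so np.array stops at one dimension, and the placeholder is then overwritten:

```python
import numpy as np

class Test(object):
    def __init__(self, iterable):
        self.data = iterable

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)

items = [Test([1, 2, 3]), Test([3, 2, 1])]

# Put a None where the first item belongs; it cannot be iterated,
# so the result is a 1-d object array instead of a (2, 3) one.
arr = np.array([None] + items[1:], dtype=object)
arr[0] = items[0]   # replace the placeholder with the real object

print(arr.shape)    # (2,)
```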

There is a more primitive creator, ndarray, that takes a shape:

In [280]: np.ndarray((2,3),dtype=object)
Out[280]: 
array([[None, None, None],
       [None, None, None]], dtype=object)

But that's basically the same as np.empty (unless I give it a buffer).

These are fudges, but they aren't expensive (time-wise).

================ (edit)

"Enh: Object array creation function" (https://github.com/numpy/numpy/issues/5933) is an enhancement request. See also https://github.com/numpy/numpy/issues/5303, which notes that the error message for accidentally irregular arrays is confusing.

The developer sentiment seems to favor a separate function to create dtype=object arrays, one with more control over the initial dimensions and depth of iteration. They might even strengthen the error checking to keep np.array from creating 'irregular' arrays.

Such a function could detect the shape of a regular nested iterable down to a specified depth, and build an object type array to be filled.

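def objarray(alist, depth=1):
    # Read the shape off the first element at each level, down to `depth`.
    shape = []
    level = alist
    for _ in range(depth):
        shape.append(len(level))
        level = level[0]
    # Allocate an object array of that shape, then fill it by assignment.
    arr = np.empty(shape, dtype=object)
    arr[:] = alist
    return arr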

With various depths:

In [528]: alist=[[Test([1,2,3])], [Test([3,2,1])]]
In [529]: objarray(alist,1)
Out[529]: array([[Test([1, 2, 3])], [Test([3, 2, 1])]], dtype=object)
In [530]: objarray(alist,2)
Out[530]: 
array([[Test([1, 2, 3])],
       [Test([3, 2, 1])]], dtype=object)
In [531]: objarray(alist,3)  
Out[531]: 
array([[[1, 2, 3]],

       [[3, 2, 1]]], dtype=object)
In [532]: objarray(alist,4)
...
TypeError: object of type 'int' has no len()
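A variant of the same idea that does not rely on how slice assignment interprets its right-hand side is to fill the array one element at a time. This is only a sketch: `objarray_elementwise` is a hypothetical name, and like objarray it assumes the input is regular and non-empty down to the requested depth.

```python
import numpy as np

def objarray_elementwise(nested, depth=1):
    # Read the shape off the first element at each level, as objarray does.
    shape = []
    level = nested
    for _ in range(depth):
        shape.append(len(level))
        level = level[0]

    arr = np.empty(shape, dtype=object)
    # Visit every index tuple and copy the matching item across unchanged.
    for index in np.ndindex(*shape):
        item = nested
        for i in index:
            item = item[i]
        arr[index] = item   # full integer index: the object is stored as-is
    return arr

nested = [[1, 2, 3], [3, 2, 1]]
print(objarray_elementwise(nested, 1).shape)   # (2,)
print(objarray_elementwise(nested, 2).shape)   # (2, 3)
```

With depth=1 each slot holds one of the inner lists; the same should apply to instances of a sequence-like class such as Test.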

answered Sep 17 '22 12:09

hpaulj
