
What is the identity of "ndim, shape, size, ..etc" of ndarray in numpy

I'm quite new at Python.

After using MATLAB for many years, I recently started studying numpy/scipy.

The most basic element of numpy seems to be ndarray, which has the following attributes:

  • ndarray.ndim
  • ndarray.shape
  • ndarray.size
  • ...etc

I'm quite familiar with C++/JAVA classes, but I'm a novice at Python OOP.


Q1: My first question is what is the identity of the above attributes?

At first, I assumed that the above attributes might be public member variables. But I soon found that a.ndim = 10 doesn't work (assuming a is an ndarray object), so they don't seem to be public member variables.

Next, I guessed that they might be public methods, similar to getter methods in C++. However, when I tried a.ndim() with parentheses, it didn't work either, so they don't seem to be public methods.

The other possibility is that they are private member variables, but print a.ndim works, so they cannot be private data members.

So, I cannot figure out what is the true identity of the above attributes.


Q2: Where can I find the Python code implementation of ndarray? Since I installed numpy/scipy on my local PC, I guess there is some way to look at the source code, and then everything might become clear.

Could you give some advice on this?

chanwcom asked Jul 11 '15 22:07



1 Answer

numpy is implemented as a mix of C code and Python code. The source is available for browsing on GitHub, and can be downloaded as a git repository. But digging your way into the C source takes some work. A lot of the files are marked as .c.src, which means they pass through one or more layers of preprocessing before compiling.

And Python is written in a mix of C and Python as well. So don't try to force things into C++ terms.

It's probably better to draw on your MATLAB experience, with adjustments to allow for Python. And numpy has a number of quirks that go beyond Python. It is using Python syntax, but because it has its own C code, it isn't simply a Python class.

I use IPython as my usual working environment. With that I can use foo? to see the documentation for foo (the same as Python's help(foo)), and foo?? to see the code, if it is written in Python (like type(foo) in MATLAB/Octave).
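Outside IPython, something similar can be done with the standard inspect module. This is only a rough sketch: try_source is a helper name made up for this example, and np.__file__ simply points at wherever numpy happens to be installed locally.

```python
import inspect
import os

import numpy as np

# Directory of the locally installed package; the Python-level source lives here.
numpy_dir = os.path.dirname(np.__file__)
print(numpy_dir)


def try_source(obj):
    """Hypothetical helper: return the Python source of obj, or None if it
    is not available (for example because obj is implemented in C)."""
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError):
        return None


# ndarray itself is implemented in C, so no Python source is expected here.
source = try_source(np.ndarray)
print("Python source found" if source else "no Python source available")
```

For objects written in C, the numpy repository is the place to look.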

Python objects have attributes and methods. There are also properties, which look like attributes but actually use methods to get/set. Usually you don't need to be aware of the difference between attributes and properties.
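To make the idea of a property concrete, here is a minimal pure-Python sketch. Grid is a toy class invented for this illustration; it is not how numpy implements ndarray, it only mimics the visible behaviour of a settable shape and a read-only ndim.

```python
class Grid:
    """Hypothetical toy class, not numpy's implementation."""

    def __init__(self, shape):
        self._shape = tuple(shape)

    @property
    def shape(self):
        return self._shape

    @shape.setter
    def shape(self, new_shape):
        self._shape = tuple(new_shape)

    @property
    def ndim(self):
        # Getter only: no setter is defined, so assignment fails.
        return len(self._shape)


g = Grid((3, 4))
print(g.ndim)        # read like an attribute, no parentheses
g.shape = (2, 3, 2)  # goes through the setter
print(g.ndim)

try:
    g.ndim = 10
    ndim_was_set = True
except AttributeError as exc:
    ndim_was_set = False
    print("cannot set ndim:", exc)
```

In C++ terms this is closest to a getter (and optional setter) that is called with attribute syntax.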

 x.ndim   # as noted, has a get, but no set; see also np.ndim(x)
 x.shape   # has a get, but can also be set; see also np.shape(x)
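The same behaviour can be checked on a real array. A small sketch, assuming a reasonably recent numpy; the warning filter is only there in case a newer release warns about assigning to shape in place.

```python
import warnings

import numpy as np

x = np.arange(12.0)

print(x.ndim)   # plain attribute access works

# Assignment fails because ndim has no setter.
try:
    x.ndim = 10
    ndim_settable = True
except AttributeError:
    ndim_settable = False

# Calling fails because x.ndim is already an int, not a method.
try:
    x.ndim()
    ndim_callable = True
except TypeError:
    ndim_callable = False

# shape does accept assignment; any warning a newer numpy release might
# emit about in-place shape changes is silenced for this demonstration.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    x.shape = (3, 4)

print(ndim_settable, ndim_callable, x.shape, x.ndim)
```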

x.<tab> in IPython shows me all the completions for an ndarray. There are 4*18 of them. Some are methods, some attributes. x._<tab> shows a bunch more that start with __. These are 'private', that is, not meant for public consumption, but that is only a convention. You can look at them and use them if needed.

Offhand, x.shape is the only ndarray property that I set, and even then I usually use reshape(...) instead. Read their docs to see the difference. ndim is the number of dimensions, and it doesn't make sense to change that directly. It is len(x.shape); change the shape to change ndim. Likewise, x.size shouldn't be something you change directly.
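A quick sketch of how these three relate, using an arbitrary example array:

```python
import math

import numpy as np

x = np.zeros((2, 3, 4))

print(x.ndim == len(x.shape))         # ndim is the length of shape
print(x.size == math.prod(x.shape))   # size is the product of shape

# reshape returns a new array object and leaves x itself untouched,
# so ndim changes only on the result.
y = x.reshape(6, 4)
print(x.shape, y.shape, y.ndim)
```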

Some of these properties are also accessible via functions: np.shape(x) == x.shape, similar to MATLAB's size(x). (MATLAB arrays don't have this . attribute syntax.)
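For example, with an arbitrary array, the function forms and the attributes agree:

```python
import numpy as np

x = np.ones((3, 5))

print(np.shape(x) == x.shape)
print(np.ndim(x) == x.ndim)
print(np.size(x) == x.size)

# The function forms also accept array-likes such as nested lists.
list_shape = np.shape([[1, 2, 3], [4, 5, 6]])
print(list_shape)
```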

x.__array_interface__ is a handy property that gives a dictionary with a number of the attributes:

In [391]: x.__array_interface__
Out[391]: 
{'descr': [('', '<f8')],
 'version': 3,
 'shape': (50,),
 'typestr': '<f8',
 'strides': None,
 'data': (165646680, False)}
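Since it is an ordinary dictionary, individual entries can be read by key. A small sketch, assuming a 50-element float64 array like the one above; the 'data' address will differ on every run.

```python
import numpy as np

x = np.zeros(50)
info = x.__array_interface__

print(info['shape'])
print(info['strides'])    # None means the array is C-contiguous

# A strided view is not contiguous, so its strides are listed explicitly.
view_info = x[::2].__array_interface__
print(view_info['shape'], view_info['strides'])
```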

The docs for ndarray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None) (the __new__ method) list these attributes:

Attributes
----------
T : ndarray
    Transpose of the array.
data : buffer
    The array's elements, in memory.
dtype : dtype object
    Describes the format of the elements in the array.
flags : dict
    Dictionary containing information related to memory use, e.g.,
    'C_CONTIGUOUS', 'OWNDATA', 'WRITEABLE', etc.
flat : numpy.flatiter object
    Flattened version of the array as an iterator.  The iterator
    allows assignments, e.g., ``x.flat = 3`` (See `ndarray.flat` for
    assignment examples; TODO).
imag : ndarray
    Imaginary part of the array.
real : ndarray
    Real part of the array.
size : int
    Number of elements in the array.
itemsize : int
    The memory use of each array element in bytes.
nbytes : int
    The total number of bytes required to store the array data,
    i.e., ``itemsize * size``.
ndim : int
    The array's number of dimensions.
shape : tuple of ints
    Shape of the array.
strides : tuple of ints
    The step-size required to move from one element to the next in
    memory. For example, a contiguous ``(3, 4)`` array of type
    ``int16`` in C-order has strides ``(8, 2)``.  This implies that
    to move from element to element in memory requires jumps of 2 bytes.
    To move from row-to-row, one needs to jump 8 bytes at a time
    (``2 * 4``).
ctypes : ctypes object
    Class containing properties of the array needed for interaction
    with ctypes.
base : ndarray
    If the array is a view into another array, that array is its `base`
    (unless that array is also a view).  The `base` array is where the
    array data is actually stored.
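A few of these are easy to check directly. This sketch reuses the (3, 4) int16 example from the strides entry; the variable names are arbitrary.

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.int16)

print(a.strides)                        # bytes to step per row, per column
print(a.itemsize, a.size, a.nbytes)     # nbytes == itemsize * size

# A slice is a view: it shares a's buffer, and a is its base.
v = a[:, 1:3]
print(v.base is a)
print(a.base is None)                   # a owns its own data

# The transpose is also a view; only the strides are swapped.
print(a.T.strides)
```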

All of these should be treated as properties, though I don't think numpy actually uses the Python property mechanism. In general they should be considered 'read-only'. Besides shape, I only recall changing data (a pointer to a data buffer) and strides.
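One way to see what they actually are is to look them up on the class rather than on an instance. A short sketch, assuming a standard CPython build of numpy, where these attributes show up as C-level descriptors rather than Python property objects:

```python
import numpy as np

descr = np.ndarray.ndim              # looked up on the class, not an instance
descr_type = type(descr).__name__

print(descr_type)
print(isinstance(descr, property))   # not a Python-level property

# The descriptor protocol still applies: x.ndim ends up calling __get__.
x = np.zeros((2, 3))
print(descr.__get__(x, np.ndarray))
```

So they behave like properties from the user's point of view, even though they are defined in C.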

hpaulj answered Nov 15 '22 08:11
