C-style union with NumPy dtypes?

Tags:

union

numpy

I'm interested in using numpy arrays of somewhat inhomogeneous data types. Since numpy specifies that the data must be homogeneous, this would be accomplished by defining a super-dtype that acts as a union wrapper over all the sub-dtypes. Accessing the fields of the sub-dtypes then gives a different interpretation of the underlying data.

There's already some facility for this, for example

dtype(('|S2', [('x', '|i1'), ('y', '|i1')]))

refers to an array of two-byte strings, but the first and second bytes can also be interpreted as integers through the 'x' and 'y' field names. I can't figure out how to assign a field label to the two-byte string, though.

Can this be made more general, so that we can overlay any number of different field specifications on the data?

My first try was to specify the field offsets in the dtype, but it failed with a complaint that the offsets must be ordered (i.e. non-overlapping data).

dtype1 = np.dtype(dict(
   names=['a','b'], 
   formats=['|a2','<i2'], 
   offsets=[0,0]))

Another technique works, but is cumbersome. Here I define several variables as views onto the same underlying data, and change the dtype of each variable so that I can access the data in different formats, i.e.

a = np.zeros(3, dtype='<a2')
b = a[:]
b.dtype = '<i2'

This lets me access the data either as strings or integers depending on whether I'm looking at a or b. But it is a cumbersome way of manipulating the data. Ideally, I'd like to be able to specify a variety of different fields with arbitrary offsets. Is there any way to do this?
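For comparison, the same aliasing can be written with `ndarray.view`, which returns a new array object over the same buffer instead of reassigning `dtype` on a slice. This is only a sketch of the workaround described above, not a union dtype. It uses `'S2'` for the two-byte strings and spells out `'<i2'` so the result does not depend on the platform's byte order.

```python
import numpy as np

# Three two-byte strings, zero-initialised.
a = np.zeros(3, dtype='S2')

# Reinterpret the same 6 bytes as three little-endian 16-bit integers.
b = a.view('<i2')

# Writing through one view is visible through the other.
b[0] = 0x3456
print(a[0])                    # bytes 0x56, 0x34, i.e. b'V4' on Python 3
print(np.shares_memory(a, b))  # True: no copy was made
```

The drawback is the one already noted: each interpretation lives in its own variable rather than behind a field name on a single array.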


asked Jan 14 '13 09:01

russt

1 Answer

Union dtypes have been allowed since June 2011: https://github.com/numpy/numpy/pull/94

You'll need to upgrade to NumPy 1.7.x to use this.
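As a minimal sketch of what that looks like, assuming NumPy 1.7 or later, the dict-based constructor from the question can be given overlapping offsets. The names `union_dtype` and `arr` are illustrative only, and `'S2'` is used in place of the older `'a2'` spelling.

```python
import numpy as np

# Both fields start at offset 0, so they alias the same two bytes.
union_dtype = np.dtype({
    'names': ['a', 'b'],
    'formats': ['S2', '<i2'],
    'offsets': [0, 0],
})

arr = np.zeros(3, dtype=union_dtype)

# Write through the integer field, read back through the string field.
arr['b'][0] = 0x3456
print(union_dtype.itemsize)  # 2: the fields overlap instead of stacking
print(arr['a'][0])           # bytes 0x56, 0x34, i.e. b'V4' on Python 3
```

Each field is a view into the same array, so no extra variables are needed to switch interpretations.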

However, in earlier versions you can use the overlay dtype constructor:

>>> a = np.zeros(3, dtype=np.dtype(('<i2', [('a', '|a2')])))
>>> a[0] = 0x3456
>>> a['a'][0]
'V4'

This is documented at http://docs.scipy.org/doc/numpy-dev/reference/arrays.dtypes.html#specifying-and-constructing-data-types (search for (base_dtype, new_dtype)).


answered Oct 01 '22 07:10

ecatmur
