What's the best way to convert NumPy's recarray to a normal array? I could do a .tolist() first and then do an array() again, but that seems somewhat inefficient.
Example:
>>> import numpy as np
>>> a = np.recarray((2,), dtype=[('x', int), ('y', float), ('z', int)])
>>> a
rec.array([(30408891, 9.2944097561804909e-296, 30261980),
(44512448, 4.5273310988985789e-300, 29979040)],
dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])
>>> np.array(a.tolist())
array([[ 3.04088910e+007, 9.29440976e-296, 3.02619800e+007],
[ 4.45124480e+007, 4.52733110e-300, 2.99790400e+007]])
Record arrays are structured arrays wrapped in a subclass of ndarray, numpy.recarray, which allows field access by attribute on the array object. Record arrays also use a special datatype, numpy.record, which allows field access by attribute on the individual elements of the array.
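As a small illustration of the two kinds of attribute access described above, here is a minimal sketch. The field names and values are taken from the example used elsewhere in this thread, and the explicit '<i4' and '<f8' field types are an assumption chosen to match the printed output.

```python
import numpy as np

# Build a small structured array and view it as a record array.
a = np.array([(0, 1.0, 2), (3, 4.0, 5)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')]).view(np.recarray)

# Attribute access on the array returns a whole field (column).
col_x = a.x        # same data as a['x']

# Attribute access on a single element returns one value of that record.
first_y = a[0].y   # same value as a[0]['y']
```

Both forms are conveniences over the usual a['x'] indexing, which works on any structured array.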
NumPy arrays are more compact than Python lists, so they use less memory. NumPy is also more convenient, not just more efficient: it provides many vector and matrix operations, along with built-in tools such as FFTs, convolutions, statistics, and histograms.
The ndarray.tolist() method converts a NumPy array to a Python list. For a one-dimensional array, it returns a flat list of the array elements. For a multi-dimensional array, it returns a nested list.
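A short sketch of that behaviour, including what tolist() gives for a record array like the one in the question. The variable names here are only illustrative.

```python
import numpy as np

flat = np.array([1, 2, 3])
nested = np.array([[1, 2], [3, 4]])

flat_list = flat.tolist()       # one-dimensional array: a flat list
nested_list = nested.tolist()   # two-dimensional array: a list of lists

# On a record array, tolist() gives a list of tuples, one tuple per record.
rec = np.array([(0, 1.0, 2), (3, 4.0, 5)],
               dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')]).view(np.recarray)
rec_list = rec.tolist()
```

This is why np.array(a.tolist()) produces a two-dimensional array: each tuple becomes one row.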
By "normal array" I take it you mean a NumPy array of homogeneous dtype. Given a recarray, such as:
>>> a = np.array([(0, 1, 2),
...               (3, 4, 5)], [('x', int), ('y', float), ('z', int)]).view(np.recarray)
>>> a
rec.array([(0, 1.0, 2), (3, 4.0, 5)],
          dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])
we must first give every field the same dtype. We can then convert it to a "normal array" by viewing the data with that same dtype:
>>> a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
array([ 0., 1., 2., 3., 4., 5.])
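The result above is one-dimensional, while np.array(a.tolist()) in the question is two-dimensional. A minimal sketch of getting the same row-per-record layout follows. It assumes explicit '<i4' and '<f8' field types, and the np.asarray call is an extra step added here so that the result is a plain ndarray rather than a recarray subclass.

```python
import numpy as np

a = np.array([(0, 1, 2), (3, 4, 5)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')]).view(np.recarray)

# Build the all-float dtype from the existing field names.
float_dtype = [(name, '<f8') for name in a.dtype.names]

# Cast every field to float64, drop the recarray subclass, then
# reinterpret the packed records as a flat float64 array.
flat = np.asarray(a.astype(float_dtype)).view('<f8')

# One row per record, one column per field.
table = flat.reshape(len(a), -1)
```

The reshape does not copy anything; it only changes the shape of the flat view.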
astype returns a new NumPy array, so the above requires additional memory proportional to the size of a. Each row of a requires 4+8+4=16 bytes, while each row of a.astype(...) requires 8*3=24 bytes. Calling view requires no new memory, since view just changes how the underlying data is interpreted.
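These sizes can be checked directly. The sketch below assumes the fields are declared as '<i4' and '<f8' explicitly, because a plain int field is 8 bytes on many platforms, which would change the per-row figure.

```python
import numpy as np

a = np.array([(0, 1, 2), (3, 4, 5)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')]).view(np.recarray)

converted = np.asarray(a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]))
viewed = converted.view('<f8')

row_bytes_before = a.dtype.itemsize         # 4 + 8 + 4 bytes per record
row_bytes_after = converted.dtype.itemsize  # 8 * 3 bytes per record

# The view reuses the buffer that astype allocated.
shares_buffer = np.shares_memory(converted, viewed)
```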
a.tolist() returns a new Python list. Each Python number is an object that requires more bytes than its equivalent representation in a NumPy array, so a.tolist() requires more memory than a.astype(...).
Calling a.astype(...).view(...) is also faster than np.array(a.tolist()):
In [8]: a = np.array(list(zip(*[iter(range(300))]*3)),[('x', int), ('y', float), ('z', int)]).view(np.recarray)
In [9]: %timeit a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
10000 loops, best of 3: 165 us per loop
In [10]: %timeit np.array(a.tolist())
1000 loops, best of 3: 683 us per loop
Here is a relatively clean solution using pandas:
>>> import numpy as np
>>> import pandas as pd
>>> a = np.recarray((2,), dtype=[('x', int), ('y', float), ('z', int)])
>>> arr = pd.DataFrame(a).to_numpy()
>>> arr
array([[9.38925058e+013, 0.00000000e+000, 1.40380704e+014],
[1.40380704e+014, 6.93572751e-310, 1.40380484e+014]])
>>> arr.shape
(2, 3)
>>> arr.dtype
dtype('float64')
First the data from the recarray are loaded into a pd.DataFrame, then the data are exported using the DataFrame.to_numpy method. As we can see, this method call has automatically converted all of the data to type float64.