
How to convert numpy.recarray to numpy.array?

What's the best way to convert numpy's recarray to a normal array?

I could call .tolist() first and then pass the result to array() again, but that seems somewhat inefficient.

Example:

>>> import numpy as np
>>> a = np.recarray((2,), dtype=[('x', int), ('y', float), ('z', int)])

>>> a
rec.array([(30408891, 9.2944097561804909e-296, 30261980),
           (44512448, 4.5273310988985789e-300, 29979040)],
          dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

>>> np.array(a.tolist())
array([[  3.04088910e+007,   9.29440976e-296,   3.02619800e+007],
       [  4.45124480e+007,   4.52733110e-300,   2.99790400e+007]])
Muppet asked Oct 20 '11 20:10




2 Answers

By "normal array" I take it you mean a NumPy array of homogeneous dtype. Given a recarray, such as:

>>> a = np.array([(0, 1, 2),
...               (3, 4, 5)],
...              dtype=[('x', int), ('y', float), ('z', int)]).view(np.recarray)
>>> a
rec.array([(0, 1.0, 2), (3, 4.0, 5)], 
      dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

we must first cast every field to the same dtype. We can then convert it to a "normal array" by viewing the data as that dtype:

>>> a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
array([ 0.,  1.,  2.,  3.,  4.,  5.])

astype returns a new NumPy array, so the conversion needs additional memory proportional to the size of a. Each row of a takes 4+8+4=16 bytes, while each row of a.astype(...) takes 8*3=24 bytes. Calling view needs no new memory, since view only changes how the underlying data is interpreted.
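Note that the astype/view conversion shown earlier yields a flat, one-dimensional array. The following is a minimal sketch of two ways to get one row per record instead. It assumes the same two-record example; the names all_f8, rows and rows2 are illustrative, and structured_to_unstructured assumes NumPy 1.16 or later.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Explicit field widths, so the layout does not depend on the platform's default int.
a = np.array([(0, 1, 2), (3, 4, 5)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')]).view(np.recarray)

# Cast every field to float64, view the buffer as plain float64,
# then reshape so that each record becomes one row.
all_f8 = [(name, '<f8') for name in a.dtype.names]
rows = np.asarray(a.astype(all_f8).view('<f8')).reshape(len(a), -1)

# Alternative: let NumPy choose the common dtype and build the 2-D array directly.
rows2 = rfn.structured_to_unstructured(np.asarray(a))

print(rows.shape, rows.dtype)
print(np.array_equal(rows, rows2))
```

Both results have one column per field. np.asarray is used here only to drop the recarray subclass, so that the result is a base ndarray.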

a.tolist() returns a new Python list. Each Python number is an object which requires more bytes than its equivalent representation in a numpy array. So a.tolist() requires more memory than a.astype(...).
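These memory claims can be checked roughly with the sketch below. It assumes CPython, where sys.getsizeof is available, and it only approximates the list cost because small integers are shared objects. The variable names are illustrative.

```python
import sys
import numpy as np

a = np.array([(0, 1, 2), (3, 4, 5)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')]).view(np.recarray)

converted = a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
flat = converted.view('<f8')

record_bytes = a.dtype.itemsize             # bytes per record before the cast
converted_bytes = converted.dtype.itemsize  # bytes per record after the cast

# view reuses the buffer of its source, astype allocates a new one.
view_shares = np.shares_memory(converted, flat)
copy_shares = np.shares_memory(a, converted)

# Rough cost of one record once it has become a tuple of Python objects.
first = a.tolist()[0]
python_bytes = sys.getsizeof(first) + sum(sys.getsizeof(v) for v in first)

print(record_bytes, converted_bytes, python_bytes)
print(view_shares, copy_shares)
```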

Calling a.astype(...).view(...) is also faster than np.array(a.tolist()):

In [8]: a = np.array(list(zip(*[iter(range(300))]*3)),[('x', int), ('y', float), ('z', int)]).view(np.recarray)

In [9]: %timeit a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
10000 loops, best of 3: 165 us per loop

In [10]: %timeit np.array(a.tolist())
1000 loops, best of 3: 683 us per loop
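One way to repeat this comparison outside IPython is the standard timeit module. This is a sketch for Python 3; the helper names via_astype and via_tolist are made up for the example, and the absolute numbers will differ from those quoted above depending on the machine and NumPy version.

```python
import timeit
import numpy as np

# 100 records built from the integers 0..299, three per record.
records = list(zip(*[iter(range(300))] * 3))
a = np.array(records,
             dtype=[('x', int), ('y', float), ('z', int)]).view(np.recarray)
all_f8 = [(name, '<f8') for name in a.dtype.names]

def via_astype():
    # Cast all fields to float64, then reinterpret the buffer.
    return a.astype(all_f8).view('<f8')

def via_tolist():
    # Round trip through Python tuples.
    return np.array(a.tolist())

t_astype = timeit.timeit(via_astype, number=1000)
t_tolist = timeit.timeit(via_tolist, number=1000)
print(f"astype + view: {t_astype:.4f} s for 1000 calls")
print(f"tolist + array: {t_tolist:.4f} s for 1000 calls")
```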
unutbu answered Oct 24 '22 14:10


Here is a relatively clean solution using pandas:

>>> import numpy as np
>>> import pandas as pd
>>> a = np.recarray((2,), dtype=[('x', int), ('y', float), ('z', int)])
>>> arr = pd.DataFrame(a).to_numpy()
>>> arr
array([[9.38925058e+013, 0.00000000e+000, 1.40380704e+014],
       [1.40380704e+014, 6.93572751e-310, 1.40380484e+014]])
>>> arr.shape
(2, 3)
>>> arr.dtype
dtype('float64')

First the data from the recarray are loaded into a pd.DataFrame, then they are exported using the DataFrame.to_numpy method. As we can see, this call has converted all of the data to float64, because to_numpy picks a single dtype that can hold both the integer and the float columns.
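The dtype behaviour is easier to see with initialized data, since np.recarray((2,), ...) only allocates memory and shows whatever values happen to be there. The sketch below assumes pandas 0.24 or later for to_numpy; the sample values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

a = np.array([(0, 1.5, 2), (3, 4.5, 5)],
             dtype=[('x', int), ('y', float), ('z', int)]).view(np.recarray)

df = pd.DataFrame(a)  # one column per field: x, y, z

# Mixed int and float columns are upcast to a common float dtype.
arr = df.to_numpy()

# Selecting only the integer columns keeps an integer dtype.
ints_only = df[['x', 'z']].to_numpy()

print(arr.dtype, arr.shape)
print(ints_only.dtype, ints_only.shape)
```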

Jasha answered Oct 24 '22 12:10