Simple question about numpy:
I load 100 values into a vector a. From this vector, I want to create an array A with 2 columns, where one column is named "C1" and the other "C2", and one has type int32 and the other int64. An example:
from numpy import array
a = range(100)
A = array(a).reshape(len(a) // 2, 2)
# A.dtype = ...?
How do I define the columns' types and names when I create the array from a?
In a NumPy dtype string, f denotes a floating-point type and the number after it is the item size in bytes, so f4 is a single-precision float that uses 4 bytes (4 x 8 = 32 bits). dtype='<f4' therefore makes the dtype a 32-bit single-precision floating-point number stored in little-endian byte order.
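As a quick check of how such a dtype string is read, here is a minimal sketch. The variable names dt and x are only illustrative.

```python
import numpy as np

# '<' = little-endian, 'f' = floating point, '4' = 4 bytes per element
dt = np.dtype('<f4')

print(dt.kind)          # type category: 'f' means floating point
print(dt.itemsize)      # size of one element in bytes
print(dt.itemsize * 8)  # size of one element in bits

# Two elements of this dtype occupy 2 * 4 bytes in total
x = np.array([1.5, 2.5], dtype='<f4')
print(x.nbytes)
```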
NumPy's structured array is similar to a struct in C. It groups data of different types and sizes into named fields, and each field can have its own data type and size. Fields are accessed by indexing with the field name, for example A['C1']; dot (attribute) notation is available only on record arrays (numpy.recarray).
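A small sketch of a structured array with three fields may help here. The field names and values (name, age, height) are made up for illustration and are not from the question.

```python
import numpy as np

# Each element is one record with a string, a 32-bit int and a 64-bit float
people = np.array(
    [('Ann', 31, 1.70), ('Bob', 45, 1.82)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('height', 'f8')],
)

print(people.dtype.names)  # the field names, in order
print(people['age'])       # one field across all records
first = people[0]          # one whole record
print(first['name'])       # a single field of that record
```

Indexing with a field name returns a view of that field, so assigning to people['age'] modifies the original array.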
A NumPy array is a grid of values, all of the same type, indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array (its ndim attribute); the shape of an array is a tuple of integers giving the size of the array along each dimension.
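To make these terms concrete, here is a minimal sketch using a hypothetical 2 x 3 array called grid:

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)  # values 0..5 laid out in 2 rows, 3 columns

print(grid.ndim)     # number of dimensions
print(grid.shape)    # size along each dimension
print(grid[1, 2])    # element selected by a tuple of indices (row 1, column 2)
```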
NumPy also provides numpy.recarray, which allows field access by attribute on the array object. Record arrays also use a special datatype, numpy.record, which allows field access by attribute on the individual elements of the array.
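One way to get attribute access, sketched under the assumption that you already have a structured array, is to view it as a record array. The names plain and rec are illustrative.

```python
import numpy as np

plain = np.array([(0, 1), (2, 3)], dtype=[('C1', 'i4'), ('C2', 'i8')])

# View the same data as a record array; no copy is made
rec = plain.view(np.recarray)

print(rec.C1)       # field access by attribute on the array
print(rec[0].C2)    # field access by attribute on a single element
print(rec['C1'])    # indexing by field name still works
```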
NumPy structured arrays have named columns:
import numpy as np
a = range(100)
# zip(*[iter(a)] * 2) pairs up consecutive elements: (0, 1), (2, 3), ...
A = np.array(list(zip(*[iter(a)] * 2)), dtype=[('C1', 'int32'), ('C2', 'int64')])
print(A.dtype)
# [('C1', '<i4'), ('C2', '<i8')]
You can access the columns by name like this:
print(A['C1'])
# [ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48
#  50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98]
Note that using np.array with zip causes NumPy to build the array from a temporary list of tuples. A Python list of tuples uses a lot more memory than the equivalent NumPy array, so if your array is very large you may not want to use zip.
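Instead, given a NumPy array A, you could use ravel() to make A a 1D array, then use view to turn it into a structured array, and then use astype to convert the columns to the desired types. The view below assumes A holds 64-bit integers, since the item size of the view dtype must be compatible with the underlying data: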
a = range(100)
A = np.array(a).reshape(len(a) // 2, 2)
A = A.ravel().view([('col1', 'i8'), ('col2', 'i8')]).astype([('col1', 'i4'), ('col2', 'i8')])
print(A[:5])
# [(0, 1) (2, 3) (4, 5) (6, 7) (8, 9)]
print(A.dtype)
# [('col1', '<i4'), ('col2', '<i8')]
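Another option that avoids both the temporary list of tuples and any dependence on the platform's default integer size is to preallocate the structured array and fill it one field at a time. This is a different technique from the view/astype approach above, sketched here with the C1/C2 names from the question; the helper name pairs is illustrative.

```python
import numpy as np

a = np.arange(100)
pairs = a.reshape(-1, 2)  # 50 rows of (even, odd) values, no copy

# Allocate the structured array once, then copy each column into its field
A = np.empty(len(pairs), dtype=[('C1', 'int32'), ('C2', 'int64')])
A['C1'] = pairs[:, 0]  # values are cast to int32 on assignment
A['C2'] = pairs[:, 1]  # values are cast to int64 on assignment

print(A[:3])
print(A.dtype)
```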
I know this is an old question, but a more recently available option would be to try using pandas. The DataFrame type is designed for structured data like this, where columns are named and can be of different types.
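A minimal sketch of the pandas route, assuming pandas is installed and reusing the C1/C2 names from the question:

```python
import numpy as np
import pandas as pd

a = np.arange(100)

# Build a two-column frame from the reshaped data, then set per-column types
df = pd.DataFrame(a.reshape(-1, 2), columns=['C1', 'C2'])
df = df.astype({'C1': 'int32', 'C2': 'int64'})

print(df.dtypes)
print(df.head())
```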