
numpy, named columns


Simple question about numpy:

I load 100 values into a vector a. From this vector, I want to create an array A with two columns, one named "C1" and the other "C2", where one has type int32 and the other int64. For example:

    import numpy as np

    a = range(100)
    A = np.array(a).reshape(len(a) // 2, 2)
    # A.dtype = ...?

How do I define the columns' types and names when I create the array from a?

Jakub M. asked Aug 12 '11 09:08



People also ask

What is f4 in NumPy?

In a NumPy dtype string, f denotes a floating-point type and the number that follows is the element size in bytes, so f4 is a 4-byte (4 x 8 = 32 bits) single-precision float. dtype='<f4' specifies a 32-bit single-precision float stored in little-endian byte order.
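As a quick check of the description above, the dtype object itself reports its size and kind. This is a minimal sketch; the variable name dt is only illustrative.

```python
import numpy as np

dt = np.dtype('<f4')

print(dt.itemsize)                  # size of one element in bytes
print(dt.kind)                      # 'f' marks a floating-point type
print(dt == np.dtype(np.float32))   # '<f4' and float32 describe the same values
```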

What is structured array in NumPy?

NumPy's structured array is similar to a struct in C. It groups data of different types and sizes into named fields, and each field has its own data type. Fields are accessed by name with index notation, for example A['C1']; attribute (dot) access is available only on record arrays (numpy.recarray).
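A small sketch of field access on a structured array follows. The field names and values (name, age, weight) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical records: each element groups a string, an int32 and a float64.
people = np.array(
    [('Ann', 31, 55.5), ('Bob', 45, 80.0)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')],
)

ages = people['age']             # one field across all records
first = people[0]                # one whole record
first_name = people[0]['name']   # one field of one record

print(people.dtype.names)
print(ages)
```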

Is a NumPy array a tuple?

No. A NumPy array is a grid of values, all of the same type, that is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

What is Recarray?

numpy.recarray is an ndarray subclass that allows field access by attribute on the array object. Record arrays also use a special datatype, numpy.record, which allows field access by attribute on the individual elements of the array.
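To illustrate attribute access, a structured array can be viewed as a record array. This is a sketch with a two-row array made up for the example; the field names C1 and C2 are borrowed from the question.

```python
import numpy as np

A = np.array([(0, 1), (2, 3)], dtype=[('C1', 'i4'), ('C2', 'i8')])

# Viewing as np.recarray adds attribute access without copying the data.
R = A.view(np.recarray)

c1 = R.C1          # attribute access on the array object
second = R[1].C2   # attribute access on a single element

print(c1, second)
```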


2 Answers

NumPy structured arrays have named columns:

    import numpy as np

    a = range(100)
    A = np.array(list(zip(*[iter(a)] * 2)), dtype=[('C1', 'int32'), ('C2', 'int64')])
    print(A.dtype)
    # [('C1', '<i4'), ('C2', '<i8')]

You can access the columns by name like this:

    print(A['C1'])
    # [ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48
    #  50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98]

Note that using np.array with zip causes NumPy to build an array from a temporary list of tuples. Python lists of tuples use a lot more memory than equivalent NumPy arrays. So if your array is very large you may not want to use zip.

Instead, given a NumPy array A, you could use ravel() to make A a 1D array, and then use view to turn it into a structured array, and then use astype to convert the columns to the desired type:

    a = range(100)
    # dtype='i8' keeps the elements 8 bytes wide on every platform, matching the view below
    A = np.array(a, dtype='i8').reshape(len(a) // 2, 2)
    A = A.ravel().view([('col1', 'i8'), ('col2', 'i8')]).astype([('col1', 'i4'), ('col2', 'i8')])
    print(A[:5])
    # [(0, 1) (2, 3) (4, 5) (6, 7) (8, 9)]
    print(A.dtype)
    # [('col1', '<i4'), ('col2', '<i8')]
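Another way to avoid the temporary list of tuples, offered here as a sketch rather than as part of the original answer, is to preallocate the structured array and fill each field from a strided slice. It assumes the input is already a NumPy array and reuses the C1 and C2 names from the question.

```python
import numpy as np

a = np.arange(100)

# Allocate 50 uninitialised records, then fill each named field in one assignment.
A = np.empty(len(a) // 2, dtype=[('C1', 'int32'), ('C2', 'int64')])
A['C1'] = a[0::2]   # even positions go to C1
A['C2'] = a[1::2]   # odd positions go to C2

print(A[:3])
print(A.dtype)
```

Each assignment casts the slice to the field's type, so no separate astype step is needed.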
unutbu answered Nov 09 '22 22:11



I know this is an old question, but a more recent option is pandas. Its DataFrame type is designed for structured data like this, where columns are named and can have different types.
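A minimal sketch of the pandas route, assuming the same 100 values and the column names from the question; the variable names df and rec are only illustrative.

```python
import numpy as np
import pandas as pd

a = np.arange(100).reshape(-1, 2)

# Build each named column with its own integer type.
df = pd.DataFrame({
    'C1': a[:, 0].astype('int32'),
    'C2': a[:, 1].astype('int64'),
})
print(df.dtypes)

# If a NumPy record array is needed later, the frame can be converted back.
rec = df.to_records(index=False)
print(rec.dtype.names)
```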

user2428107 answered Nov 09 '22 23:11
