
How do you remove a column from a structured numpy array?

Tags:

python

numpy

Imagine you have a structured numpy array, generated from a csv with the first row as field names. The array has the form:

dtype([('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ..., ('n','<f8'])

Now, let's say you want to remove the i-th column from this array. Is there a convenient way to do that?

I'd like it to work like np.delete:

new_array = np.delete(old_array, 'i')

Any ideas?

Dobbs_Head asked Mar 22 '13 16:03



2 Answers

It's not quite a single function call, but the following shows one way to drop the i-th field:

In [67]: a
Out[67]: 
array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], 
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

In [68]: i = 1   # Drop the 'B' field

In [69]: names = list(a.dtype.names)

In [70]: names
Out[70]: ['A', 'B', 'C']

In [71]: new_names = names[:i] + names[i+1:]

In [72]: new_names
Out[72]: ['A', 'C']

In [73]: b = a[new_names]

In [74]: b
Out[74]: 
array([(1.0, 3.0), (4.0, 6.0)], 
      dtype=[('A', '<f8'), ('C', '<f8')])

Wrapped up as a function:

def remove_field_num(a, i):
    names = list(a.dtype.names)
    del names[i]   # drop the i-th name (negative i counts from the end)
    b = a[names]
    return b

It might be more natural to remove a given field name:

def remove_field_name(a, name):
    names = list(a.dtype.names)
    if name in names:
        names.remove(name)
    b = a[names]
    return b
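
As a quick check, here is a minimal, self-contained sketch of remove_field_name applied to the same sample array as above. The name 'Z' is a made-up field used only to show that a missing name is silently ignored and every field is kept.

```python
import numpy as np

def remove_field_name(a, name):
    # Copy the field names so the dtype itself is left untouched.
    names = list(a.dtype.names)
    if name in names:
        names.remove(name)
    # Indexing with a list of names selects just those fields.
    return a[names]

a = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
             dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

without_b = remove_field_name(a, 'B')
unchanged = remove_field_name(a, 'Z')   # 'Z' is not a field of a

print(without_b.dtype.names)
print(unchanged.dtype.names)
```

If a silent no-op is not what you want, raising a KeyError for an unknown name is a reasonable alternative.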

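Also, check out the rec_drop_fields function in the mlab module of matplotlib. It has since been removed from recent Matplotlib releases; NumPy's own numpy.lib.recfunctions.drop_fields does the same job.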
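
NumPy itself ships a comparable helper, numpy.lib.recfunctions.drop_fields. The following is a small sketch of how it could be used on the sample array from above. Passing usemask=False is a choice made here so that a plain ndarray comes back rather than a masked array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
             dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

# drop_fields returns a new array without the named field(s);
# a list of names can be passed to drop several at once.
b = rfn.drop_fields(a, 'B', usemask=False)

print(b.dtype.names)
```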


Update: See my answer at How to remove a column from a structured numpy array *without copying it*? for a method to create a view of subsets of the fields of a structured array without making a copy of the array.
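
On the view-versus-copy point, a hedged note: in newer NumPy releases (1.16 and later, as far as I know) indexing with a list of field names already returns a view that keeps the original memory layout, including the bytes of the dropped field. If a compact, independent array is wanted, numpy.lib.recfunctions.repack_fields can make a packed copy. A minimal sketch, assuming such a NumPy version:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
             dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

b = a[['A', 'C']]              # may be a view onto a, depending on the NumPy version
packed = rfn.repack_fields(b)  # packed copy containing only 'A' and 'C'

# Compare how many bytes each record occupies before and after repacking.
print(b.dtype.itemsize, packed.dtype.itemsize)
```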

Warren Weckesser answered Oct 12 '22 23:10


Having googled my way here and learned what I needed to know from Warren's answer, I couldn't resist posting a more succinct version, with the added option to remove multiple fields efficiently in one go:

def rmfield( a, *fieldnames_to_remove ):
    return a[ [ name for name in a.dtype.names if name not in fieldnames_to_remove ] ]

Examples:

a = rmfield(a, 'foo')
a = rmfield(a, 'foo', 'bar')  # remove multiple fields at once
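
For a version of those examples that runs end to end, here is a small sketch. The array and its field types are made up for illustration; only the field names 'foo' and 'bar' come from the calls above, and 'baz' is added so that something is left over.

```python
import numpy as np

def rmfield(a, *fieldnames_to_remove):
    # Keep every field whose name was not listed for removal.
    return a[[name for name in a.dtype.names if name not in fieldnames_to_remove]]

# Hypothetical three-field array, zero-filled.
a = np.zeros(3, dtype=[('foo', '<f8'), ('bar', '<i4'), ('baz', '<f8')])

one_removed = rmfield(a, 'foo')
two_removed = rmfield(a, 'foo', 'bar')

print(one_removed.dtype.names)
print(two_removed.dtype.names)
```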

Or if we're really going to golf it, the following is equivalent:

rmfield=lambda a,*f:a[[n for n in a.dtype.names if n not in f]]

jez answered Oct 13 '22 01:10