
numpy - how to add a value to every element in the first column of an array?

Tags:

python

numpy

I have an array like this:

array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
       ('6601', 2.2452745388799898e-27, 0.99999999995270605),
       ('21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
       ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
       ('45164198', 1.09470356446595e-24, 0.99999999997635303),
       ('45164519', 3.7521365799080699e-24, 0.99999999997453404)], 
      dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])

I want to turn it into this, adding the prefix '2R:' to each value in the first column:

array([('2R:6506', 4.6725971801473496e-25, 0.99999999995088695),
       ('2R:6601', 2.2452745388799898e-27, 0.99999999995270605),
       ('2R:21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
       ('2R:45164194', 1.0413482803123399e-24, 0.99999999997453404),
       ('2R:45164198', 1.09470356446595e-24, 0.99999999997635303),
       ('2R:45164519', 3.7521365799080699e-24, 0.99999999997453404)], 
      dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])

I looked into nditer, but I want to support earlier versions of numpy. I have also read that one should avoid iteration.

Greg asked May 06 '14 14:05



2 Answers

Using numpy.core.defchararray.add (the same function is exposed publicly as numpy.char.add):

>>> from numpy import array
>>> from numpy.core.defchararray import add
>>>
>>> xs = array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
...             ('6601', 2.2452745388799898e-27, 0.99999999995270605),
...             ('21801', 1.9849650921836601e-31, 0.99999999997999001),
...             ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
...             ('45164198', 1.09470356446595e-24, 0.99999999997635303),
...             ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
...            dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
>>> xs['pos'] = add('2R:', xs['pos'])
>>> xs
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
       ('2R:6601', 2.24527453887999e-27, 0.999999999952706),
       ('2R:21801', 1.98496509218366e-31, 0.99999999997999),
       ('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
       ('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
       ('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
      dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
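
The session above appears to come from Python 2, where plain string literals are byte strings. A minimal sketch of the same idea for Python 3, assuming the 'pos' field stays a byte-string ('S100') field and using the public `numpy.char.add` alias: the prefix is written as a bytes literal, since as far as I know mixing `str` and `bytes` operands is rejected. The three-row sample is just a shortened copy of the question's data.

```python
import numpy as np

# Shortened copy of the question's data; bytes literals match the 'S100' field.
xs = np.array(
    [(b'6506', 4.6725971801473496e-25, 0.99999999995088695),
     (b'6601', 2.2452745388799898e-27, 0.99999999995270605),
     (b'21801', 1.9849650921836601e-31, 0.99999999997999001)],
    dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

# Element-wise concatenation; the scalar prefix is broadcast over the column.
# Results longer than 100 bytes would be truncated on assignment into 'S100'.
xs['pos'] = np.char.add(b'2R:', xs['pos'])

print(xs['pos'])
```

Only the 'pos' column is reassigned, so 'par1' and 'par2' are left untouched.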

falsetru answered Nov 03 '22 00:11


A simple (albeit perhaps not optimal) solution is just:

import numpy as np

a = np.array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
              ('6601', 2.2452745388799898e-27, 0.99999999995270605),
              ('21801', 1.9849650921836601e-31, 0.99999999997999001),
              ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
              ('45164198', 1.09470356446595e-24, 0.99999999997635303),
              ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
             dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])

# Build the prefixed strings in a plain Python list, then assign them back to the column
a['pos'] = [''.join(('2R:', x)) for x in a['pos']]

In [11]: a
Out[11]:
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
       ('2R:6601', 2.24527453887999e-27, 0.999999999952706),
       ('2R:21801', 1.98496509218366e-31, 0.99999999997999),
       ('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
       ('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
       ('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
      dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
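
On Python 3, iterating over an 'S100' field yields bytes objects, so the join needs bytes on both sides. A small sketch under that assumption, using two rows from the question's data:

```python
import numpy as np

a = np.array(
    [(b'45164194', 1.0413482803123399e-24, 0.99999999997453404),
     (b'45164198', 1.09470356446595e-24, 0.99999999997635303)],
    dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

# Same list comprehension as above, with bytes literals instead of str.
a['pos'] = [b''.join((b'2R:', x)) for x in a['pos']]

print(a['pos'])
```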

While I like @falsetru's answer for using core numpy routines, the list comprehension surprisingly seems a bit faster:

In [19]: a = np.empty(20000, dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

In [20]: %timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
100 loops, best of 3: 11.1 ms per loop

In [21]: %timeit a['pos'] = add('2R:', a['pos'])
100 loops, best of 3: 15.7 ms per loop

Definitely benchmark your own use case on your own hardware to see which makes more sense for your application, though. One thing I've learned is that, depending on the task at hand, basic Python constructs can sometimes outperform numpy built-ins.
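
For re-running the comparison outside IPython, here is a rough sketch using the standard-library `timeit` module on Python 3. The helper names (`make_array`, `with_comprehension`, `with_char_add`) are made up for illustration, and the array is filled with stringified integers rather than real positions. Each call works on a copy so that repeated timing runs do not keep prepending the prefix to the same data.

```python
import timeit
import numpy as np

DTYPE = [('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')]

def make_array(n=20000):
    # Hypothetical test data: 'pos' holds b'0', b'1', ... as byte strings.
    arr = np.zeros(n, dtype=DTYPE)
    arr['pos'] = np.arange(n).astype('S100')
    return arr

def with_comprehension(arr):
    out = arr.copy()
    out['pos'] = [b''.join((b'2R:', x)) for x in out['pos']]
    return out

def with_char_add(arr):
    out = arr.copy()
    out['pos'] = np.char.add(b'2R:', out['pos'])
    return out

base = make_array()

# Sanity check: both approaches should produce the same column.
assert (with_comprehension(base)['pos'] == with_char_add(base)['pos']).all()

for fn in (with_comprehension, with_char_add):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(base), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

The timings include the cost of the copy in both cases, so they are only meaningful relative to each other.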

JoshAdel answered Nov 03 '22 00:11