
How to add column to numpy array

Tags: python, numpy

I am trying to add one column to the array created by recfromcsv. In this case the array is [210,8] (rows, cols).

I want to add a ninth column. It doesn't matter whether it is empty or filled with zeroes.

    from numpy import genfromtxt
    from numpy import recfromcsv
    import numpy as np
    import time

    if __name__ == '__main__':
        print("testing")
        my_data = recfromcsv('LIAB.ST.csv', delimiter='\t')
        array_size = my_data.size
        #my_data = np.append(my_data[:array_size],my_data[9:],0)

        new_col = np.sum(x,1).reshape((x.shape[0],1))
        np.append(x,new_col,1)
user2130951 asked Apr 04 '13 15:04



1 Answer

I think your problem is that you expect np.append to add the column in place. Because of how numpy stores its data, it instead creates a copy of the joined arrays, as its docstring says:

    Returns
    -------
    append : ndarray
        A copy of `arr` with `values` appended to `axis`.  Note that `append`
        does not occur in-place: a new array is allocated and filled.  If
        `axis` is None, `out` is a flattened array.

so you need to save the output, as in all_data = np.append(...):

    my_data = np.random.random((210,8)) #recfromcsv('LIAB.ST.csv', delimiter='\t')
    new_col = my_data.sum(1)[...,None] # None keeps (n, 1) shape
    new_col.shape
    #(210,1)
    all_data = np.append(my_data, new_col, 1)
    all_data.shape
    #(210,9)

Alternative ways:

    all_data = np.hstack((my_data, new_col))
    #or
    all_data = np.concatenate((my_data, new_col), 1)
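Since the question asks for an empty or zero-filled column rather than a sum, here is a minimal sketch of that case for a plain 2-D float array. The random `my_data` is only a stand-in for the real data, and `zeros_col` is a name I made up for illustration.

```python
import numpy as np

# Stand-in for a plain 2-D float array (an assumption, not the CSV data).
my_data = np.random.random((210, 8))

# One column of zeros with the same number of rows, shape (210, 1).
zeros_col = np.zeros((my_data.shape[0], 1))

# hstack returns a new array; my_data itself is left untouched.
all_data = np.hstack((my_data, zeros_col))

print(my_data.shape)   # shape of the original array
print(all_data.shape)  # shape of the new array, one column wider
```

The original array keeps its eight columns, so the result has to be assigned to a name for the new column to be kept.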

I believe that the only difference between these three functions (np.append, np.hstack and np.concatenate, as well as np.vstack) is their default behavior when axis is unspecified:

  • concatenate assumes axis = 0
  • hstack assumes axis = 1 unless inputs are 1d, then axis = 0
  • vstack assumes axis = 0 after adding an axis if inputs are 1d
  • append flattens both arrays and returns a 1d result
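The defaults listed above can be checked with a small sketch. The arrays `a`, `b`, `u` and `v` are made-up examples, not data from the question.

```python
import numpy as np

# Two small 2-D arrays, both of shape (2, 3).
a = np.arange(6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)

cat_default = np.concatenate((a, b))  # axis defaults to 0: rows are stacked
h2d = np.hstack((a, b))               # 2-D inputs are joined along axis 1
v2d = np.vstack((a, b))               # joined along axis 0
app_default = np.append(a, b)         # no axis given: both inputs are flattened

# Two 1-D arrays, each of shape (3,).
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

h1d = np.hstack((u, v))  # 1-D inputs are joined along axis 0
v1d = np.vstack((u, v))  # each 1-D input becomes a row first

for name, arr in [("concatenate", cat_default), ("hstack 2d", h2d),
                  ("vstack 2d", v2d), ("append", app_default),
                  ("hstack 1d", h1d), ("vstack 1d", v1d)]:
    print(name, arr.shape)
```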

Based on your comment, and after looking more closely at your example code, I now believe that what you probably want is to add a field to a record array. You imported both genfromtxt, which returns a structured array, and recfromcsv, which returns the subtly different record array (recarray). You used recfromcsv, so my_data is actually a recarray. That most likely means my_data.shape == (210,), since recarrays are 1d arrays of records, where each record is a tuple with the given dtype.
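To see the 1d shape for yourself, here is a small sketch. It is an illustration under assumptions: I use genfromtxt on an in-memory, tab-separated string as a stand-in for recfromcsv on your file, and the column names a, b, c are invented.

```python
import io
import numpy as np

# A tiny tab-separated table with a header row (made-up data).
text = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

# names=True takes the field names from the header, giving a structured array.
data = np.genfromtxt(io.StringIO(text), delimiter="\t", names=True)

# Viewing it as a recarray adds attribute access to the fields.
rec = data.view(np.recarray)

print(data.shape)        # one entry per row; there is no column axis
print(data.dtype.names)  # the "columns" are fields of the dtype
print(rec.a)             # field accessed as an attribute
```

Because the array is 1d, there is no second axis for np.append or np.hstack to extend. A new "column" has to be a new field.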

So you could try this:

    import numpy as np
    from numpy.lib.recfunctions import append_fields
    x = np.random.random(10)
    y = np.random.random(10)
    z = np.random.random(10)
    data = np.array( list(zip(x,y,z)), dtype=[('x',float),('y',float),('z',float)])
    data = np.recarray(data.shape, data.dtype, buf=data)
    data.shape
    #(10,)
    tot = data['x'] + data['y'] + data['z'] # sum(axis=1) won't work on recarray
    tot.shape
    #(10,)
    all_data = append_fields(data, 'total', tot, usemask=False)
    all_data
    #array([(0.4374783740738456 , 0.04307289878861764, 0.021176067323686598, 0.5017273401861498),
    #       (0.07622262416466963, 0.3962146058689695 , 0.27912715826653534 , 0.7515643883001745),
    #       (0.30878532523061153, 0.8553768789387086 , 0.9577415585116588  , 2.121903762680979 ),
    #       (0.5288343561208022 , 0.17048864443625933, 0.07915689716226904 , 0.7784798977193306),
    #       (0.8804269791375121 , 0.45517504750917714, 0.1601389248542675  , 1.4957409515009568),
    #       (0.9556552723429782 , 0.8884504475901043 , 0.6412854758843308  , 2.4853911958174133),
    #       (0.0227638618687922 , 0.9295332854783015 , 0.3234597575660103  , 1.275756904913104 ),
    #       (0.684075052174589  , 0.6654774682866273 , 0.5246593820025259  , 1.8742119024637423),
    #       (0.9841793718333871 , 0.5813955915551511 , 0.39577520705133684 , 1.961350170439875 ),
    #       (0.9889343795296571 , 0.22830104497714432, 0.20011292764078448 , 1.4173483521475858)],
    #      dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('total', '<f8')])
    all_data.shape
    #(10,)
    all_data.dtype.names
    #('x', 'y', 'z', 'total')
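If all you need is an empty or zero-filled field, as the question says, the same approach should work with an array of zeros. This is a minimal sketch: the two-row `data` array and the field name 'empty' are made up for illustration.

```python
import numpy as np
from numpy.lib.recfunctions import append_fields

# A small structured array standing in for the data loaded from the CSV.
data = np.array([(1.0, 2.0), (3.0, 4.0)], dtype=[("x", float), ("y", float)])

# One zero per record, to be used as the values of the new field.
zeros = np.zeros(data.shape[0])

# append_fields returns a new array; usemask=False gives a plain ndarray.
all_data = append_fields(data, "empty", zeros, usemask=False)

print(all_data.dtype.names)  # field names now include the new one
print(all_data["empty"])     # the new field holds zeros
```

As with np.append, the input array is not modified, so the returned array has to be kept.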
askewchan answered Sep 29 '22 01:09