Python/Numpy - Save Array with Column AND Row Titles

I want to save a 2D array to a CSV file with row and column "header" information (like a table). I know I can use the header argument of numpy.savetxt to save the column names, but is there an easy way to also include another array (or list) as the first column of data (like row titles)?

Below is an example of how I currently do it. Is there a better way to include those row titles, perhaps some trick with savetxt I'm unaware of?

import csv
import numpy as np

data = np.arange(12).reshape(3, 4)
# Add a '' for the first column because the row titles go there...
cols = ['', 'col1', 'col2', 'col3', 'col4']
rows = ['row1', 'row2', 'row3']

# Python 3: open in text mode with newline='' (Python 2 used 'wb' here)
with open('test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(cols)
    for row_title, data_row in zip(rows, data):
        writer.writerow([row_title] + data_row.tolist())
Scott B asked Mar 28 '12 17:03


1 Answer

Maybe you'd prefer to do something like this:

# Column of row titles
rows = np.array(['row1', 'row2', 'row3'], dtype='|S20')[:, np.newaxis]
with open('test.csv', 'w') as f:
    np.savetxt(f, np.hstack((rows, data)), delimiter=', ', fmt='%s')

This implicitly converts data to an array of strings, and takes about 200 ms per million items on my computer.
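
For reference, here is a minimal sketch of the same idea that also writes the column names. It assumes Python 3 and uses the header and comments arguments of np.savetxt. The conversion to strings is made explicit with astype(str), and the output goes to an in-memory buffer (io.StringIO) rather than test.csv only so that the result can be inspected:

```python
import io
import numpy as np

data = np.arange(12).reshape(3, 4)
cols = ['', 'col1', 'col2', 'col3', 'col4']  # leading '' sits above the row titles
rows = np.array(['row1', 'row2', 'row3'])[:, np.newaxis]

# Convert the numbers to strings explicitly so every column has a string dtype.
table = np.hstack((rows, data.astype(str)))

buf = io.StringIO()  # stand-in for open('test.csv', 'w')
# comments='' stops savetxt from prefixing the header line with '# '.
np.savetxt(buf, table, delimiter=',', fmt='%s',
           header=','.join(cols), comments='')
csv_text = buf.getvalue()
print(csv_text)
```

Replacing buf with a file opened via open('test.csv', 'w') should write the same text to disk.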

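The dtype '|S20' means byte strings of at most twenty characters (on Python 3, a unicode dtype such as 'U20' avoids b'...' prefixes in the written file). If the length is too small, your numbers will be truncated: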

>>> np.asarray([123], dtype='|S2')
array(['12'], 
  dtype='|S2')
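
One way to avoid guessing the width, sketched here assuming Python 3, is to let NumPy choose it by converting with astype(str). The variable names below are illustrative only:

```python
import numpy as np

values = np.array([123, 4567890])

# A fixed width that is too small silently cuts the digits off.
too_narrow = values.astype('S2')

# astype(str) picks a string width large enough for the integer dtype.
wide_enough = values.astype(str)

print(too_narrow)
print(wide_enough, wide_enough.dtype)
```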

Another option is np.char.mod. In my limited testing it is slower, but it gives you much more control over the formatting and doesn't have the truncation issue:

# Column of row titles
rows = np.array(['row1', 'row2', 'row3'])[:, np.newaxis]
str_data = np.char.mod("%10.6f", data)
with open('test.csv', 'w') as f:
    np.savetxt(f, np.hstack((rows, str_data)), delimiter=', ', fmt='%s')
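
To illustrate the extra control, the sketch below formats the same integer data with two different printf-style formats. np.char.mod applies the format string to every element and returns an array of strings sized to fit the results. The names as_ints and as_floats are illustrative only:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
rows = np.array(['row1', 'row2', 'row3'])[:, np.newaxis]

# Apply one printf-style format to every element; the result is a string array.
as_ints = np.char.mod('%d', data)
as_floats = np.char.mod('%.2f', data)

print(np.hstack((rows, as_ints))[0])    # first row with integer formatting
print(np.hstack((rows, as_floats))[0])  # first row with two decimals
```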
jorgeca answered Oct 14 '22 07:10