save numpy array in append mode

Tags: python, save, numpy

Is it possible to save a NumPy array by appending it to an already existing .npy file, with something like np.save(filename, arr, mode='a')?

I have several functions that have to iterate over the rows of a large array. I cannot create the array all at once because of memory constraints. To avoid creating the rows over and over again, I wanted to create each row once and save it to a file, appending it to the previous row in the file. Later I could load the .npy file in mmap_mode and access slices when needed.
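If the total number of rows is known before the loop starts, one way to get exactly this workflow is to preallocate the .npy file on disk with np.lib.format.open_memmap and fill it one row at a time, so only the current row has to be held in memory. This is a minimal sketch under that assumption (fixed row count, row length and dtype); N_ROWS, ROW_SIZE, make_row and rows.npy are hypothetical names used for illustration.

```python
import numpy as np

N_ROWS, ROW_SIZE = 1000, 200    # hypothetical: the final shape must be known up front
FILENAME = "rows.npy"           # hypothetical output file


def make_row(i):
    # hypothetical stand-in for the expensive per-row computation
    return np.full(ROW_SIZE, i, dtype=np.float64)


# Create a valid .npy file of the final shape, backed by a memory map.
out = np.lib.format.open_memmap(FILENAME, mode="w+",
                                dtype=np.float64, shape=(N_ROWS, ROW_SIZE))
for i in range(N_ROWS):
    out[i] = make_row(i)        # each row is computed once and written to disk
out.flush()
del out                         # release the memory map

# Later: open lazily and read only the slices that are needed.
data = np.load(FILENAME, mmap_mode="r")
print(data.shape, data[42, :3])
```

Because the result is an ordinary .npy file, np.load(..., mmap_mode='r') sees the whole array; the trade-off is that the row count cannot grow after the file is created.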

asked May 21 '15 at 14:05 by user3820991


People also ask

How do I append a NumPy array to a file?

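To append a NumPy array to the end of an existing CSV (or other text) file, open the file in append mode and pass the file handle to np.savetxt(); each call writes its rows after whatever the file already contains. A common pattern is to open the file in write mode once and call savetxt() with a header for the first rows, then reopen it in append mode and call savetxt() again for each further block of rows. The fmt argument (for example fmt='%s') sets how every value is formatted.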
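A minimal sketch of that pattern, assuming small integer arrays and a hypothetical file name data.csv: the first call writes a header plus two rows, the second call appends one more row.

```python
import numpy as np

FILENAME = "data.csv"   # hypothetical file name

first = np.array([[1, 2, 3], [4, 5, 6]])
more = np.array([[7, 8, 9]])

# Write a header and the first block of rows ("w" truncates any old file).
# comments="" stops savetxt from prefixing the header with "# ".
with open(FILENAME, "w") as f:
    np.savetxt(f, first, fmt="%s", delimiter=",", header="a,b,c", comments="")

# Reopen in append mode: savetxt writes after the existing content.
with open(FILENAME, "a") as f:
    np.savetxt(f, more, fmt="%s", delimiter=",")

with open(FILENAME) as f:
    print(f.read())
```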

How do I save an array in NumPy?

You can save NumPy arrays to CSV files with the savetxt() function. It takes a filename (or an open file) and an array as arguments and writes the array as delimited text. For CSV you should pass delimiter=','; the default delimiter is a single space.
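A short round-trip sketch (array.csv is a hypothetical file name): save with an explicit comma delimiter, then read the file back with np.loadtxt using the same delimiter.

```python
import numpy as np

arr = np.arange(6, dtype=float).reshape(2, 3)

# savetxt's default delimiter is a space, so pass "," explicitly for CSV.
np.savetxt("array.csv", arr, delimiter=",")

# Read it back with the same delimiter.
back = np.loadtxt("array.csv", delimiter=",")
print(back)
```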

How do I save a NumPy array as a text file?

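One way to save a NumPy array to a text file is to create the file with the built-in open() function, convert the array to a string, write that string with write(), and finally close the file with close() (a with block closes it automatically). Be aware that str() abbreviates arrays with more than 1000 elements using "...", so the written text may not contain every value; np.savetxt() does not have this problem.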
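A minimal sketch of the open()/write() approach that avoids the abbreviation, assuming a hypothetical output file array.txt: np.array2string with threshold set to the array's size prints every element.

```python
import numpy as np

arr = np.arange(2000)

# str(arr) summarizes arrays larger than the default threshold (1000)
# with "...", which would silently drop data from the file.
print("..." in str(arr))   # True

# With threshold >= arr.size, array2string prints every element.
text = np.array2string(arr, threshold=arr.size)

with open("array.txt", "w") as f:   # the with block closes the file
    f.write(text)
```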


2 Answers

The built-in .npy file format is perfectly fine for working with small datasets, without relying on external modules other than numpy.

However, once you have large amounts of data, a file format designed to handle such datasets, such as HDF5, is preferable [1].

For instance, here is a solution that saves NumPy arrays to HDF5 with PyTables:

Step 1: Create an extendable EArray storage

```python
import tables
import numpy as np

filename = 'outarray.h5'
ROW_SIZE = 100
NUM_ROWS = 200  # number of rows to append (one row per loop iteration)

with tables.open_file(filename, mode='w') as f:
    atom = tables.Float64Atom()
    # A shape of (0, ROW_SIZE) makes the first dimension extendable
    array_c = f.create_earray(f.root, 'data', atom, (0, ROW_SIZE))

    for idx in range(NUM_ROWS):
        x = np.random.rand(1, ROW_SIZE)
        array_c.append(x)
```

Step 2: Append rows to an existing dataset (if needed)

```python
x = np.random.rand(1, ROW_SIZE)  # new row(s); shape must be (n, ROW_SIZE)

with tables.open_file(filename, mode='a') as f:
    f.root.data.append(x)
```

Step 3: Read back a subset of the data

```python
with tables.open_file(filename, mode='r') as f:
    # e.g. read only this part of the dataset from disk
    print(f.root.data[1:10, 2:20])
```
answered Oct 10 '22 at 02:10 by rth


This is an expansion on Mohit Pandey's answer showing a full save / load example. It was tested using Python 3.6 and Numpy 1.11.3.

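```python
from pathlib import Path
import os
import numpy as np

p = Path('temp.npy')

# 'ab' appends: each np.save call adds a complete .npy record
# (header + data) to the end of the file, so rerunning the script
# keeps adding records to an existing temp.npy
with p.open('ab') as f:
    np.save(f, np.zeros(2))
    np.save(f, np.ones(2))

# Read the records back one after another until the end of the file
with p.open('rb') as f:
    fsz = os.fstat(f.fileno()).st_size
    out = np.load(f)
    while f.tell() < fsz:
        out = np.vstack((out, np.load(f)))
```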

```
out = array([[ 0.,  0.],
             [ 1.,  1.]])
```
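A file written this way is several .npy records placed one after another, so np.load(..., mmap_mode='r') reads only the first header and maps only the first record, which matters for the memory-mapped access the question asks about. A minimal pure-NumPy alternative, assuming a fixed dtype and a row length that is known when reading, is to append raw bytes with ndarray.tofile and map the whole file with np.memmap. ROW_SIZE, rows.bin and two.npy are hypothetical names.

```python
import numpy as np

ROW_SIZE = 4               # hypothetical fixed row length
DTYPE = np.float64         # the dtype must be the same for every append
FILENAME = "rows.bin"      # hypothetical raw file (no .npy header)

open(FILENAME, "wb").close()            # start from an empty file

for i in range(3):
    row = np.full(ROW_SIZE, i, dtype=DTYPE)
    with open(FILENAME, "ab") as f:     # append mode: bytes go at the end
        row.tofile(f)

# Map the whole file; without a shape, memmap infers the length from the file size.
data = np.memmap(FILENAME, dtype=DTYPE, mode="r").reshape(-1, ROW_SIZE)
print(data.shape)   # (3, 4)

# For comparison: two np.save records in one file, opened with mmap_mode.
with open("two.npy", "wb") as f:
    np.save(f, np.zeros(2))
    np.save(f, np.ones(2))
print(np.load("two.npy", mmap_mode="r").shape)   # (2,): only the first record
```

The raw file carries no header, so the dtype and row length have to be recorded somewhere else; that is the price for being able to append and memory-map with plain NumPy.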

answered Oct 10 '22 at 02:10 by PaxRomana99