Is it possible to append a NumPy array to an already existing .npy file, with something like np.save(filename, arr, mode='a')?
I have several functions that have to iterate over the rows of a large array. I cannot create the array all at once because of memory constraints. To avoid creating the rows over and over again, I want to create each row once and save it to a file, appending it after the previous row. Later I could load the .npy file with mmap_mode and access slices when needed.
NumPy's savetxt() function can append an array to the end of an existing CSV file, here with a single format pattern fmt='%s'. Open the CSV file in append mode and call savetxt() once to write the first rows together with a header, then call savetxt() a second time to append further rows.
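The two savetxt() calls described above could look roughly like the sketch below. The file location, the column names a,b,c and the sample data are assumptions made for illustration. The file is opened in binary append mode ('ab'), which should work across NumPy versions, and comments='' is passed so the header is not prefixed with '# '.

```python
import os
import tempfile

import numpy as np

# Hypothetical file location and sample data, chosen only for illustration.
csv_path = os.path.join(tempfile.mkdtemp(), "rows.csv")
first_rows = np.array([[1, 2, 3], [4, 5, 6]])
more_rows = np.array([[7, 8, 9]])

# First call: write the header line followed by the initial rows.
with open(csv_path, "ab") as f:
    np.savetxt(f, first_rows, fmt="%s", delimiter=",",
               header="a,b,c", comments="")

# Second call: reopen in append mode and add rows, this time without a header.
with open(csv_path, "ab") as f:
    np.savetxt(f, more_rows, fmt="%s", delimiter=",")

# Read everything back, skipping the single header line.
loaded = np.loadtxt(csv_path, delimiter=",", skiprows=1)
print(loaded.shape)
```

Note that the text round trip does not preserve the dtype: loadtxt returns floats by default, and a text file cannot be memory-mapped the way an .npy file can.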
You can save NumPy arrays to CSV files using the savetxt() function. It takes a filename and an array as arguments and writes the array as delimited text. For CSV output you should also specify the delimiter, the character that separates the values in each row, most commonly a comma (the default is a single space).
Let us see how to save a NumPy array to a text file by hand: create the text file with the built-in open() function, convert the array to a string and write it to the file with write(), and finally close the file with close().
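A minimal sketch of the open() / write() / close() approach follows. The file location and the array are assumptions for illustration. Instead of calling str() on the whole array, which NumPy abbreviates with "..." for large arrays, this variant converts and writes one row at a time.

```python
import os
import tempfile

import numpy as np

# Hypothetical file location and sample data, chosen only for illustration.
txt_path = os.path.join(tempfile.mkdtemp(), "array.txt")
arr = np.array([[1, 2, 3], [4, 5, 6]])

f = open(txt_path, "w")
try:
    # Convert each row to a string: one line per row, values separated by spaces.
    for row in arr:
        f.write(" ".join(str(v) for v in row) + "\n")
finally:
    # Close the file even if a write fails.
    f.close()

# Read the file back as plain text to check the result.
with open(txt_path) as f:
    text = f.read()
print(text)
```

Opening the file with mode "a" instead of "w" would append rows to an existing file rather than overwrite it.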
The built-in .npy file format is perfectly fine for working with small datasets, without relying on external modules other than numpy.
However, when you start having large amounts of data, the use of a file format, such as HDF5, designed to handle such datasets, is to be preferred [1].
For instance, below is a solution to save numpy arrays in HDF5 with PyTables:
Step 1: Create an extendable EArray storage
    import tables
    import numpy as np

    filename = 'outarray.h5'
    ROW_SIZE = 100
    NUM_COLUMNS = 200  # number of rows appended below, despite the name

    f = tables.open_file(filename, mode='w')
    atom = tables.Float64Atom()

    # The 0 in the shape marks the dimension along which the array can grow.
    array_c = f.create_earray(f.root, 'data', atom, (0, ROW_SIZE))

    for idx in range(NUM_COLUMNS):
        x = np.random.rand(1, ROW_SIZE)
        array_c.append(x)
    f.close()
Step 2: Append rows to an existing dataset (if needed)
    f = tables.open_file(filename, mode='a')
    f.root.data.append(x)  # x must have shape (n, ROW_SIZE)
    f.close()
Step 3: Read back a subset of the data
    f = tables.open_file(filename, mode='r')
    # Only this part of the dataset is read from disk.
    print(f.root.data[1:10, 2:20])
    f.close()
This is an expansion on Mohit Pandey's answer showing a full save / load example. It was tested using Python 3.6 and Numpy 1.11.3.
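    from pathlib import Path
    import numpy as np
    import os

    p = Path('temp.npy')

    # Append mode: each np.save call writes a complete .npy record after the last one.
    with p.open('ab') as f:
        np.save(f, np.zeros(2))
        np.save(f, np.ones(2))

    # Load the records one by one until the end of the file is reached.
    with p.open('rb') as f:
        fsz = os.fstat(f.fileno()).st_size
        out = np.load(f)
        while f.tell() < fsz:
            out = np.vstack((out, np.load(f)))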
    out = array([[ 0., 0.],
                 [ 1., 1.]])
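A file written this way holds several concatenated .npy records, so as far as I can tell np.load with mmap_mode would only map the first one. If the total number of rows is known in advance, a different technique may fit the original question better: preallocate a single .npy file with numpy.lib.format.open_memmap and fill it row by row. This is only a sketch under that assumption; the file location, the sizes and the np.full stand-in for the real row computation are hypothetical.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical file location and sizes, chosen only for illustration.
npy_path = os.path.join(tempfile.mkdtemp(), "rows.npy")
n_rows, row_size = 5, 4

# Create an .npy file with a valid header and room for all rows up front.
out = open_memmap(npy_path, mode="w+", dtype=np.float64,
                  shape=(n_rows, row_size))

for i in range(n_rows):
    # Stand-in for the real (expensive) row computation.
    out[i, :] = np.full(row_size, float(i))

out.flush()  # make sure the rows are written to disk
del out      # release the memory map

# Later: map the file and read only the slices that are needed.
rows = np.load(npy_path, mmap_mode="r")
third_row = np.array(rows[2])
print(third_row)
```

Only one row has to be held in memory at a time while writing, and the resulting file is an ordinary .npy file that np.load can open with mmap_mode.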