Right now I have a Python program that builds a fairly large 2D numpy array and saves it as a tab-delimited text file using numpy.savetxt. The numpy array contains only floats. I then read the file in one row at a time in a separate C++ program.
What I would like to do is accomplish the same task while changing my code as little as possible, so that the file I pass between the two programs is smaller.
I found that I can use numpy.savetxt to save to a compressed .gz file instead of a text file. This lowers the file size from ~2MB to ~100kB.
Is there a better way to do this? Could I, perhaps, write the numpy array in binary to the file to save space? If so, how would I do this so that I can still read it into the C++ program?
Thank you for the help. I appreciate any guidance I can get.
EDIT:
There are a lot of zeros (probably 70% of the values in the numpy array are 0.0000). I am not sure how I can exploit this to generate a tiny file that my C++ program can read in.
Since you have a lot of zeroes, you could write out only the non-zero elements, in the form (row index, column index, value).
Suppose you have an array with a small number of nonzero values:
In [5]: a = np.zeros((10, 10))
In [6]: a
Out[6]:
array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
In [7]: a[3,1] = 2.0
In [8]: a[7,4] = 17.0
In [9]: a[9,0] = 1.5
First, isolate the interesting numbers and their indices:
In [11]: x, y = a.nonzero()
In [12]: nonzero = list(zip(x.tolist(), y.tolist()))
In [13]: nonzero
Out[13]: [(3, 1), (7, 4), (9, 0)]
Now you only have a small number of data elements left. The easiest thing is to write them to a text file:
In [17]: with open('numbers.txt', 'w+') as outf:
   ....:     for r, k in nonzero:
   ....:         outf.write('{:d} {:d} {:g}\n'.format(r, k, a[r,k]))
   ....:
In [18]: cat numbers.txt
3 1 2
7 4 17
9 0 1.5
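This also gives you an opportunity to eyeball the data. In your C++ program you can read this data with fscanf. Note that '{:g}' keeps only six significant digits; use a format such as '{:.17g}' if you need the full precision of a double.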
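Since the question already uses numpy.savetxt, the same three-column text file could also be produced without an explicit loop. This is only a sketch: the names `rows`, `cols` and `triplets` are made up for illustration, and it assumes the same example array `a` as above.

```python
import numpy as np

# Rebuild the small example array from above.
a = np.zeros((10, 10))
a[3, 1] = 2.0
a[7, 4] = 17.0
a[9, 0] = 1.5

rows, cols = a.nonzero()

# One line per nonzero entry: row index, column index, value.
# column_stack promotes the indices to float, so '%d' prints them as integers again.
triplets = np.column_stack((rows, cols, a[rows, cols]))
np.savetxt('numbers.txt', triplets, fmt=['%d', '%d', '%g'])
```

The resulting file has the same layout as the one written by the loop, so the C++ side does not need to change.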
But you can reduce the size even more by writing binary data using struct:
In [19]: import struct
In [20]: c = struct.Struct('=IId')
In [21]: with open('numbers.bin', 'wb') as outf:
   ....:     for r, k in nonzero:
   ....:         outf.write(c.pack(r, k, a[r,k]))
   ....:
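The argument to the Struct constructor means: '=' selects native byte order with standard sizes and no alignment padding. The first and second data elements are unsigned 32-bit integers ('I'), and the third element is a 64-bit double ('d'), so each record takes 16 bytes.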
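To check the layout before writing the C++ reader, you can inspect the record size and unpack a few records again in Python. This is a small sketch; the name `record` and the sample values are only for illustration.

```python
import struct

# Same format as above: two unsigned 32-bit ints and one 64-bit double, no padding.
record = struct.Struct('=IId')
print(record.size)  # 4 + 4 + 8 bytes

# Pack two records back to back, as they would appear in the file.
packed = record.pack(3, 1, 2.0) + record.pack(7, 4, 17.0)

# iter_unpack walks the buffer one fixed-size record at a time.
for row, col, value in record.iter_unpack(packed):
    print(row, col, value)
```

The struct on the C++ side has to match this size and field order exactly, otherwise the values will be misread.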
In your C++ program this data is probably best read as binary data into a packed struct.
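On the Python side, a numpy structured dtype should give the same 16-byte records without a per-element loop. This is a sketch under assumptions: the field names `row`, `col`, `val` and the variable names `rec` and `out` are hypothetical, and the dtype is left unaligned so that no padding is inserted.

```python
import numpy as np

# Rebuild the small example array from above.
a = np.zeros((10, 10))
a[3, 1] = 2.0
a[7, 4] = 17.0
a[9, 0] = 1.5

rows, cols = a.nonzero()

# Packed record: uint32 row, uint32 column, float64 value, native byte order.
rec = np.dtype([('row', '=u4'), ('col', '=u4'), ('val', '=f8')])

out = np.empty(len(rows), dtype=rec)
out['row'] = rows
out['col'] = cols
out['val'] = a[rows, cols]

# Write the raw records to disk with no header.
out.tofile('numbers.bin')

# Reading the file back is a quick sanity check of the layout.
back = np.fromfile('numbers.bin', dtype=rec)
```

Because the file has no header, the reader has to work out the number of records from the file size divided by 16.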
EDIT: Answer updated for a 2D array.