I am saving a numpy sparse array (converted to dense) to a CSV file. The result is a 3 GB CSV. The problem is that 95% of the cells are 0.0000, because I used fmt='%5.4f'. How can I format and save the array so that zeros are written as just 0 and nonzero floats are written with the '%5.4f' format? I am sure I can get the 3 GB down to 300 MB if I can do this.
I am using
np.savetxt('foo.csv', arrayDense, fmt='%5.4f', delimiter = ',')
Thanks Regards
fmt : str or sequence of strs, optional. A single format (%10.5f), a sequence of formats, or a multi-format string, e.g. 'Iteration %d -- %10.5f', in which case delimiter is ignored. For complex X, the legal options for fmt are: a single specifier, fmt='%.4e', resulting in numbers formatted like ' (%s+%sj)' % (fmt, fmt)
savetxt will overwrite the original file.
If you look at the source code of np.savetxt, you'll see that, while there is quite a bit of code to handle the arguments and the differences between Python 2 and Python 3, it is ultimately a simple Python loop over the rows, in which each row is formatted and written to the file. So you won't lose any performance if you write your own. For example, here's a pared-down function that writes compact zeros:
def savetxt_compact(fname, x, fmt="%.6g", delimiter=','):
    with open(fname, 'w') as fh:
        for row in x:
            # Write exact zeros as a bare "0"; format everything else with fmt.
            line = delimiter.join("0" if value == 0 else fmt % value for value in row)
            fh.write(line + '\n')
For example:
In [70]: x
Out[70]:
array([[ 0. , 0. , 0. , 0. , 1.2345 ],
[ 0. , 9.87654321, 0. , 0. , 0. ],
[ 0. , 3.14159265, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ]])
In [71]: savetxt_compact('foo.csv', x, fmt='%.4f')
In [72]: !cat foo.csv
0,0,0,0,1.2345
0,9.8765,0,0,0
0,3.1416,0,0,0
0,0,0,0,0
0,0,0,0,0
0,0,0,0,0
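To get a feel for how much space this saves, the two writers can be compared on synthetic data. The sketch below is only an illustration: compare_sizes is a hypothetical helper, and the random array with about 95% zeros is an assumption standing in for the real data. As a rough estimate, with '%5.4f' and values below 10, a zero cell shrinks from 7 bytes (0.0000 plus the delimiter) to 2 bytes, so a file that is 95% zeros should end up at roughly a third of its original size rather than a tenth.

```python
import os
import tempfile

import numpy as np


def savetxt_compact(fname, x, fmt="%.6g", delimiter=","):
    """Write x as delimited text, with exact zeros written as a bare "0"."""
    with open(fname, "w") as fh:
        for row in x:
            line = delimiter.join("0" if value == 0 else fmt % value for value in row)
            fh.write(line + "\n")


def compare_sizes(x, fmt="%5.4f"):
    """Hypothetical helper: return (np.savetxt size, compact size) in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        full_path = os.path.join(tmp, "full.csv")
        compact_path = os.path.join(tmp, "compact.csv")
        np.savetxt(full_path, x, fmt=fmt, delimiter=",")
        savetxt_compact(compact_path, x, fmt=fmt)
        return os.path.getsize(full_path), os.path.getsize(compact_path)


# Synthetic stand-in for the real data: values in [0, 1), about 95% set to zero.
rng = np.random.default_rng(0)
demo = rng.random((200, 50))
demo[rng.random(demo.shape) < 0.95] = 0.0

full_size, compact_size = compare_sizes(demo)
print("np.savetxt:", full_size, "bytes; compact:", compact_size, "bytes")
```

The exact ratio depends on the fraction of zeros and on how wide the formatted nonzero values are, so it is worth running this on a slice of the real array before relying on a particular figure.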
Then, as long as you are writing your own savetxt function, you might as well make it handle sparse matrices, so you don't have to convert the matrix to a (dense) numpy array before saving it. (I assume the sparse array is implemented using one of the sparse representations from scipy.sparse.) In the following function, the only change is from ... for value in row to ... for value in row.A[0].
def savetxt_sparse_compact(fname, x, fmt="%.6g", delimiter=','):
    with open(fname, 'w') as fh:
        for row in x:
            # Each row is a 1 x n sparse matrix; row.A[0] is its dense 1-D form.
            line = delimiter.join("0" if value == 0 else fmt % value for value in row.A[0])
            fh.write(line + '\n')
Example:
In [112]: a
Out[112]:
<6x5 sparse matrix of type '<type 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
In [113]: a.A
Out[113]:
array([[ 0. , 0. , 0. , 0. , 1.2345 ],
[ 0. , 9.87654321, 0. , 0. , 0. ],
[ 0. , 3.14159265, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ]])
In [114]: savetxt_sparse_compact('foo.csv', a, fmt='%.4f')
In [115]: !cat foo.csv
0,0,0,0,1.2345
0,9.8765,0,0,0
0,3.1416,0,0,0
0,0,0,0,0
0,0,0,0,0
0,0,0,0,0
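As a variation, the per-row dense conversion can be skipped by reading the CSR arrays (indptr, indices, data) directly. This sketch is not part of the answer above: the name savetxt_csr_compact is hypothetical, and it assumes the input is something scipy.sparse.csr_matrix can convert. It also avoids the .A attribute, which may not be available on newer scipy sparse array classes.

```python
from scipy import sparse


def savetxt_csr_compact(fname, x, fmt="%.6g", delimiter=","):
    """Write a sparse matrix as delimited text without densifying each row.

    Hypothetical variant of savetxt_sparse_compact: it starts every line
    as all "0" fields and overwrites only the stored nonzero entries.
    """
    x = sparse.csr_matrix(x)  # assumption: the input is convertible to CSR
    x.sum_duplicates()        # merge repeated (row, col) entries, in place
    n_rows, n_cols = x.shape
    with open(fname, "w") as fh:
        for i in range(n_rows):
            fields = ["0"] * n_cols
            start, stop = x.indptr[i], x.indptr[i + 1]
            for j, value in zip(x.indices[start:stop], x.data[start:stop]):
                # Explicitly stored zeros are still written as a bare "0".
                if value != 0:
                    fields[j] = fmt % value
            fh.write(delimiter.join(fields) + "\n")
```

Called as savetxt_csr_compact('foo.csv', a, fmt='%.4f') on the matrix a from the example above, it should write the same six lines. The inner loop only visits stored entries, so the work per row scales with the number of nonzeros plus the cost of joining the fields.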