 

How to format in numpy savetxt such that zeros are saved only as "0"

Tags:

python

numpy

I am saving a sparse array, converted to a dense NumPy array, into a CSV file. The result is a 3GB CSV, and the problem is that 95% of the cells are written as 0.0000. I used fmt='%5.4f'. How can I format and save the array so that zeros are written as just 0 and nonzero floats are written with the '%5.4f' format? I think this could bring the 3GB down to something like 300MB.

I am using

np.savetxt('foo.csv', arrayDense, fmt='%5.4f', delimiter = ',')
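A rough estimate of the savings, as a back-of-the-envelope sketch. It assumes 95% zeros, that every nonzero value prints as six characters (as 1.2345 does under '%5.4f'), and one separator byte (comma or newline) per cell. The helper name bytes_per_cell is hypothetical.

```python
def bytes_per_cell(zero_fraction, zero_text, nonzero_text):
    """Average bytes per cell, counting one separator (',' or newline) per cell."""
    return (zero_fraction * (len(zero_text) + 1)
            + (1 - zero_fraction) * (len(nonzero_text) + 1))

# Assumption: 95% zeros, and every nonzero value is as wide as 1.2345 under '%5.4f'.
before = bytes_per_cell(0.95, "%5.4f" % 0.0, "%5.4f" % 1.2345)  # zeros as "0.0000"
after = bytes_per_cell(0.95, "0", "%5.4f" % 1.2345)             # zeros as "0"
print("size ratio: %.2f" % (after / before))  # prints "size ratio: 0.32"
```

Under these assumptions, writing zeros as 0 shrinks the file to roughly a third of its size (about 1GB from 3GB). Getting close to 300MB would mean spending well under one byte per zero cell, which in practice means not writing the zeros out at all.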

Thanks, regards

asked Jul 11 '14 06:07 by Run2




1 Answer

If you look at the source code of np.savetxt, you'll see that, while there is quite a bit of code to handle the arguments and the differences between Python 2 and Python 3, it is ultimately a simple Python loop over the rows, in which each row is formatted and written to the file. So writing your own version shouldn't cost you any meaningful performance. For example, here's a pared-down function that writes compact zeros:

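def savetxt_compact(fname, x, fmt="%.6g", delimiter=','):
    # Like np.savetxt for a 2-D array, but writes exact zeros as "0".
    with open(fname, 'w') as fh:
        for row in x:
            # Format each cell with fmt unless it is exactly zero.
            line = delimiter.join("0" if value == 0 else fmt % value for value in row)
            fh.write(line + '\n')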

For example:

In [70]: x
Out[70]: 
array([[ 0.        ,  0.        ,  0.        ,  0.        ,  1.2345    ],
       [ 0.        ,  9.87654321,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  3.14159265,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

In [71]: savetxt_compact('foo.csv', x, fmt='%.4f')

In [72]: !cat foo.csv
0,0,0,0,1.2345
0,9.8765,0,0,0
0,3.1416,0,0,0
0,0,0,0,0
0,0,0,0,0
0,0,0,0,0
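One case the value == 0 test does not cover: nonzero values that round to zero under the format, such as 0.00001 with '%.4f', are still written as 0.0000 (or -0.0000). If those should also become 0, a variant can test the formatted text instead of the value. This is a sketch; the name savetxt_compact_rounded is hypothetical, and it assumes fmt produces a plain number that float() can parse.

```python
def savetxt_compact_rounded(fname, x, fmt="%.4f", delimiter=","):
    """Write x as delimited text, writing any cell that formats to zero as "0"."""
    with open(fname, "w") as fh:
        for row in x:
            cells = []
            for value in row:
                text = fmt % value
                # float("0.0000") and float("-0.0000") both compare equal to 0,
                # so values that merely round to zero are shortened too.
                cells.append("0" if float(text) == 0 else text)
            fh.write(delimiter.join(cells) + "\n")
```

For instance, savetxt_compact_rounded('foo.csv', [[0.00001, -0.00001, 1.2345, 0.0]]) writes the single line 0,0,1.2345,0. The trade-off is an extra float() parse per cell.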

Then, as long as you are writing your own savetxt function, you might as well make it handle sparse matrices, so you don't have to convert the data to a dense NumPy array before saving it. (I assume the sparse array is one of the sparse matrix types from scipy.sparse; iterating over one yields each row as a 1-by-n sparse matrix.) In the following function, the only change is from ... for value in row to ... for value in row.toarray().ravel(), which densifies one row at a time. (Older code often uses row.A[0] for this; the .A shorthand is deprecated in recent SciPy versions.)

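def savetxt_sparse_compact(fname, x, fmt="%.6g", delimiter=','):
    with open(fname, 'w') as fh:
        # Iterating over a scipy.sparse matrix yields 1-by-n sparse rows.
        for row in x:
            # Densify one row at a time; toarray() is 2-D, so flatten it.
            line = delimiter.join("0" if value == 0 else fmt % value for value in row.toarray().ravel())
            fh.write(line + '\n')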

Example:

In [112]: a
Out[112]: 
<6x5 sparse matrix of type '<type 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>

In [113]: a.toarray()
Out[113]: 
array([[ 0.        ,  0.        ,  0.        ,  0.        ,  1.2345    ],
       [ 0.        ,  9.87654321,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  3.14159265,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

In [114]: savetxt_sparse_compact('foo.csv', a, fmt='%.4f')

In [115]: !cat foo.csv
0,0,0,0,1.2345
0,9.8765,0,0,0
0,3.1416,0,0,0
0,0,0,0,0
0,0,0,0,0
0,0,0,0,0
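Iterating over a sparse matrix row by row builds a new 1-by-n sparse object for every row, which adds per-row overhead. A variant that reads the CSR arrays (indptr, indices, data) directly avoids that. This is a sketch: the name savetxt_csr_compact is hypothetical, and it assumes SciPy and NumPy are available and that one dense row at a time fits comfortably in memory.

```python
import numpy as np
import scipy.sparse as sp


def savetxt_csr_compact(fname, x, fmt="%.6g", delimiter=","):
    """Write a sparse (or dense) 2-D array as delimited text, zeros as "0"."""
    x = sp.csr_matrix(x)  # CSR gives direct access to each row's stored entries
    n_rows, n_cols = x.shape
    with open(fname, "w") as fh:
        for i in range(n_rows):
            # Row i's stored values are data[start:end], at columns indices[start:end].
            start, end = x.indptr[i], x.indptr[i + 1]
            row = np.zeros(n_cols, dtype=x.dtype)
            # np.add.at sums any duplicate column entries instead of overwriting them.
            np.add.at(row, x.indices[start:end], x.data[start:end])
            fh.write(delimiter.join("0" if v == 0 else fmt % v for v in row) + "\n")
```

With the matrix a from the example above, savetxt_csr_compact('foo.csv', a, fmt='%.4f') should write the same six lines as savetxt_sparse_compact.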
answered Sep 22 '22 12:09 by Warren Weckesser