
numpy.savetxt() outputs very large files

I am using numpy.savetxt() to write a numpy array to a CSV file, but the generated file is very large. For example, if I create an array of zeros:

import numpy

test = numpy.zeros((10000,10000), dtype=numpy.float32)
numpy.savetxt('C:/datatest.csv',test,delimiter=',')

I would expect the file to be around 10,000 * 10,000 * 4 bytes (400 MB), which is also what test.nbytes returns. However, the file is 2.3 GB. Is there a reason for the large file size? I looked through the numpy documentation, and there doesn't seem to be a way to specify the variable type when writing to a file. I tried other file types and delimiters, but got the same results.

samuelschaefer asked Oct 09 '14 17:10


1 Answer

The size of the native datatype differs from the size of the string representation of the datatype.

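numpy.savetxt has a fmt argument that defaults to '%.18e', which formats each of your zeros as 0.000000000000000000e+00. That is 24 characters per item plus one for the delimiter (or the newline at the end of a row), so 10,000 * 10,000 items at 25 bytes each come to about 2.5 billion bytes, which is the 2.3 GB you are seeing.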
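
To check that arithmetic without writing the file, here is a small sketch. It only uses Python's % formatting with the same format string, and it assumes a one-byte newline and the 10000 x 10000 shape from the question; the variable names are just for illustration.

```python
fmt = "%.18e"        # default fmt of numpy.savetxt
cell = fmt % 0.0     # text written for a single zero
rows, cols = 10000, 10000

# Each row holds `cols` cells, `cols - 1` delimiters and one newline,
# so every cell costs len(cell) + 1 characters (assuming a 1-byte newline).
estimated_text_bytes = rows * cols * (len(cell) + 1)

# A float32 takes 4 bytes in memory, which is what test.nbytes counts.
binary_bytes = rows * cols * 4

print(cell, len(cell))
print("text:   %.2f GiB" % (estimated_text_bytes / 2**30))
print("binary: %.2f MiB" % (binary_bytes / 2**20))
```

On Windows the newline may be written as two bytes, which would add a little on top of this estimate.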

To get a smaller file you can change the format (beware of losing significant digits), use numpy.save to save in binary, or use numpy.savez_compressed to save as a compressed archive (plain numpy.savez writes an uncompressed .npz archive).
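
A rough comparison of those options, as a sketch rather than a benchmark. It uses a smaller 1000 x 1000 array and temporary file names that are my own choices, not from the question. The '%.9g' format is an assumption on my part: nine significant digits are enough to round-trip float32 values, but pick whatever precision your data needs.

```python
import os
import tempfile

import numpy as np

# Smaller than the 10000 x 10000 array in the question so this runs quickly.
test = np.zeros((1000, 1000), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    paths = {
        "default": os.path.join(tmp, "default.csv"),
        "short": os.path.join(tmp, "short.csv"),
        "npy": os.path.join(tmp, "binary.npy"),
        "npz": os.path.join(tmp, "compressed.npz"),
    }
    np.savetxt(paths["default"], test, delimiter=",")            # fmt defaults to '%.18e'
    np.savetxt(paths["short"], test, delimiter=",", fmt="%.9g")  # shorter text per value
    np.save(paths["npy"], test)                                  # raw binary plus a small header
    np.savez_compressed(paths["npz"], test=test)                 # zip-compressed archive

    sizes = {name: os.path.getsize(path) for name, path in paths.items()}
    restored = np.load(paths["npy"])  # read back before the temp dir is removed

for name, size in sizes.items():
    print(name, size, "bytes")
print("binary round trip ok:", np.array_equal(restored, test))
```

An array of zeros is a best case for both the short format and the compressed archive, so expect less dramatic savings on real data.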

Steven Rumbalski answered Sep 22 '22 01:09