Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

more efficient way to pickle a string

The pickle module seems to use string escape characters when pickling; this becomes inefficient e.g. on numpy arrays. Consider the following

z = numpy.zeros(1000, numpy.uint8)
len(z.dumps())
len(cPickle.dumps(z.dumps()))

The lengths are 1133 characters and 4249 characters respectively.

z.dumps() reveals something like "\x00\x00" (actual zeros in string), but pickle seems to be using the string's repr() function, yielding "'\x00\x00'" (zeros being ascii zeros).

i.e. ("0" in z.dumps() == False) and ("0" in cPickle.dumps(z.dumps()) == True)

like image 686
gatoatigrado Avatar asked Mar 30 '09 01:03

gatoatigrado


1 Answers

Try using a later version of the pickle protocol with the protocol parameter to pickle.dumps(). The default is 0 and is an ASCII text format. Ones greater than 1 (I suggest you use pickle.HIGHEST_PROTOCOL). Protocol formats 1 and 2 (and 3 but that's for py3k) are binary and should be more space conservative.

like image 50
Benjamin Peterson Avatar answered Oct 07 '22 12:10

Benjamin Peterson