Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How come Python does not include a function to load a pickle from a file name?

Tags:

python

pickle

I often include this, or something close to it, in Python scripts and IPython notebooks.

import cPickle
def unpickle(filename):
    with open(filename) as f:
        obj = cPickle.load(f)
    return obj

This seems like a common enough use case that the standard library should provide a function that does the same thing. Is there such a function? If there isn't, how come?

like image 511
DGrady Avatar asked May 07 '15 23:05

DGrady


1 Answers

Most of the serialization libraries in the stdlib and on PyPI have a similar API. I'm pretty sure it was marshal that set the standard,* and pickle, json, PyYAML, etc. have just followed in its footsteps.

So, the question is, why was marshal designed that way?

Well, you obviously need loads/dumps; you couldn't build those on top of a filename-based function, and to build them on top of a file-object-based function you'd need StringIO, which didn't come until later.

You don't necessarily need load/dump, because those could be built on top of loads/dumps—but doing so could have major performance implications: you can't save anything to the file until you've built the whole thing in memory, and vice-versa, which could be a problem for huge objects.

You definitely don't need a loadf/dumpf function based on filenames, because those can be built trivially on top of load/dump, with no performance implications, and no tricky considerations that a user is likely to get wrong.

On the one hand, it would be convenient to have them anyway—and there are some libraries, like ElementTree, that do have analogous functions. It may only save a few seconds and a few lines per project, but multiply that by thousands of projects…

On the other hand, it would make Python larger. Not so much the extra 1K to download and install it if you added these two functions to every module (although that did mean a lot more back in the 1.x days than nowadays…), but more to document, more to learn, more to remember. And of course more code to maintain—every time you need to fix a bug in marshal.dumpf you have to remember to go check pickle.dumpf and json.dumpf to make sure they don't need the change, and sometimes you won't remember.

Balancing those two considerations is really a judgment call. One someone made decades ago and probably nobody has really discussed since. If you think there's a good case for changing it today, you can always post a feature request on the issue tracker or start a thread on python-ideas.


* Not in the original 1991 version of marshal.c; that just had load and dump. Guido added loads and dumps in 1993 as part of a change whose main description was "Add separate main program for the Mac: macmain.c". Presumably because something inside the Python interpreter needed to dump and load to strings.**

** marshal is used as the underpinnings for things like importing .pyc files. This also means (at least in CPython) it's not just implemented in C, but statically built into the core of the interpreter itself. While I think it actually could be turned into a regular module since the 3.4 import changes, but it definitely couldn't have back in the early days. So, that's extra motivation to keep it small and simple.

like image 72
abarnert Avatar answered Sep 28 '22 05:09

abarnert