
Pickle vs output to a file in Python

I have a program that outputs some lists that I want to store to work with later. For example, suppose it outputs a list of student names and another list of their midterm scores. I can store this output in the following two ways:

Standard File Output way:

newFile = open('trialWrite1.py','w')
newFile.write(str(firstNames) + '\n')       # newline keeps the two lists on separate lines
newFile.write(str(midterm1Scores) + '\n')
newFile.close()

The pickle way:

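import cPickle  # Python 2; on Python 3 use the pickle module instead

newFile = open('trialWrite2.txt','wb')  # pickle data should go to a binary-mode file
cPickle.dump(firstNames, newFile)
cPickle.dump(midterm1Scores, newFile)
newFile.close()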

Which technique is better or preferred? Is there an advantage of using one over the other?

Thanks

Curious2learn asked Aug 27 '10 20:08


2 Answers

I think the csv module might be a good fit here, since CSV is a standard format that can be both read and written by Python (and many other languages), and it's also human-readable. Usage could be as simple as

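import csv

# Python 2: csv wants binary mode; on Python 3 use mode 'w' with newline=''
with open('trialWrite1.py','wb') as fileobj:
    newFile = csv.writer(fileobj)
    newFile.writerow(firstNames)       # first row: all names
    newFile.writerow(midterm1Scores)   # second row: all scores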

However, it'd probably make more sense to write one student per row, including their name and score. That can be done like this:

import csv
from itertools import izip  # Python 2 only; on Python 3 use the built-in zip

with open('trialWrite1.py','wb') as fileobj:
    newFile = csv.writer(fileobj)
    # one (name, score) pair per row
    for row in izip(firstNames, midterm1Scores):
        newFile.writerow(row)
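
To work with the data later, the rows have to be read back. The following is a minimal sketch of that round trip, assuming Python 3 (so the built-in zip replaces izip); the sample values are made up, and an in-memory buffer stands in for the file. Note that csv returns every field as a string, so the scores need converting.

```python
import csv
import io

# Sample data standing in for the question's lists (values are made up).
firstNames = ["Ann", "Bob", "Cy"]
midterm1Scores = [91, 78, 85]

# In-memory stand-in for open(path, "w", newline="") on Python 3.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerows(zip(firstNames, midterm1Scores))  # one student per row

# Rewind and read the rows back; every field comes back as a string.
buffer.seek(0)
rows = list(csv.reader(buffer))
names_back = [name for name, _ in rows]
scores_back = [int(score) for _, score in rows]  # convert scores to int

print(names_back)
print(scores_back)
```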
David Z answered Oct 23 '22 04:10


pickle is more generic -- it allows you to dump many different kinds of objects to a file for later use. The downside is that the interim storage is not very human-readable, and not in a standard format.
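
As a rough illustration of the "dump now, use later" workflow, here is a sketch assuming Python 3, where cPickle has been folded into pickle. The sample values are made up, and an in-memory buffer stands in for a file opened in binary mode. Objects come back in the order they were dumped, with their types intact.

```python
import io
import pickle

# Sample data standing in for the question's lists (values are made up).
firstNames = ["Ann", "Bob", "Cy"]
midterm1Scores = [91, 78, 85]

# In-memory stand-in for a file opened with open(path, "wb").
buffer = io.BytesIO()
pickle.dump(firstNames, buffer)
pickle.dump(midterm1Scores, buffer)

# Rewind, then load the objects back in the order they were dumped.
buffer.seek(0)
loaded_names = pickle.load(buffer)
loaded_scores = pickle.load(buffer)

print(loaded_names)
print(loaded_scores)
```

Only unpickle data you wrote yourself or otherwise trust, since loading a pickle can execute arbitrary code.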

Writing strings to a file, on the other hand, gives you a format that other tools and code can read easily. But it comes at the cost of having to parse the text back into your Python objects again.
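
For plain lists of strings and numbers, that parsing step can be small. This is one possible sketch, assuming Python 3 and assuming each list was written with str() on its own line; ast.literal_eval is a swapped-in suggestion, not something the question used. The sample values are made up.

```python
import ast

# Sample data standing in for the question's lists (values are made up).
firstNames = ["Ann", "Bob", "Cy"]
midterm1Scores = [91, 78, 85]

# What the text file would contain: one list literal per line.
text = str(firstNames) + "\n" + str(midterm1Scores) + "\n"

# literal_eval accepts only Python literals, so unlike eval()
# it will not run arbitrary code found in the file.
parsed_names, parsed_scores = [ast.literal_eval(line) for line in text.splitlines()]

print(parsed_names)
print(parsed_scores)
```

This only works for data made of literals (lists, strings, numbers, and so on); arbitrary objects would need a custom format or pickle.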

Both are fine for this simple list data; I would use write(str(firstNames)) simply because there's no need for pickle here. In general, how you persist your data to the filesystem depends on the data!


For instance, pickle will happily pickle functions and classes (it stores a reference to their importable name rather than their code), which you can't do by simply writing their string representations.

>>> import pickle
>>> data = range
>>> data
<class 'range'>
>>> pickle.dump( data, foo )  # foo: a file object opened in binary write mode
>>> pickle.load( open( ..., "rb" ) )
<class 'range'>
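
A self-contained version of the same idea, assuming Python 3 and using pickle.dumps/pickle.loads so no file is needed. It is only a sketch, but it shows that what gets stored is a reference to the object's name, so loading gives back the very same built-in.

```python
import pickle

# Pickle a built-in class and a built-in function to bytes.
range_payload = pickle.dumps(range)
len_payload = pickle.dumps(len)

# Loading resolves the stored name back to the same object.
restored_range = pickle.loads(range_payload)
restored_len = pickle.loads(len_payload)

print(restored_range is range)
print(restored_len is len)

# The string representation, by contrast, is just text.
print(str(range))
```

Because only the name is stored, the function or class must be importable wherever the pickle is loaded.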
Katriel answered Oct 23 '22 06:10