
Can we use a python variable to hold an entire file?

Provided that we know the whole file will be loaded in memory and that we can afford it, what are the drawbacks or limitations (if any) of loading an entire file (possibly a binary file) into a Python variable? If this is technically possible, should it be avoided, and why?

Regarding file size concerns, to what maximum size should this solution be limited, and why?

The actual loading code could be the one proposed in this Stack Overflow entry.

Sample code is:

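def file_get_contents(filename):
    # Open in binary mode ('rb'): the file may not be valid text,
    # and text mode would try to decode it.
    with open(filename, 'rb') as f:
        return f.read()  # a bytes object holding the whole file

content = file_get_contents('/bin/kill')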

... code manipulating 'content' ...

[EDIT] The manipulations that come to mind (though they may not all be applicable) are the standard list/string operators (square brackets, '+') and some string operations ('len', the 'in' operator, 'count', 'endswith'/'startswith', 'split', 'translate', ...).

vaab asked Sep 16 '09 15:09



2 Answers

  • Yes, you can.
  • The only drawback is memory usage, and possibly also speed if the file is big.
  • The file size should be limited to the amount of memory you have available.

In general, there are better ways to do it, but for one-off scripts where you know memory is not an issue, sure.
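One of those "better ways" is to process the file in fixed-size chunks, so that only one chunk is in memory at a time. Here is a minimal sketch of that idea, using a SHA-256 hash as the example task; the function name `sha256_in_chunks`, the 64 KB chunk size, and the throwaway demo file are all assumptions made for illustration, not something from the question.

```python
import hashlib
import os
import tempfile


def sha256_in_chunks(filename, chunk_size=64 * 1024):
    """Hash a file while holding at most chunk_size bytes of it at a time."""
    digest = hashlib.sha256()
    with open(filename, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size)
        # until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a temporary file created here (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 200_000)
    demo_path = tmp.name

chunked = sha256_in_chunks(demo_path)

# For comparison: the same hash computed by loading the whole file at once.
with open(demo_path, "rb") as f:
    whole = hashlib.sha256(f.read()).hexdigest()

os.remove(demo_path)
print(chunked == whole)
```

Both approaches give the same digest; the chunked one just never needs the whole file in memory, which is why it scales to files larger than RAM.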

Lennart Regebro answered Sep 22 '22 17:09


While you've gotten good responses, it seems nobody has answered this part of your question (as often happens when you ask many questions in a question;-)...:

Regarding file size concerns, to what maximum size should this solution be limited, and why?

The most important thing is how much physical RAM this specific Python process can actually use (what's known as its "working set") without unduly penalizing other aspects of the overall system's performance. If your working set exceeds physical RAM, you'll be paging and swapping in and out to disk, and your performance can rapidly degrade (up to a state known as "thrashing", where basically all available cycles go to the task of getting pages in and out, and only a negligible amount of actual work gets done).

Out of that total, a reasonably modest amount (say a few MB at most, in general) is probably going to be taken up by executable code (Python's own executable files, DLLs or .so's), bytecode, and general support data structures that are actively needed in memory; on a typical modern machine that's not doing other important or urgent tasks, you can almost ignore this overhead compared to the gigabytes of RAM that you have available overall (though the situation might be different on embedded systems, etc.).

All the rest is available for your data -- which includes this file you're reading into memory, as well as any other significant data structures. "Modifications" of the file's data can typically take (transiently) twice as much memory as the file's contents' size (if you're holding it in a string) -- more, of course, if you're keeping a copy of the old data as well as making new modified copies/versions.
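To make the "twice as much memory" point concrete, here is a small sketch, assuming the data is held as `bytes` (Python 3's immutable byte string). The one-million-byte buffer stands in for real file contents and is purely illustrative.

```python
data = bytes(1_000_000)  # stand-in for file contents: one million zero bytes

# bytes is immutable, so a "modification" builds a second full-size object
# while the original is still alive.
modified = data.replace(b"\x00", b"\x01")
is_new_object = modified is not data

# bytearray is mutable: an equal-length slice assignment edits the
# existing buffer instead of building a new one.
buf = bytearray(data)  # note: this conversion itself makes one copy
buf_id = id(buf)
buf[0:4] = b"ABCD"
same_object = id(buf) == buf_id

print(is_new_object, same_object, len(modified), len(buf))
```

If you know up front that you will edit the data in place, a `bytearray` can keep the peak footprint closer to one copy, though operations that change the length or return a new object will still allocate.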

So for "read-only" use on a typical modern 32-bit machine with, say, 2 GB of RAM overall, reading into memory (say) 1.5 GB should be no problem; but it will have to be substantially less than 1 GB if you're doing "modifications" (and even less if you have other significant data structures in memory!). Of course, on a dedicated server with a 64-bit build of Python, a 64-bit OS, and 16 GB of RAM, the practical limits become very different -- roughly in proportion to the vastly different amount of available RAM, in fact.
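If you settle on a limit like the ones above, one way to enforce it is to check the file's size before reading. This is only a sketch: the helper name `read_if_small_enough`, the choice of `ValueError`, and the tiny demo limits are assumptions for illustration, and the right `max_bytes` depends on your machine and on what else the process holds in memory.

```python
import os
import tempfile


def read_if_small_enough(filename, max_bytes):
    """Return the file's bytes, or raise ValueError if it exceeds max_bytes."""
    size = os.path.getsize(filename)
    if size > max_bytes:
        raise ValueError(f"{filename} is {size} bytes; limit is {max_bytes}")
    with open(filename, "rb") as f:
        return f.read()


# Demo on a temporary five-byte file created here (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

content = read_if_small_enough(demo_path, max_bytes=1024)

try:
    read_if_small_enough(demo_path, max_bytes=3)
    rejected = False
except ValueError:
    rejected = True

os.remove(demo_path)
print(content, rejected)
```

The check is cheap because `os.path.getsize` only asks the filesystem for metadata; nothing is read until the size has been accepted.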

For example, the King James Bible text as downloadable here (unzipped) is about 4.4 MB; so, on a machine with 2 GB of RAM, you could keep about 400 slightly modified copies of it in memory (if nothing else is requesting memory), but on a machine with 16 GB of (available and addressable) RAM, you could keep well over 3000 such copies.
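The arithmetic behind those figures can be checked in a couple of lines. This sketch assumes 1 GB = 1024 MB and ignores all per-object and interpreter overhead, so it gives upper bounds; the "about 400" and "well over 3000" above leave some headroom below them. The function name `copies_that_fit` is made up for the example.

```python
def copies_that_fit(ram_gb, file_mb=4.4):
    """Upper bound on how many file_mb-sized copies fit in ram_gb of RAM."""
    return int(ram_gb * 1024 / file_mb)


for ram_gb in (2, 16):
    print(ram_gb, "GB ->", copies_that_fit(ram_gb), "copies at most")
```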

Alex Martelli answered Sep 23 '22 17:09