Python: Pre-loading memory

Tags:

I have a python program where I need to load and de-serialize a 1GB pickle file. It takes a good 20 seconds and I would like to have a mechanism whereby the content of the pickle is readily available for use. I've looked at shared_memory but all the examples of its use seem to involve numpy and my project doesn't use numpy. What is the easiest and cleanest way to achieve this using shared_memory or otherwise?

This is how I'm loading the data now (on every run):

def load_pickle(pickle_name):
    return pickle.load(open(DATA_ROOT + pickle_name, 'rb'))

I would like to be able to edit the simulation code in between runs without having to reload the pickle. I've been messing around with importlib.reload but it really doesn't seem to work well for a large Python program with many file:

def main():
    data_manager.load_data()
    run_simulation()
    while True:
        try:
            importlib.reload(simulation)
            run_simulation()
        except:
        print(traceback.format_exc())
        print('Press enter to re-run main.py, CTRL-C to exit')
        sys.stdin.readline()

933

asked Jun 08 '21 14:06

etayluz

Video Answer

3 Answers

Adding another assumption-challenging answer, it could be where you're reading your files from that makes a big difference

1G is not a great amount of data with today's systems; at 20 seconds to load, that's only 50MB/s, which is a fraction of what even the slowest disks provide

You may find you actually have a slow disk or some type of network share as your real bottleneck and that changing to a faster storage medium or compressing the data (perhaps with gzip) makes a great difference to read and writing

120

answered Oct 19 '22 21:10

ti7

An alternative to storing the unpickled data in memory would be to store the pickle in a ramdisk, so long as most of the time overhead comes from disk reads. Example code (to run in a terminal) is below.

sudo mkdir mnt/pickle
mount -o size=1536M -t tmpfs none /mnt/pickle
cp path/to/pickle.pkl mnt/pickle/pickle.pkl

Then you can access the pickle at mnt/pickle/pickle.pkl. Note that you can change the file names and extensions to whatever you want. If disk read is not the biggest bottleneck, you might not see a speed increase. If you run out of memory, you can try turning down the size of the ramdisk (I set it at 1536 mb, or 1.5gb)

answered Oct 19 '22 21:10

thshea

You can use shareable list: So you will have 1 python program running which will load the file and save it in memory and another python program which can take the file from memory. Your data, whatever is it you can load it in dictionary and then dump it as json and then reload json. So

Program1

import pickle
import json
from multiprocessing.managers import SharedMemoryManager
YOUR_DATA=pickle.load(open(DATA_ROOT + pickle_name, 'rb'))
data_dict={'DATA':YOUR_DATA}
data_dict_json=json.dumps(data_dict)
smm = SharedMemoryManager()
smm.start() 
sl = smm.ShareableList(['alpha','beta',data_dict_json])
print (sl)
#smm.shutdown() commenting shutdown now but you will need to do it eventually

The output will look like this

#OUTPUT
>>>ShareableList(['alpha', 'beta', "your data in json format"], name='psm_12abcd')

Now in Program2:

from multiprocessing import shared_memory
load_from_mem=shared_memory.ShareableList(name='psm_12abcd')
load_from_mem[1]
#OUTPUT
'beta'
load_from_mem[2]
#OUTPUT
yourdataindictionaryformat

You can look for more over here https://docs.python.org/3/library/multiprocessing.shared_memory.html

answered Oct 19 '22 21:10

ibadia

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python: Pre-loading memory

Tags:

python

shared-memory

etayluz

People also ask

Video Answer

3 Answers

ti7

thshea

ibadia

Recent Activity

Donate For Us

Python: Pre-loading memory

Tags:

python

shared-memory

etayluz

People also ask

Video Answer

3 Answers

ti7

thshea

ibadia

Related questions

Recent Activity

Donate For Us