I just tried the Python shelve module as a persistent cache for data fetched from an external service. The complete example is here.
I was wondering what would be the best approach if I want to make this multiprocess-safe? I am aware of redis, memcached and such "real solutions", but I'd like to use only parts of the Python standard library or very minimal dependencies to keep my code compact and not introduce unnecessary complexity when running the code in a single process / single thread model.
It's easy to come up with a single-process solution, but this does not work well with current Python web run-times. Specifically, the problem would be that in an Apache + mod_wsgi environment:
Only one process updates the cached data at a time (file locks, somehow?)
Other processes use the cached data while the update is under way
If a process fails to update the cached data, there is a penalty of N minutes before another process may try again (to prevent a thundering herd and such) - how to signal this between mod_wsgi processes?
You do not utilize any "heavy tools" for this, only the Python standard library and UNIX
Also, if some PyPI package does this without external dependencies, please let me know. Alternative approaches and recommendations, like "just use sqlite", are welcome.
Example:
import datetime
import os
import shelve
import logging

logger = logging.getLogger(__name__)


class Converter:

    def __init__(self, fpath, api_url, refresh_delay=datetime.timedelta(minutes=15)):
        self.api_url = api_url
        self.refresh_delay = refresh_delay
        self.last_updated = None
        # Check before shelve.open() creates the file, so a freshly
        # created cache is not mistaken for an up-to-date one
        if os.path.exists(fpath):
            self.last_updated = datetime.datetime.fromtimestamp(os.path.getmtime(fpath))
        self.data = shelve.open(fpath)

    def convert(self, source, target, amount, update=True, determiner="24h_avg"):
        # Do something with cached data
        pass

    def is_up_to_date(self):
        if not self.last_updated:
            return False
        return datetime.datetime.now() < self.last_updated + self.refresh_delay

    def update(self):
        try:
            # Update data from the external server
            self.last_updated = datetime.datetime.now()
            self.data.sync()
        except Exception as e:
            logger.error("Could not refresh market data: %s %s", self.api_url, e)
            logger.exception(e)
I'd say you'd want to use some existing caching library; dogpile.cache comes to mind. It has many features already, and you can easily plug in the backends you might need.
The dogpile.cache documentation says the following:
This “get-or-create” pattern is the entire key to the “Dogpile” system, which coordinates a single value creation operation among many concurrent get operations for a particular key, eliminating the issue of an expired value being redundantly re-generated by many workers simultaneously.
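For example, a minimal sketch of that get-or-create pattern with a file-backed region (the dbm file name, the 15-minute expiration, and the fetch function are my assumptions):

from dogpile.cache import make_region

# A file-backed region; dogpile.cache uses a lock file so that only one
# process regenerates an expired value while the others keep serving it.
region = make_region().configure(
    "dogpile.cache.dbm",
    expiration_time=900,  # assumed 15-minute refresh window
    arguments={"filename": "/tmp/converter_cache.dbm"},
)

@region.cache_on_arguments()
def get_rates(source, target):
    # Runs only when the cached value is missing or expired, and the
    # dogpile lock ensures only one process runs it at a time.
    return fetch_from_external_service(source, target)  # hypothetical fetcher

The dbm backend keeps everything in the standard library apart from dogpile.cache itself, which fits your "minimal dependencies" constraint.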
Let's consider your requirements systematically:
Your use case will determine whether you can use in-band synchronisation (a file descriptor or memory region inherited across fork) or out-of-band synchronisation (POSIX file locks, System V shared memory).
Then you may have other requirements, e.g. cross-platform availability of the tools, etc.
There really isn't that much in the standard library, except bare tools. One module, however, stands out: sqlite3. SQLite uses fcntl/POSIX locks. There are performance limitations, though: multiple processes imply a file-backed database, and SQLite requires fdatasync on commit.
Thus there's a limit to transactions per second imposed by your hard drive's RPM. The latter is not a big deal if you have hardware RAID, but can be a major handicap on commodity hardware, e.g. a laptop, a USB flash drive, or an SD card. Plan for roughly 100 transactions/s if you use a regular, rotating hard drive.
Your processes can also block on SQLite if you use special transaction modes.
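For illustration, a minimal sqlite3 cache sketch (the table layout, the file path, and the BEGIN IMMEDIATE locking mode are my assumptions):

import sqlite3
import time

DB_PATH = "/tmp/cache.sqlite"  # assumed location, shared by all processes

def open_cache():
    # isolation_level=None puts the connection in autocommit mode so we
    # can issue explicit BEGIN IMMEDIATE; timeout makes writers queue on
    # the file lock instead of failing immediately.
    conn = sqlite3.connect(DB_PATH, timeout=10, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache "
        "(key TEXT PRIMARY KEY, value TEXT, updated_at REAL)"
    )
    return conn

def put(conn, key, value):
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        conn.execute(
            "INSERT OR REPLACE INTO cache (key, value, updated_at) "
            "VALUES (?, ?, ?)",
            (key, value, time.time()),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

def get(conn, key, max_age=900):
    row = conn.execute(
        "SELECT value, updated_at FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row and time.time() - row[1] < max_age:
        return row[0]
    return None  # missing or stale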
So those are the two major approaches: in-band and out-of-band synchronisation.
Presumably, if you trust another process with the cache value, you don't have any security considerations; thus either will work, or perhaps a combination of both.
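If you go the out-of-band route, a sketch with POSIX file locks from the standard library could look like this (the lock-file path, the failure-marker file, and the 5-minute penalty are my assumptions, meant to match your N-minute back-off requirement):

import fcntl
import os
import time

LOCK_PATH = "/tmp/cache.lock"      # assumed lock file
FAILED_PATH = "/tmp/cache.failed"  # assumed failure marker
PENALTY = 5 * 60                   # assumed N = 5 minutes

def try_update(do_refresh):
    """Let exactly one process refresh the cache; the others keep using
    the stale data, and a recent failure marker defers further retries."""
    if (os.path.exists(FAILED_PATH)
            and time.time() - os.path.getmtime(FAILED_PATH) < PENALTY):
        return False  # another process failed recently; back off
    with open(LOCK_PATH, "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # someone else is updating right now
        try:
            do_refresh()
            if os.path.exists(FAILED_PATH):
                os.unlink(FAILED_PATH)
            return True
        except Exception:
            open(FAILED_PATH, "w").close()  # start the penalty window
            return False

The lock is released automatically when the file is closed, so a crashed updater cannot keep the cache locked forever.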