I'm writing a program which backs up a database using Python's RotatingFileHandler. This has two parameters, maxBytes and backupCount: the former is the maximum size of each log file, and the latter the maximum number of log files. I would like to effectively never delete data, but still keep each log file under a certain size (say, 2 kB for the purpose of illustration). So I tried setting the backupCount parameter to sys.maxint:
import msgpack
import json
from faker import Faker
import logging
from logging.handlers import RotatingFileHandler
import os, glob
import itertools
import sys

fake = Faker()
fake.seed(0)

data_file = "my_log.log"

logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
handler = RotatingFileHandler(data_file, maxBytes=2000, backupCount=sys.maxint)
logger.addHandler(handler)

fake_dicts = [{'name': fake.name(), 'email': fake.email()} for _ in range(100)]

def dump(item, mode='json'):
    if mode == 'json':
        return json.dumps(item)
    elif mode == 'msgpack':
        return msgpack.packb(item)

mode = 'json'

# Generate the archive log
for item in fake_dicts:
    dump_string = dump(item, mode=mode)
    logger.debug(dump_string)
However, this leads to several MemoryErrors, which look like this:
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/handlers.py", line 77, in emit
    self.doRollover()
  File "/usr/lib/python2.7/logging/handlers.py", line 129, in doRollover
    for i in range(self.backupCount - 1, 0, -1):
MemoryError
Logged from file json_logger.py, line 37
It seems like making this parameter large causes the system to use lots of memory, which is not desirable. Is there any way around this trade-off?
An improvement to the solution suggested by @Asiel:

Instead of using itertools and os.path.exists to determine what the nextName should be in doRollover, the solution below simply remembers the number of the last backup done and increments it to get the nextName.
from logging.handlers import RotatingFileHandler
import os

class RollingFileHandler(RotatingFileHandler):

    def __init__(self, filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=False):
        self.last_backup_cnt = 0
        super(RollingFileHandler, self).__init__(filename=filename,
                                                 mode=mode,
                                                 maxBytes=maxBytes,
                                                 backupCount=backupCount,
                                                 encoding=encoding,
                                                 delay=delay)

    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # my code starts here
        self.last_backup_cnt += 1
        nextName = "%s.%d" % (self.baseFilename, self.last_backup_cnt)
        self.rotate(self.baseFilename, nextName)
        # my code ends here
        if not self.delay:
            self.stream = self._open()
This class will still save your backups in ascending order (e.g. the first backup will end with ".1", the second one with ".2", and so on). Modifying it to also gzip the rotated files is straightforward.
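For instance, a gzip version of this counter-based handler could look roughly like the sketch below (the class name RollingGzipCounterFileHandler is made up for illustration; it simply combines the counter idea above with the compression approach of the gzip handler shown further down):

from logging.handlers import RotatingFileHandler
import gzip
import os
import shutil

class RollingGzipCounterFileHandler(RotatingFileHandler):

    def __init__(self, filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=False):
        self.last_backup_cnt = 0
        super(RollingGzipCounterFileHandler, self).__init__(filename=filename,
                                                            mode=mode,
                                                            maxBytes=maxBytes,
                                                            backupCount=backupCount,
                                                            encoding=encoding,
                                                            delay=delay)

    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # compress the current log into the next ".n.gz" backup instead of renaming it
        self.last_backup_cnt += 1
        nextName = "%s.%d.gz" % (self.baseFilename, self.last_backup_cnt)
        with open(self.baseFilename, 'rb') as original_log:
            with gzip.open(nextName, 'wb') as gzipped_log:
                shutil.copyfileobj(original_log, gzipped_log)
        os.remove(self.baseFilename)
        if not self.delay:
            self.stream = self._open()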
The problem here is that RotatingFileHandler is intended to... well, rotate. If you set its backupCount to a big number, the RotatingFileHandler.doRollover method will loop over a backward range from backupCount - 1 down to zero trying to find the last created backup, so the bigger the backupCount, the slower the rollover (when you only have a small number of backups). On Python 2 it is even worse: range() builds a real list, so a backupCount of sys.maxint makes doRollover try to allocate a gigantic list up front, which is exactly the MemoryError in your traceback.

Also, RotatingFileHandler will keep renaming your backups, which isn't necessary for what you want and is pure overhead: instead of simply writing your latest backup with the next ".n+1" extension, it renames all your backups and writes the latest one with the extension ".1" (shifting every backup name).
You could code the following class (probably with a better name):
from logging.handlers import RotatingFileHandler
import itertools
import os

class RollingFileHandler(RotatingFileHandler):

    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # my code starts here
        for i in itertools.count(1):
            nextName = "%s.%d" % (self.baseFilename, i)
            if not os.path.exists(nextName):
                self.rotate(self.baseFilename, nextName)
                break
        # my code ends here
        if not self.delay:
            self.stream = self._open()
This class will save your backups in ascending order (e.g. the first backup will end with ".1", the second one with ".2", and so on).
Since RollingFileHandler extends RotatingFileHandler, you can simply replace RotatingFileHandler with RollingFileHandler in your code; you don't need to provide the backupCount argument, since it is ignored by this new class (see the usage sketch below).
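To make that concrete, here is roughly how the script from the question would change; only the handler line really differs, and a hard-coded sample record stands in for the Faker output:

import logging

data_file = "my_log.log"

logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
# RollingFileHandler as defined above; backupCount is simply omitted
# because the overridden doRollover() never looks at it.
handler = RollingFileHandler(data_file, maxBytes=2000)
logger.addHandler(handler)

logger.debug('{"name": "Jane Doe", "email": "jane@example.com"}')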
Since you will have an ever-growing number of log backups, you may want to compress them to save disk space. So you could create a class similar to RollingFileHandler:
from logging.handlers import RotatingFileHandler
import gzip
import itertools
import os
import shutil

class RollingGzipFileHandler(RotatingFileHandler):

    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # my code starts here
        for i in itertools.count(1):
            nextName = "%s.%d.gz" % (self.baseFilename, i)
            if not os.path.exists(nextName):
                with open(self.baseFilename, 'rb') as original_log:
                    with gzip.open(nextName, 'wb') as gzipped_log:
                        shutil.copyfileobj(original_log, gzipped_log)
                os.remove(self.baseFilename)
                break
        # my code ends here
        if not self.delay:
            self.stream = self._open()
This class will save your compressed backups with extensions ".1.gz", ".2.gz", and so on. There are also other compression algorithms available in the standard library if you don't want to use gzip; swapping one in mostly means changing how the backup file is opened, as in the bz2 sketch below.
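For example, a bz2-based variant could look roughly like this (the class name RollingBz2FileHandler and the ".bz2" suffix are just illustrative choices; bz2.BZ2File is used rather than bz2.open because it exists on both Python 2.7 and Python 3):

from logging.handlers import RotatingFileHandler
import bz2
import itertools
import os
import shutil

class RollingBz2FileHandler(RotatingFileHandler):

    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # compress the current log into the first unused ".n.bz2" backup
        for i in itertools.count(1):
            nextName = "%s.%d.bz2" % (self.baseFilename, i)
            if not os.path.exists(nextName):
                with open(self.baseFilename, 'rb') as original_log:
                    with bz2.BZ2File(nextName, 'wb') as compressed_log:
                        shutil.copyfileobj(original_log, compressed_log)
                os.remove(self.baseFilename)
                break
        if not self.delay:
            self.stream = self._open()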
This is an old question, but I hope this helps.