In the logging module's RotatingFileHandler, how do I set the backupCount to a practically infinite number?

Tags:

python

I'm writing a program which backs up a database using Python's RotatingFileHandler. This has two parameters, maxBytes and backupCount: the former is the maximum size of each log file, and the latter the maximum number of log files.

I would like to effectively never delete data, but still have each log file a certain size (say, 2 kB for the purpose of illustration). So I tried to set the backupCount parameter to sys.maxint:

import msgpack
import json
from faker import Faker
import logging
from logging.handlers import RotatingFileHandler
import os, glob
import itertools
import sys

fake = Faker()
fake.seed(0)

data_file = "my_log.log"

logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
handler = RotatingFileHandler(data_file, maxBytes=2000, backupCount=sys.maxint)
logger.addHandler(handler)

fake_dicts = [{'name': fake.name(), 'email': fake.email()} for _ in range(100)]

def dump(item, mode='json'):
    if mode == 'json':
        return json.dumps(item)
    elif mode == 'msgpack':
        return msgpack.packb(item)

mode = 'json'

# Generate the archive log
for item in fake_dicts:
    dump_string = dump(item, mode=mode)
    logger.debug(dump_string)

However, this leads to several MemoryErrors which look like this:

Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/handlers.py", line 77, in emit
    self.doRollover()
  File "/usr/lib/python2.7/logging/handlers.py", line 129, in doRollover
    for i in range(self.backupCount - 1, 0, -1):
MemoryError
Logged from file json_logger.py, line 37

It seems like making this parameter large causes the system to use lots of memory, which is not desirable. Is there any way around this trade-off?

asked Oct 20 '16 by Kurt Peek



2 Answers

An improvement to the solution suggested by @Asiel:

Instead of using itertools and os.path.exists to determine what nextName should be in doRollover, the solution below simply remembers the number of the last backup and increments it to get nextName.

from logging.handlers import RotatingFileHandler
import os
class RollingFileHandler(RotatingFileHandler):

    def __init__(self, filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=False):
        self.last_backup_cnt = 0
        super(RollingFileHandler, self).__init__(filename=filename,
                                                 mode=mode,
                                                 maxBytes=maxBytes,
                                                 backupCount=backupCount,
                                                 encoding=encoding,
                                                 delay=delay)

    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # my code starts here
        self.last_backup_cnt += 1
        nextName = "%s.%d" % (self.baseFilename, self.last_backup_cnt)
        self.rotate(self.baseFilename, nextName)
        # my code ends here
        if not self.delay:
            self.stream = self._open()

This class will still save your backups in ascending order (e.g. the first backup will end with ".1", the second with ".2", and so on). Modifying it to also gzip the backups is straightforward; a sketch follows.
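
A minimal, untested sketch combining this counter-based naming with the gzip compression from the other answer (RollingGzipFileHandler is a hypothetical name, not part of the standard library):

from logging.handlers import RotatingFileHandler
import gzip
import os
import shutil

class RollingGzipFileHandler(RotatingFileHandler):

    def __init__(self, filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=False):
        self.last_backup_cnt = 0
        super(RollingGzipFileHandler, self).__init__(filename=filename,
                                                     mode=mode,
                                                     maxBytes=maxBytes,
                                                     backupCount=backupCount,
                                                     encoding=encoding,
                                                     delay=delay)

    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # Name the next backup from the counter instead of scanning the disk.
        self.last_backup_cnt += 1
        nextName = "%s.%d.gz" % (self.baseFilename, self.last_backup_cnt)
        # Compress the current log into the backup, then start a fresh file.
        with open(self.baseFilename, 'rb') as original_log:
            with gzip.open(nextName, 'wb') as gzipped_log:
                shutil.copyfileobj(original_log, gzipped_log)
        os.remove(self.baseFilename)
        if not self.delay:
            self.stream = self._open()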

answered Oct 14 '22 by SBK


The problem here is that RotatingFileHandler is intended to... well, rotate. If you set its backupCount to a big number, the RotatingFileHandler.doRollover method will loop over a backward range from backupCount - 1 to zero trying to find the last created backup, so the bigger the backupCount, the slower each rollover is (when you have a small number of backups). Worse, on Python 2 range() builds that entire list in memory up front, which is exactly what raises the MemoryError in your traceback.
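
You can reproduce the failure without any logging at all. A minimal illustration on Python 2 (where the question's traceback comes from; depending on the platform the allocation may raise OverflowError instead of MemoryError):

import sys

# Python 2's range() materializes the whole list up front, so this
# allocation fails long before doRollover touches a single file.
try:
    range(sys.maxint - 1, 0, -1)
except (MemoryError, OverflowError):
    print("a list of ~sys.maxint integers cannot be built in memory")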

Also, RotatingFileHandler keeps renaming your backups, which isn't necessary for what you want and is pure overhead: instead of simply writing the latest backup with the next ".n+1" extension, it renames all the existing backups and writes the latest one with the ".1" extension (shifting every backup name).


Solution:

You could write the following class (probably with a better name):

from logging.handlers import RotatingFileHandler
import itertools
import os
class RollingFileHandler(RotatingFileHandler):
    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # my code starts here
        for i in itertools.count(1):
            nextName = "%s.%d" % (self.baseFilename, i)
            if not os.path.exists(nextName):
                self.rotate(self.baseFilename, nextName)
                break
        # my code ends here
        if not self.delay:
            self.stream = self._open()

This class will save your backups in ascending order (e.g. the first backup will end with ".1", the second with ".2", and so on).

Since RollingFileHandler extends RotatingFileHandler, you can simply replace RotatingFileHandler with RollingFileHandler in your code. You don't need to provide the backupCount argument, since this new class ignores it.
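
For instance, the handler setup from the question would shrink to something like this (assuming the RollingFileHandler class above is in scope):

import logging

logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
# No backupCount needed: RollingFileHandler.doRollover never consults it.
handler = RollingFileHandler('my_log.log', maxBytes=2000)
logger.addHandler(handler)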


Bonus solution: (compressing backups)

Since you will have an ever-growing number of log backups, you may want to compress them to save disk space, so you could create a class similar to RollingFileHandler:

from logging.handlers import RotatingFileHandler
import gzip
import itertools
import os
import shutil
class RollingGzipFileHandler(RotatingFileHandler):
    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        # my code starts here
        for i in itertools.count(1):
            nextName = "%s.%d.gz" % (self.baseFilename, i)
            if not os.path.exists(nextName):
                with open(self.baseFilename, 'rb') as original_log:
                    with gzip.open(nextName, 'wb') as gzipped_log:
                        shutil.copyfileobj(original_log, gzipped_log)
                os.remove(self.baseFilename)
                break
        # my code ends here
        if not self.delay:
            self.stream = self._open()

This class will save your compressed backups with extensions ".1.gz", ".2.gz", etc. Other compression modules from the standard library (bz2, for example) work just as well if you don't want to use gzip; see the sketch below.
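
For example, a bz2 variant (hypothetical name, untested sketch) only needs the backup extension and the compressed-file constructor changed:

from logging.handlers import RotatingFileHandler
import bz2
import itertools
import os
import shutil

class RollingBz2FileHandler(RotatingFileHandler):
    # override
    def doRollover(self):
        if self.stream:
            self.stream.close()
            self.stream = None
        for i in itertools.count(1):
            nextName = "%s.%d.bz2" % (self.baseFilename, i)
            if not os.path.exists(nextName):
                # bz2.BZ2File works on Python 2 and 3; Python 3 also has bz2.open.
                with open(self.baseFilename, 'rb') as original_log:
                    with bz2.BZ2File(nextName, 'w') as bzipped_log:
                        shutil.copyfileobj(original_log, bzipped_log)
                os.remove(self.baseFilename)
                break
        if not self.delay:
            self.stream = self._open()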

This is an old question, but I hope this helps.

answered Oct 14 '22 by Asiel Diaz Benitez