
How to decompress mongo journal files

As far as I can tell, the journal files created by MongoDB are compressed with the Snappy compression algorithm, but I am not able to decompress one of them. Trying to decompress it gives this error:

Error stream missing snappy identifier

The Python code I used to decompress it is as follows:

import io
import snappy

try:
    # Journal files are binary, so open the file in binary mode.
    with open('journal/WiredTigerLog.0000000011', 'rb') as f:
        content = f.read()
    fh = io.BytesIO()
    # stream_decompress expects the Snappy streaming (framing) format.
    snappy.stream_decompress(io.BytesIO(content), fh)
    print(fh.getvalue())
except Exception as e:
    print(str(e))

Please help; I can't work out how to proceed from here.

asked Feb 06 '17 11:02 by stackMonk




1 Answer

There are two forms of Snappy compression: the basic form and the streaming form. The basic form has the limitation that all of the data must fit in memory, so the streaming form exists to compress larger amounts of data. The streaming format consists of a header followed by subranges that are compressed individually. If the header is missing, it sounds like the data may have been compressed with the basic form while you are trying to decompress it with the streaming form. See https://github.com/andrix/python-snappy/issues/40

If that is the case, use decompress instead of stream_decompress.
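One way to tell the two cases apart before decompressing is to look for the streaming header yourself. The sketch below is only an illustration: it assumes the streaming form is the Snappy framing format, whose stream identifier chunk is the 10 bytes 0xff 0x06 0x00 0x00 followed by "sNaPpY". The helper names has_snappy_stream_header and read_prefix are made up for this example and are not part of python-snappy.

```python
# Stream identifier chunk of the Snappy framing format (assumed here to be
# what stream_decompress looks for): chunk type 0xff, a 3-byte little-endian
# length of 6, then the ASCII bytes "sNaPpY".
SNAPPY_STREAM_IDENTIFIER = b"\xff\x06\x00\x00sNaPpY"


def has_snappy_stream_header(data):
    """Return True if data starts with the framing-format stream identifier."""
    return data.startswith(SNAPPY_STREAM_IDENTIFIER)


def read_prefix(path, size=len(SNAPPY_STREAM_IDENTIFIER)):
    """Read the first `size` bytes of a file, opened in binary mode."""
    with open(path, "rb") as f:
        return f.read(size)
```

If has_snappy_stream_header(read_prefix('journal/WiredTigerLog.0000000011')) returns False, the file is not a framed Snappy stream, which would be consistent with the "stream missing snappy identifier" error.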

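But it could be that the data isn't compressed at all, in which case simply reading the file could work:

# Open in binary mode, since the journal is not a text file.
with open('journal/WiredTigerLog.0000000011', 'rb') as f:
    for line in f:
        print(line)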
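Since the journal is a binary file, a hex dump is often easier to read than raw lines when checking whether any of the content is plain, uncompressed data. The following is a minimal sketch using only the standard library; hexdump is a hypothetical helper written for this example, not something provided by MongoDB or python-snappy.

```python
def hexdump(data, width=16):
    """Format bytes as rows of: offset, hex values, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Two hex digits per byte, separated by spaces, padded to a fixed column.
        hex_part = " ".join("{:02x}".format(b) for b in chunk).ljust(width * 3 - 1)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("{:08x}  {}  {}".format(offset, hex_part, text_part))
    return "\n".join(lines)
```

For example, read the first 256 bytes with open('journal/WiredTigerLog.0000000011', 'rb') and print(hexdump(...)) on them; readable strings in the right-hand column would suggest that at least part of the file is stored uncompressed.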

Minimum log record size for WiredTiger is 128 bytes. If a log record is 128 bytes or smaller, WiredTiger does not compress that record. https://docs.mongodb.com/manual/core/journaling/

answered Oct 10 '22 20:10 by Hugues Fontenelle