How to decompress mongo journal files

As far as I can tell, journal files created by MongoDB are compressed with the Snappy compression algorithm, but I am not able to decompress one. Trying to decompress it gives this error:

Error stream missing snappy identifier

The Python code I used to decompress it is as follows:

import collections
import bson
from bson.codec_options import CodecOptions
import snappy
from cStringIO import StringIO
try:
    with open('journal/WiredTigerLog.0000000011') as f:
        content = f.readlines()
        fh = StringIO()
        snappy.stream_decompress(StringIO("".join(content)),fh)
        print fh
except Exception,e:
    print str(e)
    pass

Please help, I can't make any progress past this point.

asked Feb 06 '17 11:02

stackMonk

1 Answer

There's two forms of Snappy compression, the basic form and the streaming form. The basic form has the limitation that it all must fit in memory, so the streaming form exists to be able to compress larger amounts of data. The streaming format has a header and then subranges that are compressed. If the header is missing, it sounds like maybe you compressed using the basic form and are trying to uncompress with the streaming form. https://github.com/andrix/python-snappy/issues/40

If that is the case, use decompress instead of stream_decompress.
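One way to tell which form you are dealing with before picking a function: a stream in the Snappy framing format opens with a fixed 10-byte stream identifier chunk, and that is the "identifier" the error message says is missing. The sketch below uses only the standard library and is written for Python 3 (the question's code is Python 2). The function names are made up for illustration and are not part of python-snappy.

```python
# Stream identifier chunk that opens every Snappy framing-format stream:
# chunk type 0xff, 3-byte little-endian length 6, then the ASCII magic "sNaPpY".
SNAPPY_STREAM_IDENTIFIER = b"\xff\x06\x00\x00sNaPpY"


def has_snappy_stream_header(data: bytes) -> bool:
    """Return True if data starts with the framing-format stream identifier."""
    return data.startswith(SNAPPY_STREAM_IDENTIFIER)


def file_has_snappy_stream_header(path: str) -> bool:
    """Check only the first bytes of a file, opened in binary mode."""
    with open(path, "rb") as f:
        return has_snappy_stream_header(f.read(len(SNAPPY_STREAM_IDENTIFIER)))
```

If `file_has_snappy_stream_header('journal/WiredTigerLog.0000000011')` returns False, `stream_decompress` will reject the file no matter what else is in it. Note that a missing header only rules out the streaming form; it does not prove that the whole file is one basic-form Snappy block.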

But it could be that the data isn't compressed at all:

with open('journal/WiredTigerLog.0000000011') as f:
    for line in f:
        print line

In that case, simply reading the file as shown could work.

Minimum log record size for WiredTiger is 128 bytes. If a log record is 128 bytes or smaller, WiredTiger does not compress that record. https://docs.mongodb.com/manual/core/journaling/
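Since the quoted documentation describes compression per log record, a journal file may hold a mix of readable and unreadable bytes. A quick way to check is to open the file in binary mode and pull out the runs of printable ASCII, much like the Unix `strings` tool. This is only a rough sketch in Python 3 using the standard library; `printable_runs`, `printable_runs_in_file` and the `min_len` threshold are names and choices made up for illustration, and the code knows nothing about the actual WiredTiger log layout.

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII bytes found in data."""
    # Bytes 0x20..0x7e are the printable ASCII characters (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}")
    return pattern.findall(data)


def printable_runs_in_file(path: str, min_len: int = 4) -> list:
    """Read the whole file in binary mode and return its printable runs."""
    with open(path, "rb") as f:
        return printable_runs(f.read(), min_len)
```

If `printable_runs_in_file('journal/WiredTigerLog.0000000011')` returns recognizable field names or values, at least part of the file is stored uncompressed.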

answered Oct 10 '22 20:10

Hugues Fontenelle


Tags: python, mongodb, snappy, journal
