
Uncompressing an LZ4 blob with Perl

Tags: sqlite, perl, lz4

I have a table in a SQLite db that stores blobs compressed with the LZ4 algorithm. I am trying to use the decompress/uncompress functions from Compress::LZ4, but without success.

The sample SQLite db can be downloaded from here.

Here is how I am connecting to the SQLite db and getting the blob:

use strict;
use warnings;
use DBI;
use Data::Dump;      # exports dd
use Compress::LZ4;   # exports decompress

# The path to the SQLite file is the first command-line argument.
my $dbh = DBI->connect("dbi:SQLite:dbname=$ARGV[0]", "", "");
my $sth = $dbh->prepare("select blob_data from blob_parts where data_fk = 6");
$sth->execute();
my $result = $sth->fetch;
my $blob   = $result->[0];

dd $blob;
dd(decompress($blob));

$sth->finish();
$dbh->disconnect;

For the particular blob that I am selecting in this sample code (data_fk=6), dd outputs the following:

"LZ4\1>\1\0\0\xF7\xD6df\xF1mBXML\1\xA1\aVersion\xA1\4Type\xA1\2Id\xA1\3Ref\xA1\4Size\xA1\3use\xA1\4expr\xA1\5value\xA1\4data\xA1/Serialization\xA1\aPoints3\xA1\tuser_ E\0\xF0\16\bvertices\xA1\6double\xA1\bhas_attr\xA1\16\n\0\xC7object_ids\xA1\n\f\0\xF1M\4item\xA1\tis_active\xA0~B\20\n\22\6\4\x8C\1\0\0\0\6\2\xAA\24\6\0\xA4\x82\x88\2\x80\x82\x82\xA6B\26\6\b\x80\1B\30 \6\b\x88\2\3B\32\1\x93\6\0\0\0`\xACu\xCF\xBF\0\0\0\0\xCC\xF8\xC2?\0\0\0\0\0\x004\@\0\0\0 \xAA\xEF\xA9\20\x001h\xC5\xB1\b\0\xD0\0\0\$\@\1B\34\x85B\36\x87B C\0\xF0\aB\"\6\0\x88\3B\$\x85\1B\"\6\0\x88\3B\ $\x85\1\1\1"

But the decompress/uncompress functions just return undef. The uncompressed data should look something like this (the following output was generated by an XML converter):

<?xml version="1.0" encoding="utf-8"?>
<MultiStreamDocument>
<!-- Stream 1 -->
<?xml version="1.0" encoding="utf-8"?>
<data xmlns="" Id="1" Type="Points3" Version="1 2 0 1 1">
    <user_data Size="0"></user_data>
    <vertices Size="2">
        <double>-0.24577860534191132</double>
        <double>0.14821767807006836</double>
        <double>20</double>
        <double>0.050656620413064957</double>
        <double>0.069418430328369141</double>
        <double>10</double>
    </vertices>
    <has_attr>false</has_attr>
    <has_object_ids>true</has_object_ids>
    <object_ids Size="2">
        <item Version="3">
            <is_active>false</is_active>
        </item>
        <item Version="3">
            <is_active>false</is_active>
        </item>
    </object_ids>
</data><!-- Stream size: 126 bytes -->
</MultiStreamDocument>

What is the correct way to get uncompressed blob data from this SQLite database?

user2006190 asked Oct 31 '22 12:10


1 Answer

Your data looks like it is LZ4-compressed and prefixed with the four bytes "LZ4\1", presumably as a format indicator

The next four bytes ">\1\0\0" are a little-endian original-size field, which evaluates to 0x0000013E, or 318 bytes; that is a reasonable size. The decompress library function expects this field
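As a quick check of that header reading, the eight-byte prefix can be split with Perl's built-in unpack. This is only a minimal sketch: it assumes the layout guessed at above (a four-byte tag followed by a 32-bit little-endian size), the header bytes are copied from the dump in the question, and the variable names are illustrative.

```perl
use strict;
use warnings;

# First eight bytes of the blob, copied from the dd output in the question.
my $header = "LZ4\x01" . ">\x01\x00\x00";

# a4 = four raw bytes (the assumed format tag)
# V  = 32-bit little-endian unsigned integer (the assumed original size)
my ($magic, $orig_size) = unpack 'a4 V', $header;

printf "tag bytes: %s\n", join ' ', map { sprintf '%02X', ord } split //, $magic;
printf "original size: %d bytes\n", $orig_size;
```

With these bytes the size field comes out as 318, matching the figure above.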

So in theory, you should be able to write

$blob = substr($blob, 4);
dd decompress($blob);

and get the correct result. However, this also returns undef for me, which suggests that the data is corrupted somehow

What is certain is that most of the data has ended up uncompressed. The two bytes following the length field are "\xF7\xD6", which indicates that they are followed by 229 bytes of literal data (the upper nibble of the first byte, 0xF, plus the second byte, 0xD6, gives 0xE5, or 229). So this part of the data

"df\xF1mBXML\1\xA1\aVersion\xA1\4Type\xA1\2Id\xA1\3Ref\xA1\4Size\xA1\3use\xA1\4expr\xA1\5value\xA1\4data\xA1/http://www.slb.com/Petrel/2011/03/Serialization\xA1\aPoints3\xA1\tuser_E\0\xF0\16\bvertices\xA1\6double\xA1\bhas_attr\xA1\16\n\0\xC7object_ids\xA1\n\f\0\xF1M\4item\xA1\tis_active\xA0~B\20\n\22\6\4\x8C\1\0\0\0\6\2\xAA\24\6\0\xA4\x82\x88\2\x80\x82\x82\xA6B\26\6\b\x80\1"

is literal, as could be guessed by the amount of readable text it contains
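The literal-length arithmetic can be reproduced in a few lines of Perl. This is a sketch of how an LZ4 sequence header is read, not code taken from Compress::LZ4; read_sequence_head is a hypothetical helper name, and the two input bytes are the ones quoted above.

```perl
use strict;
use warnings;

# Read the token and literal length at the start of an LZ4 sequence.
# Returns (literal_length, match_length_nibble, header_bytes_consumed).
sub read_sequence_head {
    my ($data) = @_;
    my $pos     = 0;
    my $token   = ord substr($data, $pos++, 1);
    my $lit_len = $token >> 4;      # upper nibble: literal length
    my $match   = $token & 0x0F;    # lower nibble: match length minus 4
    if ($lit_len == 15) {
        # A nibble of 15 means more length bytes follow;
        # keep adding them until one is smaller than 255.
        my $byte;
        do {
            die "truncated literal length\n" if $pos >= length $data;
            $byte = ord substr($data, $pos++, 1);
            $lit_len += $byte;
        } while ($byte == 255);
    }
    return ($lit_len, $match, $pos);
}

my ($lit_len, $match_nibble, $used) = read_sequence_head("\xF7\xD6");
printf "literal length %d (0x%X), match nibble %d, header bytes %d\n",
    $lit_len, $lit_len, $match_nibble, $used;
```

For "\xF7\xD6" this reports a literal length of 229 (0xE5) read from two header bytes.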

The following two bytes, "B\30", should be a little-endian offset back into the decompressed output from which data is to be copied. Unfortunately this evaluates to 0x1842, or 6210, whereas, as we have seen, the output is only 229 bytes long so far. This is presumably where the data causes the decompress function to balk and return undef
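To see where a stream like this goes wrong, a small pure-Perl block decoder that reports the first invalid offset can help. The following is a rough sketch under stated assumptions: it handles a single raw LZ4 block with no header, it skips the end-of-block rules a real decoder enforces, and read_length and decode_block are hypothetical helpers, not part of Compress::LZ4. The second input is synthetic (229 filler bytes), built only to have the same shape as the blob in the question.

```perl
use strict;
use warnings;

# Extend a 4-bit length of 15 with the length bytes that follow it.
sub read_length {
    my ($in, $pos, $len) = @_;
    if ($len == 15) {
        my $byte;
        do {
            die "truncated length at input byte $pos\n" if $pos >= length $in;
            $byte = ord substr($in, $pos++, 1);
            $len += $byte;
        } while ($byte == 255);
    }
    return ($len, $pos);
}

# Decode one raw LZ4 block, dying at the first sequence that cannot be valid.
sub decode_block {
    my ($in) = @_;
    my ($pos, $out) = (0, '');
    while ($pos < length $in) {
        my $token = ord substr($in, $pos++, 1);
        (my $lit_len, $pos) = read_length($in, $pos, $token >> 4);
        die "literals run past end of input\n" if $pos + $lit_len > length $in;
        $out .= substr($in, $pos, $lit_len);
        $pos += $lit_len;
        last if $pos >= length $in;    # the final sequence carries literals only

        die "truncated offset at input byte $pos\n" if $pos + 2 > length $in;
        my $offset = unpack 'v', substr($in, $pos, 2);    # 16-bit little-endian
        die sprintf("invalid offset %d at input byte %d (only %d bytes decoded so far)\n",
            $offset, $pos, length $out)
            if $offset == 0 || $offset > length $out;
        $pos += 2;
        (my $match_len, $pos) = read_length($in, $pos, $token & 0x0F);
        $match_len += 4;               # the stored value is match length minus 4

        # Copy one byte at a time: a match may overlap the bytes it produces.
        my $from = length($out) - $offset;
        $out .= substr($out, $from + $_, 1) for 0 .. $match_len - 1;
    }
    return $out;
}

# Valid toy block: literals "abc", a 6-byte match at offset 3, then literal "d".
my $ok = decode_block("\x32" . "abc" . "\x03\x00" . "\x10" . "d");
print "$ok\n";

# Synthetic block shaped like the question's data: 229 literals, then "B\x18".
my $bad = "\xF7\xD6" . ('x' x 229) . "B\x18";
my $err = eval { decode_block($bad); 1 } ? '' : $@;
print $err;
```

Running the real blob (with its eight-byte prefix removed) through a decoder of this kind would show whether the first failure really is the 6210 offset, or whether the bytes have been altered earlier in the stream.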

That's the best I can make of your data. I hope it helps

user75857 answered Nov 15 '22 11:11