
How can I access Firefox's internal indexedDB files using Python?

I need to read Firefox's IndexedDB using Python.

I use the sqlite3 package to retrieve the contents of the IndexedDB file:

import sqlite3

# indexeddb_file is the path to the IndexedDB .sqlite file
with sqlite3.connect(indexeddb_file) as conn:
    c = conn.cursor()
    c.execute('select * from object_data;')
    rows = c.fetchall()
    for row in rows:
        print(row[2])

However, although I know that the contents of the database are strings, they are stored as SQLite binary blobs. Is there a way to read the strings stored as blobs from Python?

I've tried:

  • the hex() and quote() SQL functions, which just encode the blob as hexadecimal
  • writing the blob to a file, which has the same problem
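
To make the symptom concrete, here is a minimal sketch of what sqlite3 returns for a BLOB column in Python 3. The in-memory table is a made-up stand-in for the real object_data table (its actual schema is not shown in the question), and the stored value is plain UTF-16 text chosen purely for illustration.

```python
import binascii
import sqlite3

# Hypothetical stand-in for the real database: a single BLOB column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE object_data (id INTEGER, data BLOB)")
conn.execute(
    "INSERT INTO object_data VALUES (?, ?)",
    (1, "hello".encode("utf-16-le")),
)

blob = conn.execute("SELECT data FROM object_data").fetchone()[0]
conn.close()

print(type(blob))                 # BLOB values arrive as bytes objects
print(binascii.hexlify(blob))     # same information as SQL hex(), in lowercase
print(blob.decode("utf-16-le"))   # succeeds only if the blob is plain UTF-16
```

If the last line prints garbage or raises on a real blob, the bytes are not plain UTF-16 and need some other decoding step first.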

UPDATE

Following the encoding scheme in the Firefox source code for IndexedDB, pointed out by @paa in a comment on this question, I implemented part of Firefox's encoding method for database keys in Python. So far I have implemented it only for strings, but implementing it for the other types should be even easier:

import re

BYTE_LENGTH = 8

def hex_to_bin(hex_str):
    """Return binary representation of hexadecimal string."""
    return trim_bin(int(hex_str, 16)).zfill(len(hex_str) * 4)

def byte_to_unicode(bin_byte):
    """Return the character whose code point is the given binary string."""
    return chr(int(str(bin_byte), 2))

def trim_bin(int_n):
    """Return int num converted to trimmed bin representation."""
    return bin(int_n)[2:]

def decode(key):
    """Return decoded idb key (key is a hexadecimal string)."""
    decoded = key
    m = re.search("[1-9a-fA-F]", key)  # first non-zero hex digit
    if not m:
        raise ValueError("no type marker found in key")
    i = m.start()
    typeoffset = int(key[i], 16)
    data = key[i + 1:]
    if typeoffset == 1:
        # decode number
        pass
    elif typeoffset == 2:
        # decode date
        pass
    elif typeoffset == 3:
        # decode string
        bin_repr = hex_to_bin(data)
        decoded = ""
        i = 0
        # A while loop is needed: multi-byte characters advance i by
        # more than one byte, which a for loop over range() cannot do.
        while i < len(bin_repr):
            byte = bin_repr[i:i + BYTE_LENGTH]
            if int(byte, 2) == 0:
                break  # a zero byte terminates the string
            if byte[0] == '0':
                # one byte: 0xxxxxxx, stored with 1 added
                decoded += byte_to_unicode(trim_bin(int(byte, 2) - 1))
                i += BYTE_LENGTH
            elif byte[1] == '0':
                # two bytes: 10xxxxxx xxxxxxxx, stored with 0x7F subtracted
                bits = bin_repr[i + 2:i + 2 * BYTE_LENGTH]
                decoded += byte_to_unicode(trim_bin(int(bits, 2) + 127))
                i += 2 * BYTE_LENGTH
            else:
                # three bytes: 11xxxxxx xxxxxxxx xx000000
                bits = bin_repr[i + 2:i + 2 * BYTE_LENGTH + 2]
                decoded += byte_to_unicode(bits)
                i += 3 * BYTE_LENGTH
        return decoded
    elif typeoffset == 4:
        # decode array
        pass
    else:
        raise ValueError("unknown key type: %d" % typeoffset)
    return decoded
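
As a cross-check for the string case, here is a byte-oriented sketch that encodes and decodes in both directions, so a decoder can be tested by round trip. It assumes the variable-length scheme described in the comments of Firefox's Key.cpp: one byte for UTF-16 code units up to 0x7E (stored with 1 added), two bytes up to 0x3FFF + 0x7F (stored with 0x7F subtracted), and three bytes for the rest. The function names are illustrative, and the leading type byte is left out.

```python
import struct

def encode_key_string(text):
    """Encode text with the assumed key-string scheme (no type byte)."""
    raw = text.encode("utf-16-le")
    units = struct.unpack("<%dH" % (len(raw) // 2), raw)  # UTF-16 code units
    out = bytearray()
    for c in units:
        if c <= 0x7E:
            out.append(c + 1)                      # 0xxxxxxx
        elif c <= 0x3FFF + 0x7F:
            v = c - 0x7F + 0x8000                  # 10xxxxxx xxxxxxxx
            out += bytes([v >> 8, v & 0xFF])
        else:
            v = (c << 6) | 0xC00000                # 11xxxxxx xxxxxxxx xx000000
            out += bytes([v >> 16, (v >> 8) & 0xFF, v & 0xFF])
    return bytes(out)

def decode_key_string(data):
    """Decode bytes produced by encode_key_string; stops at a zero byte."""
    data = bytearray(data)
    units = []
    i = 0
    while i < len(data) and data[i] != 0:
        b = data[i]
        if not b & 0x80:
            units.append(b - 1)
            i += 1
        elif not b & 0x40:
            low = data[i + 1] if i + 1 < len(data) else 0
            units.append(((b << 8) | low) - 0x8000 + 0x7F)
            i += 2
        else:
            v = b << 10
            if i + 1 < len(data):
                v |= data[i + 1] << 2
            if i + 2 < len(data):
                v |= data[i + 2] >> 6
            units.append(v & 0xFFFF)               # drop the two marker bits
            i += 3
    return struct.pack("<%dH" % len(units), *units).decode("utf-16-le")

sample = "h\u00e9llo \u4e2d"
print(encode_key_string("hello").hex())
print(decode_key_string(encode_key_string(sample)) == sample)
```

Working on bytes rather than on a string of binary digits avoids the alignment bookkeeping, at the cost of being a little further from the hex output that SQL's hex() produces.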

However, I'm still not able to decode the data fields of the IndexedDB. It seems to me that they do not use a sophisticated scheme like the one for the keys, because I can read some parts of the actual values when I decode them as UTF-16.
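
One way to pull out just the readable parts is to scan the blob for runs of printable ASCII characters stored as UTF-16LE, much like the Unix strings tool does. This is only a sketch: the helper name, the minimum run length of three characters, and the sample bytes are all made up for illustration, and it will miss any text that is not plain ASCII.

```python
import re

# Three or more printable ASCII characters, each followed by a zero byte.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){3,}")

def utf16_strings(blob):
    """Return the printable UTF-16LE runs found in blob (hypothetical helper)."""
    return [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(blob)]

# Made-up sample: two readable runs separated by non-text bytes.
sample = (
    b"\x08\x00"
    + "title".encode("utf-16-le")
    + b"\xff\x05\x06"
    + "body".encode("utf-16-le")
)
print(utf16_strings(sample))
```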

asked Apr 07 '14 15:04 by synack


1 Answer

(Typing here since I can't comment yet...)

I've been trying to do the same thing for the data blobs themselves. In my case, I'm trying to extract JSON strings. When I look at the DB I'm sifting through, I do see UTF-16 encoded characters most of the time, but there are strange cases like this:

"there we go" is encoded as 7400 6800 6500 7200 6500 2000 77 [05060C] 6700 6F00. The [05060C] supposedly encodes "e ".

https://mxr.mozilla.org/mozilla-release/source/dom/indexedDB/IDBObjectStore.cpp

I'm looking through that file to see if there are any clues. There should be plenty of other source files in the same directory that could help.
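
One possible reading of the bracketed bytes [05060C] in the example above, offered as an assumption rather than something confirmed in this thread: they have the shape of Snappy compression elements, and Firefox's IndexedDB code does use the Snappy library. Under that reading, 0x05 0x06 means "copy 5 bytes starting 6 bytes back" and 0x0C is the tag for a 4-byte literal. The sketch below decodes raw Snappy elements on that basis. `snappy_elements` is a hypothetical helper, and the 13 bytes before the bracket are passed in as already-decoded output because their own literal tag is not part of the excerpt.

```python
def snappy_elements(stream, history=b""):
    """Decode raw Snappy elements, appending to already-decoded history.

    Skips the varint length preamble of a complete Snappy block, since the
    excerpt starts mid-stream.
    """
    out = bytearray(history)
    stream = bytearray(stream)
    pos = 0
    while pos < len(stream):
        tag = stream[pos]
        pos += 1
        kind = tag & 0x03
        if kind == 0:
            # Literal: the upper six bits hold length - 1; values 60..63
            # mean length - 1 follows in 1..4 little-endian bytes.
            length = tag >> 2
            if length >= 60:
                extra = length - 59
                length = int.from_bytes(stream[pos:pos + extra], "little")
                pos += extra
            length += 1
            out += stream[pos:pos + length]
            pos += length
            continue
        if kind == 1:
            # Copy with a 1-byte offset: 3 length bits, 11 offset bits.
            length = 4 + ((tag >> 2) & 0x07)
            offset = ((tag >> 5) << 8) | stream[pos]
            pos += 1
        else:
            # Copy with a 2- or 4-byte little-endian offset.
            width = 2 if kind == 2 else 4
            length = (tag >> 2) + 1
            offset = int.from_bytes(stream[pos:pos + width], "little")
            pos += width
        if offset == 0 or offset > len(out):
            raise ValueError("copy offset points outside decoded data")
        for _ in range(length):
            out.append(out[-offset])  # byte by byte, so overlaps work
    return bytes(out)

prefix = bytes.fromhex("7400 6800 6500 7200 6500 2000 77")
rest = bytes.fromhex("05060C 6700 6F00")
print(snappy_elements(rest, prefix).decode("utf-16-le"))
```

If this reading is right, the blobs would need to be decompressed as a whole before looking for text, and the result is likely still a serialized structure rather than bare JSON.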

answered Oct 31 '22 16:10 by Shuffy