I need to read Firefox's IndexedDB using Python. I use the sqlite3 package to retrieve the contents of the IndexedDB:
import sqlite3

with sqlite3.connect(indexeddb_file) as conn:
    c = conn.cursor()
    c.execute('select * from object_data;')
    rows = c.fetchall()
    for row in rows:
        print(row[2])
However, although I know that the contents of the database are strings, they are stored as SQLite binary blobs. Is there a way to read the strings stored as blobs from Python?
I've tried:
UPDATE
Following the encoding scheme in the Firefox source code for its IndexedDB implementation, pointed out by @paa in one of the comments on this question, I implemented a decoder for part of Firefox's database key encoding in Python. So far I have implemented it only for strings, but implementing it for the other types should be even easier:
import re

BYTE_LENGTH = 8


def hex_to_bin(hex_str):
    """Return binary representation of hexadecimal string."""
    return trim_bin(int(hex_str, 16)).zfill(len(hex_str) * 4)


def byte_to_unicode(bin_byte):
    """Return the character whose code point is given as a bit string."""
    # On Python 2, use unichr() instead of chr() for code points above 255.
    return chr(int(bin_byte, 2))


def trim_bin(int_n):
    """Return int num converted to trimmed bin representation."""
    return bin(int_n)[2:]


def decode(key):
    """Return decoded idb key (key is given as a hex string)."""
    m = re.search("[1-9]", key)  # TODO: change to match any non-zero hex digit
    if not m:
        raise ValueError("no type tag found in key: %r" % key)
    typeoffset = int(m.group())
    data = key[m.end():]
    decoded = key
    if typeoffset == 1:
        pass  # decode number (not implemented yet)
    elif typeoffset == 2:
        pass  # decode date (not implemented yet)
    elif typeoffset == 3:
        # decode string
        bin_repr = hex_to_bin(data) if data else ""
        decoded = ""
        i = 0
        # A while loop is used because reassigning the variable of a for
        # loop does not skip the continuation bytes.
        while i < len(bin_repr):
            byte = bin_repr[i:i + BYTE_LENGTH]
            if byte[0] == '0':
                # one byte: char code + 1
                decoded += byte_to_unicode(trim_bin(int(byte, 2) - 1))
                i += BYTE_LENGTH
            elif byte[1] == '0':
                # two bytes: marker '10', then 14 bits holding char code - 127
                bits = bin_repr[i + 2:i + 2 * BYTE_LENGTH]
                decoded += byte_to_unicode(trim_bin(int(bits, 2) + 127))
                i += 2 * BYTE_LENGTH
            else:
                # three bytes: marker '11', then the 16 bits of the char code
                bits = bin_repr[i + 2:i + 2 * BYTE_LENGTH + 2]
                decoded += byte_to_unicode(bits)
                i += 3 * BYTE_LENGTH
    elif typeoffset == 4:
        pass  # decode array (not implemented yet)
    else:
        raise ValueError("unknown type tag: %d" % typeoffset)
    return decoded
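Since sqlite3 hands blobs back as raw bytes rather than hex strings, the same string branch can also be sketched with bit masks directly on the bytes. This is only an illustration: the function name decode_string_payload is made up here, it expects the bytes that follow the type tag, and the one/two/three-byte layout is assumed to be the one the code above is built around (it has not been checked against every Firefox version).

```python
def decode_string_payload(payload):
    """Decode the bytes that follow the type tag of a string key.

    Assumed layout (taken from the decoder above, not verified elsewhere):
      0xxxxxxx                      -> char code + 1
      10xxxxxx xxxxxxxx             -> 14 bits holding char code - 127
      11xxxxxx xxxxxxxx xx000000    -> the 16 bits of the char code
    """
    data = bytearray(payload)  # iterating a bytearray yields ints
    chars = []
    i = 0
    while i < len(data):
        first = data[i]
        if (first & 0x80) == 0:
            code = first - 1
            i += 1
        elif (first & 0x40) == 0:
            code = (((first & 0x3F) << 8) | data[i + 1]) + 127
            i += 2
        else:
            code = ((first & 0x3F) << 10) | (data[i + 1] << 2) | (data[i + 2] >> 6)
            i += 3
        chars.append(chr(code))
    return "".join(chars)


print(decode_string_payload(b"bcd"))  # abc
```

A truncated multi-byte sequence raises IndexError here, which seemed preferable to silently returning a partial string.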
However, I'm still not able to decode the data fields of the IndexedDB. They don't seem to use an elaborate scheme like the one for the keys, because I can read some parts of the actual values when I decode them as UTF-16.
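For reference, this kind of inspection can be sketched as follows. The helper names (utf16_view, iter_blob_column) and the three-column table layout in the demo are assumptions made for illustration, not the real Firefox schema. Undecodable bytes are replaced rather than raising, so the readable parts stay visible.

```python
import sqlite3


def utf16_view(blob):
    """Best-effort UTF-16-LE view of a blob; undecodable bytes become U+FFFD."""
    return bytes(blob).decode("utf-16-le", errors="replace")


def iter_blob_column(conn, table, column):
    """Yield every blob stored in one column as bytes."""
    # table and column are trusted identifiers here, not user input.
    query = "SELECT {0} FROM {1}".format(column, table)
    for (value,) in conn.execute(query):
        if isinstance(value, (bytes, memoryview)):
            yield bytes(value)


# Demo on an in-memory database with a hypothetical layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE object_data (id INTEGER, key_value BLOB, data BLOB)")
conn.execute(
    "INSERT INTO object_data VALUES (?, ?, ?)",
    (1, b"\x03bcd", "there".encode("utf-16-le")),
)
for blob in iter_blob_column(conn, "object_data", "data"):
    print(utf16_view(blob))  # there
```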
(Typing here since I can't comment yet...)
I've been trying to do the same thing for the data blobs themselves. In my case, I'm trying to extract JSON strings. When I look at the DB I'm sifting through, I do see UTF-16 encoded characters most of the time, but there are strange cases like this:
"there we go" is encoded as 7400 6800 6500 7200 6500 2000 77 [05060C] 6700 6F00. The [05060C] supposedly encodes "e ".
https://mxr.mozilla.org/mozilla-release/source/dom/indexedDB/IDBObjectStore.cpp
I'm looking into that to see if there are any clues. There should be plenty of other source files in that directory that could help.
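The plain UTF-16 runs in the dump quoted above can be checked quickly in Python. This small sketch (the helper name utf16le_from_hex is made up) decodes only the runs on either side of the bracketed bytes; it does not attempt to explain the [05060C] sequence.

```python
import binascii


def utf16le_from_hex(hex_run):
    """Decode a run of space-separated hex digits as UTF-16-LE text."""
    raw = binascii.unhexlify(hex_run.replace(" ", ""))
    return raw.decode("utf-16-le")


print(repr(utf16le_from_hex("7400 6800 6500 7200 6500 2000")))  # 'there '
print(repr(utf16le_from_hex("6700 6F00")))  # 'go'
```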