Python sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings

Tags: python, sqlite

I am writing a script that scans a directory recursively and stores the results in a dictionary of lists, keyed by directory. Each list in turn contains lists that hold a file name and the size of that file. The file names may contain UTF-8 characters, as shown below.

['test.rus (\xd0\xa5\xd0\xb5\xd0\xbb\xd1\x8c\xd1\x88\xd0\xb8).srt', 23930]
test.rus (Хельши).srt

When I try to insert that data into the database, I get the error below:

Traceback (most recent call last):
  File "filedup.py", line 267, in <module>
    read_file_directory(directory)
  File "filedup.py", line 118, in read_file_directory
    (values[i][0], each, values[i][1]))
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

The function that performs this operation is given below:

from collections import defaultdict
dirDict = defaultdict(list)    
def read_file_directory(path):
    global dirDict
    logger.debug("Path being scanned %s" %path)
    fileStats = []
    for root, subFolders, files in os.walk(path):
        for file_name in files:
            fileStats = []
            fileStats.insert(0, file_name)
            fileSize = os.path.getsize(os.path.join(root,file_name))
            fileStats.insert(1, fileSize)
            dirDict[root].append(fileStats)
    #Insert the data in DB
    cursor = dbHandler.cursor()
    keys = dirDict.keys()
    for each in keys:
        values = dirDict[each]
        print values
        for i in xrange(len(values)):
            print values[i]
            print values[i][0]
            print values[i][1]
            fileName = values[i][0]
            fileSize = values[i][1]
            cursor.execute("insert or ignore into master \
                (FileName, FilePath, FileSize) values(?,?,?)", \
                (values[i][0], each, values[i][1]))
            logger.debug("Insert data for %s, %s, %s" %(values[i][0], each, values[i][1]))

As I am still learning Python, I cannot work out how to fix this issue. The Python version I am using is given below:

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2

Any thoughts on how to fix this with my current version of Python? I am looking for a generic fix that will also work on newer versions. I have also observed that, because of this error, none of the data is inserted into the database. How can I make sure that, even if one operation results in an error, the data processed before it is still inserted into the database?

asked Feb 11 '23 21:02 by Abhinav


1 Answer

The sqlite3 exception recommends that you switch to Unicode strings, so you should do that.

Python's directory listing functions, such as os.walk, have a curious property: in Python 2 they return normal (byte) strings when given a normal string, and unicode strings when given a unicode string. Therefore, when using os.walk(path) as in your code, you should make sure that path is a unicode string.
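The mirroring behaviour can be checked with a short experiment. This is only a sketch written for Python 3, where the text type is str and the byte-string type is bytes (on Python 2.7 the corresponding types are unicode and str). The helper name file_name_types and the scratch directory are made up for illustration.

```python
import os
import tempfile


def file_name_types(path):
    """Return the set of types os.walk uses for the file names under path."""
    return {type(name) for _root, _dirs, files in os.walk(path) for name in files}


# Hypothetical scratch directory holding one empty file.
scratch = tempfile.mkdtemp()
open(os.path.join(scratch, "example.srt"), "w").close()

# A text path yields text names; a byte-string path yields byte-string names.
text_types = file_name_types(scratch)
byte_types = file_name_types(os.fsencode(scratch))
print(text_types, byte_types)
```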

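To do this, you can explicitly convert to unicode using the unicode() function, for example by writing path = unicode(path) before the call to os.walk. Note that unicode(path) decodes with the default ASCII codec, so if the path itself may contain non-ASCII bytes, decode it explicitly instead, for example with path.decode('utf-8'). In Python 3 this step is unnecessary: str is already Unicode and the unicode() function no longer exists.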
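A version-neutral way to do the conversion might look like the sketch below. The helper names to_text and walk_text are hypothetical, and the code assumes that file names on disk use the encoding reported by sys.getfilesystemencoding() (commonly UTF-8 on Linux). It is intended to behave the same on Python 2.7 and Python 3.

```python
import os
import sys


def to_text(value, encoding=None):
    """Return value as a text (Unicode) string, decoding byte strings if needed."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        # Assumption: the bytes use the filesystem encoding unless told otherwise.
        return value.decode(encoding or sys.getfilesystemencoding())
    return value


def walk_text(path):
    """Yield (directory, file name, size) tuples with text-string names."""
    # Passing a text path makes os.walk return text names as well.
    for root, _subfolders, files in os.walk(to_text(path)):
        for file_name in files:
            size = os.path.getsize(os.path.join(root, file_name))
            yield root, file_name, size
```

With this in place, the tuples produced by walk_text(directory) hold text strings, so they can be passed to cursor.execute as parameters without triggering the 8-bit bytestring error.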

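Also, you need to commit in your code to actually write to the database. In the sqlite3 module commit() is a method of the connection, not of the cursor, so the call is dbHandler.commit(). Calling it once after you have finished looping through all the file names should be sufficient.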
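To keep the rows that were inserted before a failure, one option is to catch the error for each row and commit at the end. The sketch below assumes the connection argument is an sqlite3 connection such as dbHandler in the question. The function name insert_files is hypothetical, and the master table is assumed to have exactly the three columns used in the question's insert statement.

```python
import sqlite3


def insert_files(connection, rows):
    """Insert (file_name, file_path, file_size) rows; return the rows that failed."""
    failed = []
    cursor = connection.cursor()
    for row in rows:
        try:
            cursor.execute(
                "insert or ignore into master (FileName, FilePath, FileSize) "
                "values (?, ?, ?)",
                row,
            )
        except sqlite3.Error:
            # Remember the bad row and carry on; earlier inserts stay pending.
            failed.append(row)
    # commit() is a method of the connection, not of the cursor.
    connection.commit()
    return failed
```

A caller could log the returned list to see which file names could not be stored. Committing once after the loop is usually faster than committing after every row, at the cost of losing the pending rows if the process dies before the commit.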

answered Feb 14 '23 12:02 by parchment