I am writing a script that scans a directory recursively and stores the results in a dictionary of lists. Each list in turn contains lists holding a file name and the size of that file. The file name may contain UTF-8 characters, as shown below.
['test.rus (\xd0\xa5\xd0\xb5\xd0\xbb\xd1\x8c\xd1\x88\xd0\xb8).srt', 23930]
test.rus (Хельши).srt
Now, while trying to insert that data into the database, I get the error below:
Traceback (most recent call last):
  File "filedup.py", line 267, in <module>
    read_file_directory(directory)
  File "filedup.py", line 118, in read_file_directory
    (values[i][0], each, values[i][1]))
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
The function that performs this operation is given below:
from collections import defaultdict

dirDict = defaultdict(list)

def read_file_directory(path):
    global dirDict
    logger.debug("Path being scanned %s" %path)
    fileStats = []
    for root, subFolders, files in os.walk(path):
        for file_name in files:
            fileStats = []
            fileStats.insert(0, file_name)
            fileSize = os.path.getsize(os.path.join(root,file_name))
            fileStats.insert(1, fileSize)
            dirDict[root].append(fileStats)
    #Insert the data in DB
    cursor = dbHandler.cursor()
    keys = dirDict.keys()
    for each in keys:
        values = dirDict[each]
        print values
        for i in xrange(len(values)):
            print values[i]
            print values[i][0]
            print values[i][1]
            fileName = values[i][0]
            fileSize = values[i][1]
            cursor.execute("insert or ignore into master \
                           (FileName, FilePath, FileSize) values(?,?,?)", \
                           (values[i][0], each, values[i][1]))
            logger.debug("Insert data for %s, %s, %s" %(values[i][0], each, values[i][1]))
As I am still learning Python, I do not know how to fix this issue. The Python version I am using is given below:
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Any thoughts on how to fix this with the current version of Python? I am looking for a generic fix that will also work on higher versions. I have also observed that, because of this error, none of the data is inserted into the database. How can I make sure that, even if some operation results in an error, the data processed before it is still inserted?
The sqlite3 exception recommends that you switch to unicode strings, so you should do that.
Python's directory listing functions, such as os.walk, have a curious property: they return normal strings when given normal strings and unicode strings when given unicode strings. Therefore, when calling os.walk(path) as in your code, you should make sure that path is a unicode string.
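To do this, you can explicitly convert to unicode using the unicode() function, for example by writing path = unicode(path) before the call to os.walk. Note that unicode(path) decodes with the default ASCII codec in Python 2, so if the path itself contains non-ASCII bytes you need to name the encoding, for example path.decode('utf-8').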
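As a rough sketch of the conversion described above, the helper below decodes a byte-string path before walking it, so every name that os.walk returns is a text string. The names to_text and collect_file_stats are made up for this illustration, and UTF-8 is an assumption about how your file names are encoded. It is written to run unchanged on Python 2.7 and Python 3.

```python
import os


def to_text(path, encoding="utf-8"):
    """Decode a byte-string path to text; leave text strings unchanged."""
    # On Python 2, `bytes` is an alias for `str`; on Python 3 it is a
    # separate type, and ordinary strings are already unicode.
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path


def collect_file_stats(path):
    """Map each directory under `path` to a list of [file_name, size] pairs."""
    stats = {}
    # Passing a text path makes os.walk yield text roots and file names.
    for root, _dirs, files in os.walk(to_text(path)):
        for file_name in files:
            size = os.path.getsize(os.path.join(root, file_name))
            stats.setdefault(root, []).append([file_name, size])
    return stats
```

Because the names are text from the start, they can be passed to sqlite3 as query parameters without triggering the 8-bit bytestring error.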
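Also, you need to call commit() on the connection (dbHandler.commit() in your code; cursor objects have no commit() method) to actually write to the database. Calling it once after you have finished looping through all the filenames should be sufficient. Without a commit, the pending inserts are discarded when the connection is closed, which is why none of your data reached the database.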
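For the second part of the question, one possible approach (a sketch, not the only way) is to catch sqlite3.Error around each insert and commit once at the end, so a single bad row does not cost you the rows that succeeded. The function name insert_rows is hypothetical; the master table and its columns are taken from the question.

```python
import sqlite3


def insert_rows(connection, rows):
    """Insert (name, path, size) rows; return the rows that failed."""
    failed = []
    cursor = connection.cursor()
    for row in rows:
        try:
            cursor.execute(
                "insert or ignore into master (FileName, FilePath, FileSize) "
                "values (?, ?, ?)",
                row,
            )
        except sqlite3.Error:
            # Remember the bad row and keep going; rows inserted earlier
            # stay pending in the open transaction.
            failed.append(row)
    # commit() is a method of the connection, not of the cursor.
    connection.commit()
    return failed
```

In the question's code this would be called as insert_rows(dbHandler, rows), and the returned list can be logged to see which file names were rejected.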