Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use win32com to handle overflow when querying Desktop Search?

I'm querying the Windows Desktop Search JET (ESE) database using Python + ADO. It works but after ~7600 records I get an exception when advancing to the next record using MoveNext. I know it is not at EOF because I can run the same query in VBScript and get way more records with the same query.

Exception traceback

Traceback (most recent call last):
  File "test_desktop_search.py", line 60, in <module>
    record_set.MoveNext()
  File "<COMObject ADODB.Recordset>", line 2, in MoveNext
pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147215865), None)

Querying that error shows that it is:

  • Constant: adErrDataOverflow
  • Value: 3721 -2146824567 0x800A0E89
  • Description: Data value is too large to be represented by the field data type.

This works fine in VBScript (but perhaps only due to poor error handling). PowerShell has the following error (after getting much further than Python, about to the same place as VBScript gets):

Exception from HRESULT: 0x80041607
At C:\Users\doday\PycharmProjects\desktop_search_test\Get-DesktopSearchData.ps1:43 char:5
+     $recordSet.MoveNext();
+     ~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (:) [], COMException
    + FullyQualifiedErrorId : System.Runtime.InteropServices.COMException

I could not find this error code in Microsoft documentation, but it is likely related. As you can see, the Facility field is 4 (interface-specific HRESULT error) and the code is 1607.

MCVE

#!/usr/bin/env python
"""
Test querying Desktop Search from Python
"""

import csv
import pywintypes
from win32com.client import Dispatch

# connection
conn = Dispatch("ADODB.Connection")
connstr = "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';"
conn.Open(connstr)

# record set
record_set = Dispatch("ADODB.Recordset")
q = "SELECT System.ItemName, System.ItemTypeText, System.Size, System.IsDeleted, System.DateAccessed, System.Kind, System.ItemDate, System.Search.Store, System.ItemParticipants, System.ItemAuthors, System.IsRead, System.Message.AttachmentNames FROM SystemIndex"
# record_set.ActiveConnection = conn
record_set.Open(q, conn)

# header
# I'm only selecting a few fields for this test, see
# https://msdn.microsoft.com/en-us/library/windows/desktop/bb419046(v=vs.85).aspx
header = [
    "System.ItemName",
    "System.ItemTypeText",
    "System.Size",
    "System.IsDeleted",
    "System.DateAccessed",
    "System.Kind",
    "System.ItemDate",
    "System.Search.Store",
    "System.ItemParticipants",
    "System.ItemAuthors",
    "System.IsRead",
    "System.Message.AttachmentNames"
]

# output to file
with open("ds_output.tsv", "w", newline='') as out_f:
    w = csv.DictWriter(out_f, fieldnames=header, delimiter='\t')
    w.writeheader()
    record_set.MoveFirst()
    while not record_set.EOF:
        record = dict.fromkeys(header)

        # populate fields
        for h in header:
            record[h] = record_set.Fields.Item(h).Value

        # write record
        w.writerow(record)

        try:
            record_set.MoveNext()
        except pywintypes.com_error as e:
            # can't figure out how to resolve this or at least advance to next record despite this error
            print("Error: {}".format(e.args))

# clean up
record_set.Close()
record_set = None
conn.Close()
conn = None

What I've tried so far

  • I tried removing the "System.Message.AttachmentNames" column/field from my query but that actually made it fail faster / after fewer records with what appears to be the same error (the first number in the exception args is the same).
  • I tried only using one field ("System.ItemName"), which made it twice as far as other attempts in Python but ended up failing on a UnicodeEncodeError (which does not appear to be related to the other error shown above, which prevents it from ever making it to the file name with that error).
  • I tried using PowerShell and also received a COMException (error output shown above).
like image 659
Arthur Dent Avatar asked Jun 09 '18 01:06

Arthur Dent


1 Answers

Your trackback shows a -2147352567 error code. This is DISP_E_EXCEPTION, a very generic COM exception. It wraps a -2147215865 error, which is also 0x80041607, which is also QUERY_E_TIMEDOUT.

I use this free website tool - I made it - to find errors and other constants like that: https://www.magnumdb.com/search?q=-2147215865

So, in fact, they all report the same timeout error.

You can increase ADO timeout as described here: Having Problems With SystemIndex (I Love It But Can't Make It Work How I Want)

or you can remove it completely, with a code like this:

conn.CommandTimeout = 0
like image 134
Simon Mourier Avatar answered Nov 01 '22 03:11

Simon Mourier