
How can I compare Word Interop objects for "reference equality" AND determine the collection or parent object to which, say, a paragraph belongs?

I would like to be able to:

  1. compare Word Interop COM proxies on a "reference equality" basis; and
  2. map from a specific object (say a paragraph) to the collection it comes from, OR at least
  3. determine whether two paragraphs are from the same section and which one comes before the other

Why do I want to do this? I am trying to build a Word Add-In that acts similarly to a spell-checker in the sense that it runs in the background (by background I mean by regularly stealing time from the main Word thread using SendMessage) and scans the document for certain text "tokens". I want to be able to keep a collection of the tokens around and update them as the document changes. A specific example of this is if the user edits a given paragraph, I want to rescan the paragraph and update my data structure which points to that paragraph. If there is no way to map between the paragraph the user edited in (i.e. the paragraph where the start of the selection range is) and a paragraph that I have "stored" in a data structure, I can't do this.


Example Code for item #1, above

If I write the following VBA code:

Dim Para1 As Paragraph
Dim Para2a As Paragraph
Dim Para2b As Paragraph
Set Para1 = ActiveDocument.Paragraphs(1)
Set Para2a = Para1.Next
Set Para2b = Para1.Next.Next.Previous
If Para2a Is Para2b Then
    Debug.Print ("Para2a Is Para2b")
Else
    Debug.Print ("Para2a Is Not Para2b")
End If

Then I get the output:

"Para2a Is Not Para2b"

That is perhaps physically true (different COM proxies) but not logically true. I need to be able to compare those paragraphs and determine whether they are logically the same underlying paragraph.

(I am planning to write the add-in in C#, but the above VBA code demonstrates the kind of problem I need to overcome before doing too much coding).

Items 2 and 3 above are hopefully self-explanatory. Say I have a paragraph (interop proxy) reference. I want to figure out "where" it is in the document. Does it belong to Section 1? Is it in a footer? Without this ability, all I can reasonably do to find out where things come from is rescan the entire document every time it changes, which is of course absurdly inefficient and won't be timely enough for the app user.

Any thoughts greatly appreciated! I'm happy to post additional information as needed.

user1103975 asked Dec 17 '11 23:12


1 Answer

Navigating the particulars of reference equality in the context of COM Interop is always an interesting exercise.

I'm not privy to the implementation details of the Paragraph.Next() and Paragraph.Previous() methods, but the behavior they exhibit is very similar to how COM-based collections generally act with regard to Runtime Callable Wrapper (RCW) creation.

Typically, if possible, the framework avoids creating new RCW instances in response to additional references being made to COM objects that already have an RCW initialized and assigned. If an RCW already exists for a particular pointer to IUnknown, an internal reference count maintained by that RCW is incremented, and then the RCW is returned. This allows the framework to avoid incrementing the actual COM object's reference count (AddRef).

COM-based collections, which are COM objects that have managed representations implementing IEnumerable, seem to generate a new RCW each time an item is accessed, even if that item has already been accessed during the session.

For example:

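// Assumes: using Word = Microsoft.Office.Interop.Word;
Word.Document document = Application.ActiveDocument;
Word.Paragraphs paragraphs = document.Paragraphs;

// Each indexer access hands back a separate proxy for the same paragraph.
Word.Paragraph first = paragraphs[1];
Word.Paragraph second = paragraphs[1];

bool thisIsFalse = (first == second);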
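
Since the proxies themselves do not compare equal, one possible way to ask whether two paragraphs are logically the same is to compare their ranges instead. The following is only a minimal sketch and not something I have verified against the asker's add-in: it assumes Word's Range.IsEqual, Range.Start and Range.StoryType members behave as documented, and the class and method names (ParagraphPosition, SameParagraph, CompareByPosition) are hypothetical.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper; the names are illustrative, not part of the Word API.
static class ParagraphPosition
{
    // True when both proxies cover the same character range.
    // Range.IsEqual is documented to compare start, end and story type.
    public static bool SameParagraph(Word.Paragraph a, Word.Paragraph b)
    {
        return a.Range.IsEqual(b.Range);
    }

    // Negative when a starts before b, zero when they start at the same
    // position, positive when a starts after b. Character positions are
    // only comparable within one story, so mixed story types are rejected.
    public static int CompareByPosition(Word.Paragraph a, Word.Paragraph b)
    {
        Word.Range ra = a.Range;
        Word.Range rb = b.Range;
        if (ra.StoryType != rb.StoryType)
        {
            throw new ArgumentException("Paragraphs are in different story types.");
        }
        return ra.Start.CompareTo(rb.Start);
    }
}
```

Note that a stored Start value goes stale as soon as text is inserted or deleted before the paragraph, so a sketch like this is better suited to comparing two live proxies than to long-lived bookkeeping.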

If you want to do any sort of "reference equality" checking, you need to escape from the COM-based collection, in your case the Paragraphs object. You can do this simply by grabbing its kids and storing them in your own purely managed, predictable collection, like so:

// Requires: using System.Linq; using System.Collections.Generic;
List<Word.Paragraph> niceParagraphs = paragraphs.Cast<Word.Paragraph>().ToList();

Although using LINQ with COM Interop may look a bit scary (if it doesn't to you...it really should!) I'm fairly certain the above code is safe and will not leave any dangling references out there, or anything else nasty. I have not tested the above code exhaustively, however.

Don't forget to properly release those resources when you are done with them, at least if your situation calls for that level of prudence.
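
As a rough illustration of what releasing could look like, here is a minimal sketch built on Marshal.ReleaseComObject from System.Runtime.InteropServices. The ComCleanup class and ReleaseAll method are hypothetical names, and whether explicit release is appropriate at all depends on how your add-in manages object lifetimes.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper; the names are illustrative, not part of any library.
static class ComCleanup
{
    // Releases every paragraph proxy held in the managed list, then
    // empties the list so the released proxies cannot be reused by mistake.
    public static void ReleaseAll(List<Word.Paragraph> paragraphs)
    {
        foreach (Word.Paragraph paragraph in paragraphs)
        {
            if (paragraph != null)
            {
                // Decrements the RCW's reference count for this proxy.
                Marshal.ReleaseComObject(paragraph);
            }
        }
        paragraphs.Clear();
    }
}
```

Calling ComCleanup.ReleaseAll(niceParagraphs) once the scan is finished would be one way to use it. Any proxy touched after release throws InvalidComObjectException, which is why the list is cleared as well.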

Matt Weber answered Sep 17 '22 12:09