Distinguishing Table of Contents in Word document

Does anyone know how, when programmatically iterating through a Word document, you can tell whether a paragraph forms part of a table of contents (or indeed anything else that forms part of a field)?

My reason for asking is that I have a VB program that is supposed to extract the first couple of paragraphs of substantive text from a document. It does so by iterating through the Word.Paragraphs collection. I don't want the results to include tables of contents or other fields; I only want what a human being would recognize as a header, title or normal text paragraph. However, it turns out that if there is a table of contents, then not only the table of contents itself but EVERY line in it appears as a separate item in Word.Paragraphs. I don't want these, but I haven't been able to find any property on the Paragraph object that would let me distinguish and so ignore them. (I'm guessing the solution needs to apply to other field types too, such as a table of figures or table of authorities, which I haven't encountered yet but which would presumably cause the same problem.)

LondonPhantom asked Jul 08 '11 09:07


2 Answers

Because of the limitations of the Word object model, I think the best way to achieve this is to temporarily remove the TOC field, iterate through the Word document, and then re-insert the TOC. In VBA, it would look like this:

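Dim doc As Document
Dim fld As Field
Dim rng As Range

Set doc = ActiveDocument

For Each fld In doc.Fields
    If fld.Type = wdFieldTOC Then
        fld.Select
        Selection.Collapse Direction:=wdCollapseStart
        Set rng = Selection.Range 'capture place to re-insert TOC later
        fld.Cut 'the TOC now lives on the clipboard
        Exit For 'the clipboard holds only one cut item, so handle the first TOC only
    End If
Next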

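Iterate through the document to extract the paragraphs, and then paste the TOC back at the saved position:

'Selection.Range is read-only, so paste into the saved range directly
If Not rng Is Nothing Then rng.Paste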

If you are coding in .NET, this should translate pretty closely. It should work as is for Word 2003 and earlier, but in Word 2007/2010 the TOC, depending on how it was created, is sometimes wrapped in a Content Control-like region, and you may need to write additional code to detect and remove that.

joeschwa answered Sep 20 '22 05:09


This is not guaranteed to work, but if the TOC uses the standard Word styles (highly likely) and no one has added a style of their own whose name starts with "TOC", you can recognize TOC paragraphs by their style name. It is a crude approach, but workable.

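Dim parCurrentParagraph As Paragraph

For Each parCurrentParagraph In ActiveDocument.Paragraphs
    'The built-in TOC styles are named "TOC 1", "TOC 2", ... and "TOC Heading"
    If Left(parCurrentParagraph.Format.Style.NameLocal, 3) = "TOC" Then

        '    Do something (for example, skip this paragraph)

    End If
Next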
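
To tie this back to the original question, here is a rough sketch of how the style check could drive the extraction itself. It is not a tested or definitive implementation: the function name FirstSubstantiveParagraphs and the maxCount parameter are made up for illustration, and it assumes that empty paragraphs should be skipped and that the style name should be compared case-insensitively. Since NameLocal is the localized style name, the "TOC" prefix probably only holds for English-language versions of Word.

```vba
' Sketch only: FirstSubstantiveParagraphs and maxCount are hypothetical names.
' Returns the text of the first maxCount paragraphs whose style name
' does not start with "TOC".
Function FirstSubstantiveParagraphs(doc As Document, maxCount As Long) As Collection
    Dim results As New Collection
    Dim par As Paragraph
    Dim txt As String

    For Each par In doc.Paragraphs
        txt = par.Range.Text
        ' A paragraph that is only its paragraph mark has length 1: skip it (assumption)
        If Len(txt) > 1 Then
            If UCase$(Left$(par.Format.Style.NameLocal, 3)) <> "TOC" Then
                ' Store the text without the trailing paragraph mark
                results.Add Left$(txt, Len(txt) - 1)
                If results.Count >= maxCount Then Exit For
            End If
        End If
    Next par

    Set FirstSubstantiveParagraphs = results
End Function
```

You would call it with something like Set firstTwo = FirstSubstantiveParagraphs(ActiveDocument, 2). Unlike cutting the field out, this leaves the document untouched, but it only filters by style, so other field types such as a table of figures would need their own prefix or a different check.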

Gary Lee answered Sep 24 '22 05:09