
How do you access the numbering of an outline in a Word Document using C# and OpenXml?

I am trying to transfer an outline in Microsoft Word 2010 to a spreadsheet in Microsoft Excel 2010. I'm using DocumentFormat.OpenXml.Packaging and DocumentFormat.OpenXml.Wordprocessing.

I get the body of the document, and use it to get a list of all the paragraph objects:

var allParagraphs = new List<Paragraph>();
WordprocessingDocument wordprocessingDocument = WordprocessingDocument.Open(wordDocPath.Text, false);
Body body = wordprocessingDocument.MainDocumentPart.Document.Body;
allParagraphs = body.OfType<Paragraph>().ToList();

But I can't seem to find anything that stores the outline numbering that is next to the paragraph. Do I need to be grabbing other objects besides the paragraphs in the document to get the outline numbers, if there are any, for every paragraph?

The outline numbering I'm speaking of appears to the left of these headers in the screenshot below:

[Screenshot: outline numbering of an outline]

Unfortunately, ParagraphProperties.OutlineLevel is null, even though I know the paragraph is part of an outline in the Word document.

Jake Smith asked Mar 22 '23 05:03


1 Answer

Now that I've understood what you want exactly, here's how you should go about solving your problem.

First of all, I'd recommend downloading the Open XML SDK Productivity Tool. Once you know what the underlying XML looks like for a file, it becomes much easier to tackle the problem.

<w:p w:rsidR="004265BF" w:rsidP="00AD13B6" w:rsidRDefault="00AD13B6">
 <w:pPr>
  <w:pStyle w:val="ListParagraph" />
  <w:numPr>
   <w:ilvl w:val="0" />
   <w:numId w:val="2" />
  </w:numPr>
 </w:pPr>
 <w:r>
  <w:t>Requirements</w:t>
 </w:r>
</w:p>
<w:p w:rsidR="00AD13B6" w:rsidP="00AD13B6" w:rsidRDefault="00AD13B6">
 <w:pPr>
  <w:pStyle w:val="ListParagraph" />
  <w:numPr>
   <w:ilvl w:val="1" />
   <w:numId w:val="2" />
  </w:numPr>
 </w:pPr>
 <w:r>
  <w:t>Performance</w:t>
 </w:r>
</w:p>

Above you can see the XML for just a couple of paragraphs. Each paragraph has its own <w:numPr>, which contains <w:ilvl> (the zero-based level) and <w:numId>.

Each Word document contains many different XML files (parts) that act as references for the styles and values used throughout the document body. For outlines and lists, that part is numbering.xml.
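
To see those parts for yourself without any tooling, here is a minimal sketch that treats the .docx as the ZIP archive it is and lists the XML files inside. The class and method names are made up for illustration; only System.IO.Compression from the base class library is used.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class PackageParts
{
    // A .docx file is a ZIP archive; this returns the names of the XML parts in it,
    // sorted so the result is stable. The stream is left open for the caller.
    public static List<string> ListXmlParts(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            return zip.Entries
                .Select(entry => entry.FullName)
                .Where(name => name.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
                .OrderBy(name => name, StringComparer.Ordinal)
                .ToList();
        }
    }
}
```

Calling PackageParts.ListXmlParts(File.OpenRead("word-wrP.docx")) on a document that contains lists should include word/numbering.xml next to word/document.xml.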

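Each numId here refers to a <w:num> element in numbering.xml. Its <w:abstractNumId> child in turn refers to an <w:abstractNum> in the same file, which holds the definition of each level (<w:lvl>): start value, number format and level text. You can work out your outline number from there.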
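
To make that lookup chain concrete, here is a small sketch that follows it with System.Xml.Linq on a hand-written numbering.xml fragment. The fragment, the class name and the method name are assumptions for illustration; a real numbering.xml carries more elements per level, and the Open XML SDK gives you typed classes for the same thing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NumberingLookup
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical, simplified stand-in for the contents of word/numbering.xml.
    public const string SampleNumberingXml = @"
<w:numbering xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
  <w:abstractNum w:abstractNumId='0'>
    <w:lvl w:ilvl='0'><w:start w:val='1'/><w:numFmt w:val='decimal'/><w:lvlText w:val='%1.'/></w:lvl>
    <w:lvl w:ilvl='1'><w:start w:val='1'/><w:numFmt w:val='decimal'/><w:lvlText w:val='%1.%2.'/></w:lvl>
  </w:abstractNum>
  <w:num w:numId='2'><w:abstractNumId w:val='0'/></w:num>
</w:numbering>";

    // Follows numId -> <w:num> -> <w:abstractNumId> -> <w:abstractNum> -> <w:lvl>.
    // Returns null if any link in the chain is missing.
    public static XElement FindLevel(XElement numbering, int numId, int ilvl)
    {
        XElement num = numbering.Elements(W + "num")
            .FirstOrDefault(n => (int?)n.Attribute(W + "numId") == numId);
        int? abstractId = (int?)num?.Element(W + "abstractNumId")?.Attribute(W + "val");
        if (abstractId == null) return null;

        XElement abstractNum = numbering.Elements(W + "abstractNum")
            .FirstOrDefault(a => (int?)a.Attribute(W + "abstractNumId") == abstractId);
        return abstractNum?.Elements(W + "lvl")
            .FirstOrDefault(l => (int?)l.Attribute(W + "ilvl") == ilvl);
    }
}
```

For the "Performance" paragraph above (numId 2, ilvl 1), FindLevel(XElement.Parse(NumberingLookup.SampleNumberingXml), 2, 1) returns the second <w:lvl>, whose <w:start> and <w:lvlText> children describe how that level is numbered.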

I know it might sound tedious, but this is the only way it can be done.

[Screenshot: Open XML Productivity Tool snapshot]

All the best!

using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Open read-only; nothing in the document is modified.
using (WordprocessingDocument doc = WordprocessingDocument.Open("word-wrP.docx", false))
{
    Body body = doc.MainDocumentPart.Document.Body;

    // The document's numbering definitions (numbering.xml); the part is absent if there are no lists.
    Numbering num = doc.MainDocumentPart.NumberingDefinitionsPart?.Numbering;

    // Go through all paragraphs in the document body
    foreach (Paragraph paragraph in body.OfType<Paragraph>())
    {
        // Each numbered paragraph references a numbering definition through its numbering ID.
        // Paragraphs without <w:numPr> carry no direct numbering, so skip them.
        NumberingProperties numPr = paragraph.ParagraphProperties?.NumberingProperties;
        int? numId = numPr?.NumberingId?.Val?.Value;
        if (num == null || numId == null) continue;

        // NumberingLevelReference defines the outline level or the "indent" of the numbering; the index starts at zero.
        int iLevel = numPr.NumberingLevelReference?.Val?.Value ?? 0;

        // From the numbering reference we get the actual definition: numId -> <w:num> -> <w:abstractNumId> -> <w:abstractNum>.
        NumberingInstance instance = num.Elements<NumberingInstance>()
            .FirstOrDefault(tag => tag.NumberID?.Value == numId);
        int? absNumId = instance?.AbstractNumId?.Val?.Value;
        AbstractNum absNum = absNumId == null ? null
            : num.Elements<AbstractNum>().FirstOrDefault(tag => tag.AbstractNumberId?.Value == absNumId);
        if (absNum == null)
        {
            Console.WriteLine("Failed!");
            continue;
        }

        // <w:start> sits inside the <w:lvl> for this level, not directly under <w:abstractNum>.
        Level level = absNum.Elements<Level>().FirstOrDefault(l => l.LevelIndex?.Value == iLevel);
        int startValue = level?.StartNumberingValue?.Val?.Value ?? 1;

        // Once you have the start value, it's just a matter of counting the paragraphs that share the same
        // numbering ID; together with the level, that gives the actual value for each paragraph.
        Console.WriteLine($"numId={numId} level={iLevel} start={startValue}: {paragraph.InnerText}");
    }
}
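
The counting step mentioned in the comments can be sketched roughly as follows. This is a simplified illustration, not what Word itself does: it assumes decimal numbering, joins the level counters with dots, and ignores <w:lvlText>, restart settings and level overrides. The class name, the method name and the startAt callback are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OutlineCounter
{
    // Computes dotted labels ("1", "1.1", "2", ...) for numbered paragraphs in document order.
    // Each item is (numId, ilvl); startAt(numId, ilvl) supplies the <w:start> value of that level.
    public static List<string> Label(
        IEnumerable<(int NumId, int Ilvl)> items, Func<int, int, int> startAt)
    {
        // Per numId: the current counter of each of the nine possible levels (null = not started).
        var counters = new Dictionary<int, int?[]>();
        var labels = new List<string>();

        foreach (var (numId, ilvl) in items)
        {
            if (!counters.TryGetValue(numId, out int?[] c))
                counters[numId] = c = new int?[9];

            // Advance this level, or start it at its start value.
            c[ilvl] = c[ilvl].HasValue ? c[ilvl] + 1 : startAt(numId, ilvl);

            // Deeper levels restart the next time they are used.
            for (int deeper = ilvl + 1; deeper < c.Length; deeper++) c[deeper] = null;

            // Levels above that were never used fall back to their start value.
            for (int upper = 0; upper < ilvl; upper++) c[upper] ??= startAt(numId, upper);

            labels.Add(string.Join(".", c.Take(ilvl + 1).Select(v => v.Value)));
        }
        return labels;
    }
}
```

Feeding it the (numId, ilvl) pairs collected in the loop above, with a startAt callback that returns the start value read from the matching <w:lvl>, gives one label per numbered paragraph in document order.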

Varun Rathore answered Apr 06 '23 18:04