How to find the page number from a paragraph using OpenXML?

For a given Paragraph object, how can I determine which page it is located on using the Open XML SDK 2.0 for Microsoft Office?

Stef Heyenrath asked Feb 18 '13 12:02



2 Answers

It is not possible to get page numbers for a Word document using the Open XML SDK alone, because pagination is computed by the client (such as MS Word) when it lays out the document.

However, if the document you are working with was previously opened in a Word client and saved back, the client will have added LastRenderedPageBreak elements to mark where the page breaks fell when it last rendered the document. Refer to my answer here for more info about LastRenderedPageBreak. You can then count the LastRenderedPageBreak elements that come before your paragraph to get its page number.
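The counting idea can be sketched without the SDK at all, working directly on the WordprocessingML with System.Xml.Linq. This is only an illustration under stated assumptions: the class and method names (PageEstimator, StartPages) are hypothetical, the input is assumed to be the w:body element of a document that Word has already rendered and saved, and nested paragraphs (for example inside text boxes) are not treated specially. A break that appears before any text in a paragraph is taken to mean the paragraph starts on the new page; a break after some text means the paragraph starts on the previous page and runs over.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class PageEstimator
{
    // Main WordprocessingML namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns, for each w:p in document order, the page it starts on,
    // estimated from the w:lastRenderedPageBreak markers Word left behind.
    public static List<int> StartPages(XElement body)
    {
        var pages = new List<int>();
        int breaksSoFar = 0; // markers seen in all earlier paragraphs

        foreach (XElement p in body.Descendants(W + "p"))
        {
            int leading = 0;      // markers before the first text of this paragraph
            int total = 0;        // all markers inside this paragraph
            bool seenText = false;

            foreach (XElement e in p.Descendants())
            {
                if (e.Name == W + "t" && e.Value.Length > 0)
                {
                    seenText = true;
                }
                else if (e.Name == W + "lastRenderedPageBreak")
                {
                    total++;
                    if (!seenText) leading++;
                }
            }

            pages.Add(1 + breaksSoFar + leading);
            breaksSoFar += total;
        }

        return pages;
    }
}
```

Calling PageEstimator.StartPages(body) returns one entry per paragraph. The result reflects the last time Word rendered the file, so it goes stale as soon as the content is changed programmatically.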

If this is not the case, then a crude workaround is to add footers with page numbers (perhaps in the same colour as the page background, to effectively hide them). This is only an option if you are automating the Word document generation with the Open XML SDK.

Flowerking answered Nov 10 '22 15:11



@Flowerking: thanks for the information.

Because I need to loop all the paragraphs anyway to search for a certain string, I can use the following code to find the page number:

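using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Not shown in the original post; minimal definition inferred from its usage below.
class ParagraphInfo
{
    public Paragraph Paragraph { get; set; }
    public int PageNumber { get; set; }
}

class Program
{
    static void Main()
    {
        using (var document = WordprocessingDocument.Open(@"c:\test.docx", false))
        {
            var paragraphInfos = new List<ParagraphInfo>();

            var paragraphs = document.MainDocumentPart.Document.Descendants<Paragraph>();

            int pageIdx = 1;
            foreach (var paragraph in paragraphs)
            {
                // Only the first run is inspected: a break there means the
                // paragraph starts on a new page.
                var run = paragraph.GetFirstChild<Run>();

                if (run != null)
                {
                    var lastRenderedPageBreak = run.GetFirstChild<LastRenderedPageBreak>();

                    // Count explicit page breaks only; a Break without Type="page"
                    // is a line or column break and does not start a new page.
                    bool hasPageBreak = run.Elements<Break>()
                        .Any(b => b.Type != null && b.Type.Value == BreakValues.Page);

                    // Note: a file saved by Word may carry both an explicit page break
                    // and a LastRenderedPageBreak for the same page boundary, so the
                    // count is an approximation.
                    if (lastRenderedPageBreak != null || hasPageBreak)
                    {
                        pageIdx++;
                    }
                }

                paragraphInfos.Add(new ParagraphInfo
                {
                    Paragraph = paragraph,
                    PageNumber = pageIdx
                });
            }

            // After the loop, pageIdx holds the total number of pages counted.
            foreach (var info in paragraphInfos)
            {
                Console.WriteLine("Page {0}/{1} : '{2}'", info.PageNumber, pageIdx, info.Paragraph.InnerText);
            }
        }
    }
}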
Stef Heyenrath answered Nov 10 '22 14:11
