
Powerpoint OpenXML whitespace is disappearing

I'm running into a problem where whitespace is removed from PowerPoint documents as soon as I reference a slide. The following code sample illustrates what I mean:

//Open the document.
using (PresentationDocument presentationDocument = PresentationDocument.Open(pptxFileName, true))
{
    //Just making this reference modifies the whitespace in the slide.
    Slide slide = presentationDocument.PresentationPart.SlideParts.First().Slide;
}

To reproduce this issue, create a presentation with a single slide containing a single text box with the text "[ ]" (no quotes). Now set the font color of the space between the square brackets to a different color than the rest of the text. This results in a Run containing only whitespace characters. Once the code above is run against this presentation, the line that references the slide causes the whitespace in the Run to disappear, leaving us with a presentation that looks different from the one we started with, even though we never explicitly changed anything: the text is now "[]" when opened in PowerPoint.

In Word, the xml:space attribute can be set to 'preserve' on text elements to preserve whitespace, but there appears to be no equivalent for PowerPoint.

This is a critical problem in situations where whitespace is used as a key component of slide design. Has anybody figured out a workaround for this issue?

ptrc asked Jan 19 '23 22:01



1 Answer

Yes, you have found a bug in the SDK.

@Chris, first of all, that code is, per the semantics of the Open XML SDK, modifying the file. When you access the contents of the part, and then go out of scope of the using statement, the contents of the part are written back into the package. This is because the presentation was opened for read/write (the second argument of the call to the Open method).

The problem is that when the contents of the part are read from the package, the space is being stripped off.

    //Open the document.
    using (PresentationDocument presentationDocument = PresentationDocument.Open("test.pptx", true))
    {
        //Just making this reference modifies the whitespace in the slide.
        Slide slide = presentationDocument.PresentationPart.SlideParts.First().Slide;
        var sh = slide.CommonSlideData.ShapeTree.Elements<DocumentFormat.OpenXml.Presentation.Shape>().First();
        Run r = sh.TextBody.Elements<Paragraph>().First().Elements<Run>().Skip(1).FirstOrDefault();
        Console.WriteLine(">{0}<", r.Text.Text);
        //r.Text.Text = " ";
    }

If you run the above code on the presentation, you can see that by the time you access that text element, the text of the text element is already incorrect.

If you uncomment the line that sets the text, interestingly, the slide does contain the space.

This is obviously a bug. I have reported it to the program manager at Microsoft who is responsible for the Open XML SDK.

As this scenario is important to you, I recommend that you use LINQ to XML for your code. The following code works fine:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
using DocumentFormat.OpenXml.Drawing;

public static class PtOpenXmlExtensions
{
    public static XDocument GetXDocument(this OpenXmlPart part)
    {

        XDocument partXDocument = part.Annotation<XDocument>();
        if (partXDocument != null)
            return partXDocument;
        using (Stream partStream = part.GetStream())
        using (XmlReader partXmlReader = XmlReader.Create(partStream))
            partXDocument = XDocument.Load(partXmlReader);
        part.AddAnnotation(partXDocument);
        return partXDocument;
    }

    public static void PutXDocument(this OpenXmlPart part)
    {
        XDocument partXDocument = part.GetXDocument();
        if (partXDocument != null)
        {
            using (Stream partStream = part.GetStream(FileMode.Create, FileAccess.Write))
            using (XmlWriter partXmlWriter = XmlWriter.Create(partStream))
                partXDocument.Save(partXmlWriter);
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        using (PresentationDocument presentationDocument = PresentationDocument.Open("test.pptx", true))
        {
            XDocument slideXDoc = presentationDocument.PresentationPart.SlideParts.First().GetXDocument();
            XNamespace p = "http://schemas.openxmlformats.org/presentationml/2006/main";
            XNamespace a = "http://schemas.openxmlformats.org/drawingml/2006/main";
            XElement sh = slideXDoc.Root.Element(p + "cSld").Element(p + "spTree").Elements(p + "sp").First();
            XElement r = sh.Element(p + "txBody").Elements(a + "p").Elements(a + "r").Skip(1).FirstOrDefault();
            Console.WriteLine(">{0}<", r.Element(a + "t").Value);
        } 
    }
}
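As a side note, LINQ to XML can itself drop whitespace-only text, depending on how the document is loaded. The GetXDocument method above hands XDocument.Load an XmlReader created with default settings, and such a reader does not ignore whitespace. Here is a minimal sketch of the difference, using only System.Xml and System.Xml.Linq. The class name WhitespaceLoadDemo and the tiny `<t>` element are made up for illustration, and this says nothing about what the SDK does internally.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, for illustration only (not part of any SDK).
public static class WhitespaceLoadDemo
{
    const string Xml = "<t> </t>";
    const string XmlWithAttribute = "<t xml:space=\"preserve\"> </t>";

    // Default parse: whitespace-only text is treated as insignificant and dropped.
    public static string ParsedDefault()
    {
        return XElement.Parse(Xml).Value;
    }

    // Asking LINQ to XML to preserve whitespace keeps the space.
    public static string ParsedPreserving()
    {
        return XElement.Parse(Xml, LoadOptions.PreserveWhitespace).Value;
    }

    // xml:space="preserve" marks the space as significant, so it survives a default parse.
    public static string ParsedWithAttribute()
    {
        return XElement.Parse(XmlWithAttribute).Value;
    }

    // Loading through an XmlReader with default settings (IgnoreWhitespace is false)
    // keeps the space, which is the route GetXDocument takes.
    public static string LoadedViaReader()
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(Xml)))
            return XElement.Load(reader).Value;
    }
}
```

Only ParsedDefault returns an empty string. The other three return the single space.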

You could, in theory, write some generic code to dig through the LINQ to XML tree, find all text elements that contain only significant white space, then traverse the Open XML SDK element tree and set the text of those elements. That is a bit of a mess, but once done, you could use the strongly typed object model of the Open XML SDK 2.0, and the values of such elements would then be correct.
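A rough sketch of the first half of that idea, finding the whitespace-only text elements in the LINQ to XML tree, might look like the following. The class name WhitespaceRunFinder and the idea of identifying each a:t element by its position in document order are my assumptions, not something the SDK provides.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class WhitespaceRunFinder
{
    static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Returns (index, text) pairs for every a:t element whose text is non-empty
    // but consists only of whitespace. The index is the element's position
    // among all a:t descendants, in document order.
    public static List<KeyValuePair<int, string>> Find(XDocument slideXDoc)
    {
        return slideXDoc.Descendants(A + "t")
            .Select((t, i) => new KeyValuePair<int, string>(i, t.Value))
            .Where(pair => pair.Value.Length > 0 && pair.Value.All(c => char.IsWhiteSpace(c)))
            .ToList();
    }
}
```

For the second half, one option (untested here) would be to enumerate the SDK tree in the same document order, for example with slide.Descendants<DocumentFormat.OpenXml.Drawing.Text>(), and assign the saved string to the Text property of the element at each recorded index. That relies on both trees listing the text elements in the same order, which you would want to verify on your own files.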

One technique that makes it easier to use LINQ to XML with Open XML is to pre-atomize XName objects. See http://blogs.msdn.com/b/ericwhite/archive/2008/12/15/a-more-robust-approach-for-handling-xname-objects-in-linq-to-xml.aspx
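As a minimal sketch of what pre-atomizing looks like, the names used in the code above can be declared once as static fields. The class names PresentationNames and DrawingNames are just illustrative, and only the handful of names needed for this example are listed.

```csharp
using System.Xml.Linq;

// Hypothetical holder classes: each XName is created once and reused everywhere.
public static class PresentationNames
{
    public static readonly XNamespace Ns = "http://schemas.openxmlformats.org/presentationml/2006/main";
    public static readonly XName cSld = Ns + "cSld";
    public static readonly XName spTree = Ns + "spTree";
    public static readonly XName sp = Ns + "sp";
    public static readonly XName txBody = Ns + "txBody";
}

public static class DrawingNames
{
    public static readonly XNamespace Ns = "http://schemas.openxmlformats.org/drawingml/2006/main";
    public static readonly XName p = Ns + "p";
    public static readonly XName r = Ns + "r";
    public static readonly XName t = Ns + "t";
}
```

The query from the sample then becomes slideXDoc.Root.Element(PresentationNames.cSld).Element(PresentationNames.spTree).Elements(PresentationNames.sp), with no string concatenation at each call site and with any misspelled element name caught at compile time.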

-Eric

Eric White answered Jan 31 '23 13:01
