
Getting OpenXmlElements between CommentRangeStart and CommentRangeEnd

What I am trying to do is find the OpenXMLElements between a CommentRangeStart and the corresponding CommentRangeEnd.

I have tried two approaches, but the problem is that a CommentRangeEnd does not need to be on the same level as its start. It can be nested in a child element, as in the very simple structure below (note that this is not valid Open XML; it only shows the general idea).

<w:commentstart/>
<w:paragraph>
  <w:run />
  <w:commentend />
</w:paragraph>

The two things I have tried are the following. First: I wrote an enumerable which returns the sibling elements until the matching end is reached.

public static IEnumerable<OpenXmlElement> SiblingsUntilCommentRangeEnd(CommentRangeStart commentStart)
{
    OpenXmlElement element = commentStart.NextSibling();

    // Stop when the siblings run out or the matching end marker is reached.
    // Checking for null up front also avoids yielding null when the start
    // marker has no next sibling at all.
    while (element != null && !IsMatchingCommentEnd(element, commentStart.Id.Value))
    {
        yield return element;
        element = element.NextSibling();
    }
}

public static bool IsMatchingCommentEnd(OpenXmlElement element, string commentId)
{
    CommentRangeEnd commentEnd = element as CommentRangeEnd;
    if (commentEnd != null)
    {
        return commentEnd.Id == commentId;
    }
    return false;
}

Second: After realising that the start and end might not be on the same level, I continued to hunt around and found Eric White's answer for dealing with the elements between bookmark elements. I retrofitted that for my example, but the start and end not having the same parent (i.e. not being on the same level) was still an issue, so I could not use it.

Is there a better way to look at this? I need a way to get at these elements because I have to work with the text that is being commented on.

Edit: Clarification of what I am trying to achieve: I am taking a document edited in Word and, for a given comment in the document, I want to get the commented text, i.e. the text between the range start and range end for that specific comment id.

Edit 2: I have put up a working version of what I am currently thinking, but my concern is that it is potentially quite fragile given the different markup combinations Word can produce for different users. It also works on the raw XML, which is not really an issue, but I would have liked to switch to the Open XML SDK. Currently it looks like I will need to parse the entire document to get the items I need, instead of working with one specific comment. https://github.com/mhbuck/DocumentCommentParser/

Main issue encountered: The CommentRangeStart and CommentRangeEnd can be at different nesting levels within the XML document. The root node is potentially their only common ancestor.

Mike B asked Aug 29 '12 10:08



1 Answer

You can try using the Descendants<T>() method to enumerate all descendants of a given type under a node. Your code could then look similar to this (I've written it without yield to make it more readable ;)):

public static IEnumerable<OpenXmlElement> SiblingsUntilCommentRangeEnd(CommentRangeStart commentStart)
{
    List<OpenXmlElement> commentedNodes = new List<OpenXmlElement>();

    OpenXmlElement element = commentStart;

    while (true)
    {
        element = element.NextSibling();

        // check that the item exists
        if (element == null)
        {
            break;
        }

        //check that the item is matching comment end
        if (IsMatchingCommentEnd(element, commentStart.Id.Value))
        {
            break;
        }

        //check that there is a matching element in the current element's descendants
        //(Descendants<T>() never returns null, so no null check is needed; Any() requires "using System.Linq;")
        bool endFoundInDescendants = element
            .Descendants<CommentRangeEnd>()
            .Any(rangeEndNode => IsMatchingCommentEnd(rangeEndNode, commentStart.Id.Value));

        if (endFoundInDescendants)
        {
            //matching range end element found in current element's descendants
            //(a break inside a foreach would only leave the foreach, not this while loop)
            //an improvement could be made here to manually select descendants before CommentRangeEnd node
            break;
        }

        commentedNodes.Add(element);
    }

    return commentedNodes;
}

As noted in one of the code comments, the loop now stops when it finds the matching CommentRangeEnd element among the current element's descendants.

I haven't tested this code yet, so if you have any issues with it, let me know in the comments.

Note that this method won't work if the start element is deeper in the document's hierarchy than the end element. In some cases, it also won't return all of the content covered by the comment, because the element that contains the end marker is skipped entirely. If you need it, I can later update the answer with an alternative solution to handle this case. Please also explain why you need to find those comments, because an alternative method may be possible.
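One possible way around the nesting problem, as a rough and untested sketch rather than a definitive solution: instead of walking siblings, walk every descendant of the body in document order and collect the text found between the two markers. Descendants() visits elements depth-first in document order, so it does not matter how deeply the start or the end is nested. The class and method names (CommentRangeText, GetCommentedText) are hypothetical and not part of the Open XML SDK, and the sketch assumes the commented text lives in ordinary w:t elements.

```csharp
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class CommentRangeText
{
    // Hypothetical helper: returns the text found between the CommentRangeStart
    // and CommentRangeEnd that carry the given comment id.
    public static string GetCommentedText(OpenXmlElement root, string commentId)
    {
        var text = new StringBuilder();
        bool insideRange = false;

        // Descendants() enumerates in document order, so the start and end
        // markers may sit at completely different nesting levels.
        foreach (OpenXmlElement element in root.Descendants())
        {
            if (element is CommentRangeStart start && start.Id?.Value == commentId)
            {
                insideRange = true;
            }
            else if (element is CommentRangeEnd end && end.Id?.Value == commentId)
            {
                break; // matching end reached, nothing after it belongs to the comment
            }
            else if (insideRange && element is Text run)
            {
                // Only w:t elements are collected; deleted text (w:delText) is a
                // different class and is therefore skipped.
                text.Append(run.Text);
            }
        }

        return text.ToString();
    }
}
```

It could be called as CommentRangeText.GetCommentedText(mainPart.Document.Body, "1"), where mainPart is the document's MainDocumentPart. The sketch concatenates text across paragraph boundaries without a separator, and it returns everything after the start marker if no matching end exists, so both cases would need handling in real use.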

Lukasz M answered Nov 03 '22 03:11
