
Using C#'s XmlReader on slightly malformed XML

I'm trying to use C#'s XmlReader on a large series of XML files. They are all well-formed except for a select few (unfortunately I'm not in a position to have them changed, because that would break a lot of other code).

The errors all come from one specific part of these offending XML files, and it's OK to just skip that part, but I don't want to stop reading the rest of the file.

The bad parts look like this:

<InterestingStuff>
    ...
    <ErrorsHere OptionA|Something = "false" OptionB|SomethingElse = "false"/>
    <OtherInterestingStuff>
        ...
    </OtherInterestingStuff>
</InterestingStuff>

So really, if I could just ignore invalid tags, or ignore the pipe symbol, I would be OK.

Calling XmlReader.Skip() when I see the name "ErrorsHere" doesn't work; apparently the reader has already read a bit ahead and throws the exception.

TL;DR: How do I skip the bad part so that I can read the XML file above using XmlReader?

Edit:

Some people suggested just replacing the '|' symbol, but the point of XmlReader is to not load the entire file and only traverse the parts you want. Since I'm reading directly from files, I cannot afford to read in entire files, replace all instances of '|', and then read the parts again :).

Roy T. asked Jul 11 '11 10:07



1 Answer

I've experimented a bit with this in the past.

In general, the input simply has to be well-formed. An XmlReader goes into an unrecoverable error state when the basic XML rules are broken. It is easy to avoid schema validation, but that's not relevant here.

Your only option is to clean the input. That can be done in a streaming manner (with a custom Stream or TextReader), but it will require a light form of parsing. If you don't have pipe symbols in valid positions, it's easy.
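
For the easy case, where '|' never occurs in a legitimate position, a rough sketch of such a streaming cleaner could look like the following. The names PipeReplacingReader and Demo.ElementNames are made up for illustration, and the choice of '_' as the replacement character is an assumption; any character that is valid in an XML name would do. The wrapper rewrites characters as they pass through, so the file is never loaded as a whole.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: wraps another TextReader and swaps every '|' for a
// replacement character on the fly. Assumes '|' never appears in content
// that must be preserved (text nodes, attribute values, CDATA).
public sealed class PipeReplacingReader : TextReader
{
    private readonly TextReader _inner;
    private readonly char _replacement;

    public PipeReplacingReader(TextReader inner, char replacement = '_')
    {
        if (inner == null) throw new ArgumentNullException("inner");
        _inner = inner;
        _replacement = replacement;
    }

    // -1 (end of input) passes through unchanged.
    private int Fix(int c)
    {
        return c == '|' ? _replacement : c;
    }

    public override int Peek() { return Fix(_inner.Peek()); }

    public override int Read() { return Fix(_inner.Read()); }

    // Buffered read: patch only the characters that were just filled in.
    public override int Read(char[] buffer, int index, int count)
    {
        int n = _inner.Read(buffer, index, count);
        for (int i = index; i < index + n; i++)
        {
            if (buffer[i] == '|') buffer[i] = _replacement;
        }
        return n;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class Demo
{
    // Illustrative only: collects the element names XmlReader sees
    // once the input has been cleaned by the wrapper.
    public static List<string> ElementNames(TextReader source)
    {
        var names = new List<string>();
        using (var cleaned = new PipeReplacingReader(source))
        using (var reader = XmlReader.Create(cleaned))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element) names.Add(reader.Name);
            }
        }
        return names;
    }
}
```

For a file on disk you would pass a StreamReader opened on that file instead of a StringReader. With this wrapper the offending attributes reach XmlReader as OptionA_Something and OptionB_SomethingElse, so the ErrorsHere element parses and can be skipped or read like any other. If pipes can also occur in text or attribute values that matter, the wrapper would need to track whether it is inside a tag, which is the "light form of parsing" mentioned above.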

Henk Holterman answered Sep 19 '22 09:09