
What is the proper way to store a file name in XML?

I'm using XDocument to cache a list of files.

<file id="20" size="244318208">a file with an &amp;ersand.txt</file>

In this example, I used XText and let it automatically escape characters in the file name, such as turning & into &amp;.

<file id="20" size="244318208"><![CDATA[a file with an &ersand.txt]]></file>

In this one, I used XCData to let me use a literal string rather than an escaped one, so it appears in the XML as it would in my application.

I'm wondering whether either of them is better than the other under certain conditions, or whether it is just personal taste. Also, if it matters, the file names may or may not contain illegal characters.

nobody asked Jun 13 '12 20:06


3 Answers

I wouldn't explicitly use either XText or XCData - I'd just provide a string and let LINQ to XML do whatever it wants.

I do think the non-CDATA version is generally clearer though. Yes, ampersands are escaped - and < will be too - but that's still considerably less fluff than the CDATA start/end section.

Don't forget that it should be pretty rare for humans to see the XML representation itself - the idea is that it's a transport for information which is reasonably readable in that representation when you need to read it. I wouldn't get too hung up about it.
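As a rough illustration of "provide a string and let the library escape it", here is a minimal sketch. It uses Python's standard xml.etree.ElementTree as a stand-in, not LINQ to XML; the choice of Python and the variable names are assumptions for illustration only, while the element name, attributes, and file name come from the question.

```python
import xml.etree.ElementTree as ET

# Build the element from a plain string; no manual escaping is done here.
file_el = ET.Element("file", {"id": "20", "size": "244318208"})
file_el.text = "a file with an &ersand.txt"

# The serializer escapes the ampersand on the way out.
xml_text = ET.tostring(file_el, encoding="unicode")
print(xml_text)

# Parsing the serialized form gives back the original string unchanged.
round_tripped = ET.fromstring(xml_text).text
print(round_tripped)
```

The application only ever handles the literal file name; the escaped form exists only in the serialized XML.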

Jon Skeet answered Nov 18 '22 00:11


Both are essentially the same and there is no specific "best practice".

Personally, I reserve <![CDATA[]]> for large amounts of text that require lots of escaping (say bits of code or HTML markup).

In this specific case, I would rather escape the & to &amp; as in your first example.
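To see that the two forms carry the same information, one can parse both snippets from the question and compare the resulting text. This is a minimal sketch using Python's standard xml.etree.ElementTree (an assumption for illustration; a conforming parser in .NET should behave the same way for this input):

```python
import xml.etree.ElementTree as ET

# The two serializations from the question.
escaped = '<file id="20" size="244318208">a file with an &amp;ersand.txt</file>'
cdata = '<file id="20" size="244318208"><![CDATA[a file with an &ersand.txt]]></file>'

# Both parse to the same character data.
escaped_name = ET.fromstring(escaped).text
cdata_name = ET.fromstring(cdata).text

print(escaped_name == cdata_name)
```

Once parsed, the consumer cannot tell which form was used, so the choice only affects how the raw XML looks and how large it is.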

Oded answered Nov 18 '22 02:11


Most file names will not contain ampersands or less-than symbols, so go with XText. Reserve XCData for cases where you expect a lot of those characters, such as when embedding an HTML fragment in an XML document.

Rationale: the difference in CPU utilization when serializing and parsing the two forms is completely negligible, but there is a (small) difference in storage, bandwidth, and memory needs. Everything else being equal, use the format that takes the least space (even if the differences are small).
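The size trade-off can be estimated with a little arithmetic: the CDATA wrapper always costs 12 characters, while escaping costs 4 extra characters per & and 3 per < or >. Below is a minimal sketch in Python; the helper names escaped_size and cdata_size are hypothetical, and it assumes the name never contains the sequence ]]>, which a single CDATA section cannot hold.

```python
from xml.sax.saxutils import escape

# Fixed cost of wrapping any text in a CDATA section.
CDATA_OVERHEAD = len("<![CDATA[") + len("]]>")

def escaped_size(name: str) -> int:
    """UTF-8 byte count of the name with &, < and > escaped as entities."""
    return len(escape(name).encode("utf-8"))

def cdata_size(name: str) -> int:
    """UTF-8 byte count of the name wrapped in one CDATA section.

    Assumes the name does not contain "]]>".
    """
    return len(f"<![CDATA[{name}]]>".encode("utf-8"))

for name in ["a file with an &ersand.txt", "a&b&c&d&e.txt"]:
    print(name, escaped_size(name), cdata_size(name))
```

With a single ampersand the escaped form is smaller; CDATA only wins once the escaping overhead exceeds the 12-character wrapper, for example with four or more ampersands.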

Kris Vandermotten answered Nov 18 '22 02:11