
When is it required to escape characters in XML? [duplicate]

Tags: soap, xml, escaping

When should we replace < > & " ' in XML with character sequences like &lt etc.?

My understanding is that it's just to make sure that if the content part of the XML contains > or <, the parser will not treat it as the start or end of a tag.

Also, if I have an XML document like:

<hello>mor>ning<hello>

should this be replaced to either:

  • &lthello&gtmor&gtning&lthello&gt
  • &lthello&gtmor>ning&lthello&gt
  • <hello>mor&gtning<hello>

I don't understand why replacing is needed. When exactly is it required and what exactly (tags or text) should be replaced?

asked Aug 01 '11 12:08 by Kozlov



1 Answer

<, >, &, " and ' all have special meanings in XML (such as "start of entity" or "attribute value delimiter").

In order to have those characters appear as data (instead of for their special meaning) they can be represented by entities (&lt; for < and so on).

Sometimes those special meanings are context sensitive (e.g. " doesn't mean "attribute delimiter" outside of a tag) and there are places where they can appear raw as data. Rather than worry about those exceptions, it is simplest to just always represent them as entities if you want to avoid their special meaning. Then the only gotcha is explicit CDATA sections, where the special meaning doesn't hold (and & won't start an entity).
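
As a rough illustration of the "always escape" approach, here is a minimal Python sketch using the standard library's xml.sax.saxutils helpers. The sample string, the <note> element, and the title attribute are made up for the example; they are not from the question.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Hypothetical data containing every character with a special meaning.
raw_text = 'a < b && "c" > d'

# escape() handles &, < and > by default; quotes are opt-in via the mapping.
escaped = escape(raw_text, {'"': "&quot;", "'": "&apos;"})

# quoteattr() escapes the value and wraps it in suitable quote characters.
attr = quoteattr(raw_text)

# Hypothetical element and attribute names, chosen only for this sketch.
doc = f"<note title={attr}>{escaped}</note>"
root = ET.fromstring(doc)

# Inside a CDATA section, < and & are taken literally and must NOT be escaped.
cdata_root = ET.fromstring("<note><![CDATA[a < b && c]]></note>")

print(escaped)
print(root.text == raw_text, root.get("title") == raw_text)
print(cdata_root.text)
```

After parsing, both the element text and the attribute value come back as the original string, which is the point of escaping: the entities exist only in the serialized form.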

> should this be replaced to either

It shouldn't be represented as any of those. Entities must be terminated with a semi-colon.
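
To see the semicolon rule in practice, here is a small sketch that feeds three variants to Python's xml.etree.ElementTree. It assumes <hello> is meant to be an element and adds a proper closing tag, which the question's example lacks; the parses() helper is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

with_semicolon = "<hello>mor&gt;ning</hello>"    # properly terminated entity
without_semicolon = "<hello>mor&gtning</hello>"  # "&gt" with no semicolon
raw_gt = "<hello>mor>ning</hello>"               # a raw > in text content

print(parses(with_semicolon))
print(parses(without_semicolon))
print(parses(raw_gt))
```

The unterminated &gt is rejected as not well-formed. The raw > happens to be one of the places where the character can appear as data, but escaping it anyway is harmless.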

How you should represent it depends on which bit of your example is data and which is markup. You haven't said, for example, if <hello> is supposed to be data or the start tag for a hello element.
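
The two readings can be sketched with Python's xml.etree.ElementTree, which escapes text for you when serializing. This is only an illustration: the <message> wrapper element is hypothetical, and the first reading assumes the second <hello> in the question was meant to be a closing tag.

```python
import xml.etree.ElementTree as ET

# Reading 1: <hello> is markup and only "mor>ning" is data.
elem = ET.Element("hello")
elem.text = "mor>ning"
as_markup = ET.tostring(elem, encoding="unicode")

# Reading 2: the whole string is data, carried inside some other element.
wrapper = ET.Element("message")  # hypothetical wrapper, not from the question
wrapper.text = "<hello>mor>ning<hello>"
as_data = ET.tostring(wrapper, encoding="unicode")

print(as_markup)  # <hello>mor&gt;ning</hello>
print(as_data)    # <message>&lt;hello&gt;mor&gt;ning&lt;hello&gt;</message>
```

In both cases only the data is escaped; the tags that are really markup stay as they are.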

answered Oct 19 '22 13:10 by Quentin