 

C# hexadecimal value 0x12, is an invalid character

I am loading a lot of XML documents, and some of them fail with errors like "hexadecimal value 0x12, is an invalid character". The offending character differs from file to file. How can I remove these characters?

Duke asked Jan 10 '14 19:01



2 Answers

I did a little research.

The ASCII table has 128 characters (codes 0-127). Here is a small test that substitutes each of them, in turn, for the placeholder character 'П' in test.xml and tries to load the result as an XML document:

    using System;
    using System.IO;
    using System.Xml;

    // test.xml is a small XML document containing the placeholder character 'П';
    // each pass swaps that placeholder for one ASCII character and tries to parse it.
    public static void RegexTry()
    {
        string xmlfile = File.ReadAllText(@"test.xml");

        for (int i = 0; i < 128; i++)
        {
            char t = (char)i;
            string text = xmlfile.Replace('П', t);

            XmlDocument xml = new XmlDocument();
            try
            {
                xml.LoadXml(text);
            }
            catch (XmlException ex)
            {
                Console.WriteLine("Char(" + i + "): " + t + " => error! " + ex.Message);
                continue;
            }

            Console.WriteLine("Char(" + i + "): " + t + " => fine!");
        }

        Console.ReadKey();
    }

As a result it returns:

    Char(0):  => error! '.', hexadecimal value 0x00, is an invalid character. Line 5, position 7.
    Char(1):  => error! '', hexadecimal value 0x01, is an invalid character. Line 5, position 7.
    Char(2):  => error! '', hexadecimal value 0x02, is an invalid character. Line 5, position 7.
    Char(3):  => error! '', hexadecimal value 0x03, is an invalid character. Line 5, position 7.
    Char(4):  => error! '', hexadecimal value 0x04, is an invalid character. Line 5, position 7.
    Char(5):  => error! '', hexadecimal value 0x05, is an invalid character. Line 5, position 7.
    Char(6):  => error! '', hexadecimal value 0x06, is an invalid character. Line 5, position 7.
    Char(7):  => error! '', hexadecimal value 0x07, is an invalid character. Line 5, position 7.
    Char(8):  => error! '', hexadecimal value 0x08, is an invalid character. Line 5, position 7.
    Char(9):  => fine!
    Char(10):  => fine!
    Char(11):  => error! '', hexadecimal value 0x0B, is an invalid character. Line 5, position 7.
    Char(12):  => error! '', hexadecimal value 0x0C, is an invalid character. Line 5, position 7.
    Char(13):  => fine!
    Char(14):  => error! '', hexadecimal value 0x0E, is an invalid character. Line 5, position 7.
    Char(15):  => error! '', hexadecimal value 0x0F, is an invalid character. Line 5, position 7.
    Char(16):  => error! '', hexadecimal value 0x10, is an invalid character. Line 5, position 7.
    Char(17):  => error! '', hexadecimal value 0x11, is an invalid character. Line 5, position 7.
    Char(18):  => error! '', hexadecimal value 0x12, is an invalid character. Line 5, position 7.
    Char(19):  => error! '', hexadecimal value 0x13, is an invalid character. Line 5, position 7.
    Char(20):  => error! '', hexadecimal value 0x14, is an invalid character. Line 5, position 7.
    Char(21):  => error! '', hexadecimal value 0x15, is an invalid character. Line 5, position 7.
    Char(22):  => error! '', hexadecimal value 0x16, is an invalid character. Line 5, position 7.
    Char(23):  => error! '', hexadecimal value 0x17, is an invalid character. Line 5, position 7.
    Char(24):  => error! '', hexadecimal value 0x18, is an invalid character. Line 5, position 7.
    Char(25):  => error! '', hexadecimal value 0x19, is an invalid character. Line 5, position 7.
    Char(26):  => error! '', hexadecimal value 0x1A, is an invalid character. Line 5, position 7.
    Char(27):  => error! '', hexadecimal value 0x1B, is an invalid character. Line 5, position 7.
    Char(28):  => error! '', hexadecimal value 0x1C, is an invalid character. Line 5, position 7.
    Char(29):  => error! '', hexadecimal value 0x1D, is an invalid character. Line 5, position 7.
    Char(30):  => error! '', hexadecimal value 0x1E, is an invalid character. Line 5, position 7.
    Char(31):  => error! '', hexadecimal value 0x1F, is an invalid character. Line 5, position 7.
    Char(32):   => fine!
    Char(33): ! => fine!
    Char(34): " => fine!
    Char(35): # => fine!
    Char(36): $ => fine!
    Char(37): % => fine!
    Char(38): & => error! An error occurred while parsing EntityName. Line 5, position 8.
    Char(39): ' => fine!
    Char(40): ( => fine!
    Char(41): ) => fine!
    Char(42): * => fine!
    Char(43): + => fine!
    Char(44): , => fine!
    Char(45): - => fine!
    Char(46): . => fine!
    Char(47): / => fine!
    Char(48): 0 => fine!
    Char(49): 1 => fine!
    Char(50): 2 => fine!
    Char(51): 3 => fine!
    Char(52): 4 => fine!
    Char(53): 5 => fine!
    Char(54): 6 => fine!
    Char(55): 7 => fine!
    Char(56): 8 => fine!
    Char(57): 9 => fine!
    Char(58): : => fine!
    Char(59): ; => fine!
    Char(60): < => error! The '<' character, hexadecimal value 0x3C, cannot be included in a name. Line 5, position 13.
    Char(61): = => fine!
    Char(62): > => fine!
    Char(63): ? => fine!
    Char(64): @ => fine!
    Char(65): A => fine!
    Char(66): B => fine!
    Char(67): C => fine!
    Char(68): D => fine!
    Char(69): E => fine!
    Char(70): F => fine!
    Char(71): G => fine!
    Char(72): H => fine!
    Char(73): I => fine!
    Char(74): J => fine!
    Char(75): K => fine!
    Char(76): L => fine!
    Char(77): M => fine!
    Char(78): N => fine!
    Char(79): O => fine!
    Char(80): P => fine!
    Char(81): Q => fine!
    Char(82): R => fine!
    Char(83): S => fine!
    Char(84): T => fine!
    Char(85): U => fine!
    Char(86): V => fine!
    Char(87): W => fine!
    Char(88): X => fine!
    Char(89): Y => fine!
    Char(90): Z => fine!
    Char(91): [ => fine!
    Char(92): \ => fine!
    Char(93): ] => fine!
    Char(94): ^ => fine!
    Char(95): _ => fine!
    Char(96): ` => fine!
    Char(97): a => fine!
    Char(98): b => fine!
    Char(99): c => fine!
    Char(100): d => fine!
    Char(101): e => fine!
    Char(102): f => fine!
    Char(103): g => fine!
    Char(104): h => fine!
    Char(105): i => fine!
    Char(106): j => fine!
    Char(107): k => fine!
    Char(108): l => fine!
    Char(109): m => fine!
    Char(110): n => fine!
    Char(111): o => fine!
    Char(112): p => fine!
    Char(113): q => fine!
    Char(114): r => fine!
    Char(115): s => fine!
    Char(116): t => fine!
    Char(117): u => fine!
    Char(118): v => fine!
    Char(119): w => fine!
    Char(120): x => fine!
    Char(121): y => fine!
    Char(122): z => fine!
    Char(123): { => fine!
    Char(124): | => fine!
    Char(125): } => fine!
    Char(126): ~ => fine!
    Char(127):  => fine!
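The same split can be reproduced without a sample file by asking the framework directly. This is a minimal sketch, assuming .NET Framework 4.0 or later (where `XmlConvert.IsXmlChar` exists); the class and method names are hypothetical. It lists the ASCII codes that are not legal XML characters anywhere in a document:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical helper class for illustration.
public static class AsciiXmlCheck
{
    // Returns the ASCII codes (0-127) that XmlConvert.IsXmlChar rejects,
    // i.e. characters that may not appear anywhere in an XML 1.0 document.
    public static List<int> InvalidAsciiCodes() =>
        Enumerable.Range(0, 128)
                  .Where(i => !XmlConvert.IsXmlChar((char)i))
                  .ToList();

    // Prints each rejected code in hex, e.g. "0x12".
    public static void Print()
    {
        foreach (int code in InvalidAsciiCodes())
            Console.WriteLine($"0x{code:X2}");
    }
}
```

This should yield 29 codes (0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F), matching the character errors above. `&` (0x26) and `<` (0x3C) are not among them: they are legal characters that failed only because they start markup, so in text they need escaping as `&amp;` and `&lt;` rather than removal.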

As you can see, every control character below 0x20 except tab (0x09), line feed (0x0A) and carriage return (0x0D) is rejected. To remove them we can use Regex.Replace:

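    using System.Text.RegularExpressions;

    static string ReplaceHexadecimalSymbols(string txt)
    {
        // Control characters XML 1.0 forbids (below 0x20 except tab, LF, CR).
        // '&' (0x26) is left alone: stripping it would break entities like &amp;.
        string r = @"[\x00-\x08\x0B\x0C\x0E-\x1F]";
        return Regex.Replace(txt, r, "", RegexOptions.Compiled);
    }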
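A regex like this only covers the ASCII range. A different approach, sketched here assuming .NET Framework 4.0 or later, filters with `XmlConvert.IsXmlChar` and `XmlConvert.IsXmlSurrogatePair`, so it also drops non-ASCII characters XML forbids (such as U+FFFE, U+FFFF and unpaired surrogates) while keeping valid supplementary characters like emoji. The class and method names are hypothetical:

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper class for illustration.
public static class XmlCharFilter
{
    // Copies only characters that are legal in XML 1.0 into the result.
    public static string RemoveInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length &&
                     XmlConvert.IsXmlSurrogatePair(text[i + 1], c)) // (lowChar, highChar)
            {
                // A valid surrogate pair encodes a code point in
                // #x10000-#x10FFFF, which XML allows: keep both halves.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (e.g. 0x12, U+FFFE, a lone surrogate) is skipped.
        }
        return sb.ToString();
    }
}
```

Note the argument order of `IsXmlSurrogatePair`: the low surrogate comes first.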

P.S. Sorry if everybody already knew this.

Duke answered Sep 20 '22 18:09


The XML specification defines the valid characters like this:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] 

As you can see, #x12 is not a valid character in an XML document.
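The production translates almost literally into code. Here is a minimal C# sketch (the names `XmlSpec`, `IsXmlChar` and `IndexOfInvalidChar` are hypothetical) that checks whole Unicode code points, so characters outside the Basic Multilingual Plane, which .NET strings store as UTF-16 surrogate pairs, are handled correctly:

```csharp
// Hypothetical helper class for illustration.
public static class XmlSpec
{
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsXmlChar(int codePoint) =>
        codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
        || (codePoint >= 0x20 && codePoint <= 0xD7FF)
        || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
        || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);

    // Returns the UTF-16 index of the first code point in s that the production
    // above rejects, or -1 if every code point is a legal XML Char.
    public static int IndexOfInvalidChar(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            // Combine a surrogate pair into one code point; a lone surrogate keeps
            // its 0xD800-0xDFFF value, which the production rejects.
            int cp = char.IsSurrogatePair(s, i) ? char.ConvertToUtf32(s, i) : s[i];
            if (!IsXmlChar(cp)) return i;
            if (cp > 0xFFFF) i++; // skip the low surrogate of the pair
        }
        return -1;
    }
}
```

For example, `XmlSpec.IndexOfInvalidChar("ok\u0012")` returns 2, pointing at the 0x12 character from the question.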

You ask how to remove them, but I think that is not the question you should be asking. These characters should simply not be present, so you should reject any such document as malformed. Silently removing invalid characters only hides the real problem.
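Rejecting such a document needs no separate character scan, because the parser already detects the problem. A minimal sketch, assuming the documents arrive as strings (the names `XmlGate` and `TryValidate` are hypothetical), reads the whole document with `XmlReader` and reports the first error instead of silently cleaning it:

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper class for illustration.
public static class XmlGate
{
    // Returns true if the text parses as well-formed XML. Otherwise returns false
    // and describes the first error so the document can be rejected and reported.
    public static bool TryValidate(string xmlText, out string error)
    {
        try
        {
            // XmlReaderSettings.CheckCharacters defaults to true, so characters
            // such as 0x12 raise an XmlException instead of being accepted.
            using var reader = XmlReader.Create(new StringReader(xmlText));
            while (reader.Read()) { } // read to the end so the whole document is checked
            error = "";
            return true;
        }
        catch (XmlException ex)
        {
            // The message includes line and position; they are also available
            // as ex.LineNumber and ex.LinePosition.
            error = ex.Message;
            return false;
        }
    }
}
```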

If you are creating the documents in question, then you need to fix the code that generates them so that it emits valid XML.
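One way to make the generating side safe, sketched under the assumption that the documents are produced in .NET, is to write them with `XmlWriter` instead of concatenating strings. The writer escapes `&` and `<` in text, and because `XmlWriterSettings.CheckCharacters` defaults to `true`, it throws when asked to write a character such as 0x12, so a bad document fails at the source rather than at the consumer. The `XmlOutput`/`BuildNote` names and the `note` element are hypothetical:

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper class for illustration.
public static class XmlOutput
{
    // Builds <note>text</note>. XmlWriter escapes markup characters in the text,
    // and with CheckCharacters = true it throws instead of writing characters
    // that XML 1.0 forbids.
    public static string BuildNote(string text)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true, CheckCharacters = true };
        using var sw = new StringWriter();
        using (var writer = XmlWriter.Create(sw, settings))
        {
            writer.WriteElementString("note", text);
        }
        return sw.ToString();
    }
}
```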

David Heffernan answered Sep 20 '22 18:09