
Illegal character '&' in raw string REXML parsing

Hi, I am trying to parse an XML file using REXML. When the file contains an illegal character, parsing just fails at that point.

Is there any way to replace or remove these kinds of characters?

For example, the following element fails to parse with the error: Illegal character '&' in raw string

<head> Negative test for underlying BJSPRICEENG N4&N5
</head>


doc = REXML::Document.new(File.open(file_name,"r:iso-8859-1:utf-8"))

testfile.elements["head"].text





require 'rexml/document'

doc = REXML::Document.new(content)
dir_path = doc.elements["TestBed/TestDir"].attributes["path"].to_s
# Walk every <file> under each <TestDir> and read the text of its <head>.
doc.elements.each("TestBed/TestDir") do |directory|
  directory.elements.each("file") do |testfile|
    t = testfile.elements["head"].text
  end
end




<file name="toptstocksensbybjs.m">
  <MCheck></MCheck>
  <TestExtension></TestExtension>
  <TestType></TestType>
  <fcn name="lvlTwoDocExample" linenumber="20">
    <head> P1><&
    </head>
  </fcn>
</file>
asked Jan 31 '26 21:01 by Vinay


1 Answer

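In your case, rather than removing the illegal & characters, you can escape them before parsing. The substitution below rewrites every & that does not already start one of the five predefined entities:

# Read the whole file, transcoding ISO-8859-1 to UTF-8 (File.read also closes the handle).
content = File.read(file_name, mode: "r:iso-8859-1:utf-8")
# Note: numeric references such as &#169; are not recognized and would be escaped too.
content.gsub!(/&(?!(?:amp|lt|gt|quot|apos);)/, '&amp;')
doc = REXML::Document.new(content)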
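
To try this kind of substitution without a file on disk, here is a minimal self-contained sketch. The helper name escape_bare_ampersands is hypothetical, the sample string is the head element from the question, and the pattern is a variant that also leaves numeric character references such as &#169; untouched.

```ruby
require 'rexml/document'

# Hypothetical helper: escape every & that does not already begin a
# predefined entity (amp, lt, gt, quot, apos) or a numeric character reference.
def escape_bare_ampersands(xml)
  xml.gsub(/&(?!(?:amp|lt|gt|quot|apos|#\d+|#x\h+);)/, '&amp;')
end

# The failing element from the question, as an in-memory string.
sample = "<head> Negative test for underlying BJSPRICEENG N4&N5\n</head>"

escaped = escape_bare_ampersands(sample)
doc = REXML::Document.new(escaped)

# Element#text returns the unescaped value, so the original & comes back.
head_text = doc.root.text
puts head_text
```

Because already-escaped entities are skipped by the negative lookahead, running the helper twice on the same input should give the same result as running it once.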

However, other illegal characters are much harder to handle, especially unpaired <, >, ' or ", because a simple substitution cannot reliably tell them apart from real markup.
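
One possible workaround for the stray < and > case is to escape only the body of elements that are known to hold plain text. The sketch below is not a general solution: it assumes that head elements carry no attributes and never contain child elements or CDATA, and the helper name escape_head_bodies is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: escape &, < and > inside every <head>...</head> body.
# Assumption: <head> has no attributes and holds plain text only.
def escape_head_bodies(xml)
  xml.gsub(%r{(<head>)(.*?)(</head>)}m) do
    # Copy the captures first; the nested gsub calls below overwrite $1..$3.
    open_tag, body, close_tag = $1, $2, $3
    escaped = body
      .gsub(/&(?!(?:amp|lt|gt|quot|apos|#\d+|#x\h+);)/, '&amp;')
      .gsub('<', '&lt;')
      .gsub('>', '&gt;')
    "#{open_tag}#{escaped}#{close_tag}"
  end
end

# The second failing fragment from the question.
sample = <<~XML
  <fcn name="lvlTwoDocExample" linenumber="20">
   <head> P1><&
  </head>
  </fcn>
XML

doc = REXML::Document.new(escape_head_bodies(sample))
head_text = doc.elements["fcn/head"].text
puts head_text.strip
```

Restricting the rewrite to known text-only elements keeps the real tags elsewhere in the file untouched, which is what makes the < and > cases tractable here. Any element that can legitimately contain markup would need a different approach.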

answered Feb 03 '26 13:02 by Arie Xiao