I am using a REXML Ruby parser to parse an XML file. But on a 64 bit AIX box with 64 bit Ruby, I am getting the following error:
REXML::ParseException: #<REXML::ParseException: #<RegexpError: Stack overflow in
regexp matcher:
/^<((?>(?:[\w:][\-\w\d.]*:)?[\w:][\-\w\d.]*))\s*((?>\s+(?:[\w:][\-\w\d.]*:)?[\w:][\-\w\d.]*\s*=\s*(["']).*?\3)*)\s*(\/)?>/mu>
The call for the same is something like this:
REXML::Document.new(File.open(actual_file_name, "r"))
Does anyone have an idea regarding how to solve this issue?
I've had several issues for REXML, it doesn't seem to be the most mature library. Usually I use Nokogiri for Ruby XML parsing stuff, it should be faster and more stable than REXML. After installing it with sudo gem install nokogiri
, you can use something like this to get a DOM instance:
doc = Nokogiri.XML(File.open(actual_file_name, 'rb'))
# => #<Nokogiri::XML::Document:0xf1de34 name="document" [...] >
The documentation on the official webpage is also much better than that of REXML, IMHO.
I almost immediately found the answer.
The first thing I did was to search in the ruby source code for the error being thrown. I found that regex.h was responsible for this.
In regex.h, the code flow is something like this:
/* Maximum number of duplicates an interval can allow. */
#ifndef RE_DUP_MAX
#define RE_DUP_MAX ((1 << 15) - 1)
#endif
Now the problem here is RE_DUP_MAX. On AIX box, the same constant has been defined somewhere in /usr/include. I searched for it and found in
/usr/include/NLregexp.h
/usr/include/sys/limits.h
/usr/include/unistd.h
I am not sure which of the three is being used(most probably NLregexp.h). In these headers, the value of RE_DUP_MAX has been set to 255! So there is a cap placed on the number of repetitions of a regex!
In short, the reason is the compilation taking the system defined value than that we define in regex.h!
This also answers my question which i had asked recently: Regex limit in ruby 64 bit aix compilation
I was not able to answer it immediately as i need to have min of 100 reputation :D :D Cheers!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With