I am attempting to parse, not evaluate, Rails ERB files in a Hpricot/Nokogiri-type manner. The files I am attempting to parse contain HTML fragments intermixed with dynamic content generated using ERB (standard Rails view files). I am looking for a library that will not only parse the surrounding content, much the way that Hpricot or Nokogiri will, but will also treat the ERB symbols (<%, <%=, etc.) as though they were HTML/XML tags.
Ideally I would get back a DOM-like structure where the <%, <%=, etc. symbols would be included as their own node types.
I know that it is possible to hack something together using regular expressions, but I was looking for something a bit more reliable, as I am developing a tool that I need to run on a very large view code base where both the HTML content and the ERB content are important.
For example, content such as:
```erb
blah blah blah <div>My Great Text <%= my_dynamic_expression %></div>
```
would return a tree structure like:
```
root
  - text_node (blah blah blah)
  - element (div)
      - text_node (My Great Text )
      - erb_node (<%=)
```
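As a rough illustration of the kind of tree being asked for, here is a minimal sketch in plain Ruby. The `Node` struct, its `type` symbols (`:root`, `:text`, `:element`, `:erb`), and the `to_outline` helper are hypothetical names of my own, not part of Hpricot, Nokogiri, or any other library; the tree is built by hand rather than parsed.

```ruby
# Hypothetical node class: type is :root, :text, :element, or :erb.
Node = Struct.new(:type, :value, :children) do
  def initialize(type, value = nil, children = [])
    super
  end

  # Returns the tree as indented outline lines, one node per line.
  def to_outline(depth = 0)
    label = value ? "#{type} (#{value})" : type.to_s
    lines = ["#{'  ' * depth}#{label}"]
    children.each { |child| lines.concat(child.to_outline(depth + 1)) }
    lines
  end
end

# The example input, built by hand: the ERB tag is an ordinary child of <div>.
tree = Node.new(:root, nil, [
  Node.new(:text, "blah blah blah "),
  Node.new(:element, "div", [
    Node.new(:text, "My Great Text "),
    Node.new(:erb, "<%= my_dynamic_expression %>")
  ])
])

puts tree.to_outline
```

Keeping ERB tags as a distinct node type, rather than folding them into text nodes, is what lets a tool walk the HTML and the Ruby parts of a view with the same traversal.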
ERB (Embedded RuBy) is a feature of Ruby that enables you to conveniently generate any kind of text, in any quantity, from templates. The templates themselves combine plain text with Ruby code for variable substitution and flow control, which makes them easy to write and maintain.
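A short example of both kinds of ERB tag, using Ruby's standard `erb` library: `<%= %>` substitutes a value and `<% %>` runs code (here a loop) without emitting anything itself. The template text and the `items` variable are illustrative only; `result_with_hash` requires Ruby 2.5 or later.

```ruby
require "erb"

# <%= item %> inserts a value; <% ... %> runs the loop without producing output.
template = ERB.new("<ul><% items.each do |item| %><li><%= item %></li><% end %></ul>")

# result_with_hash exposes each hash key as a local variable inside the template.
html = template.result_with_hash(items: %w[apples pears])
puts html # prints <ul><li>apples</li><li>pears</li></ul>
```

The template string is exactly the kind of mixed HTML/ERB content the question wants to parse without evaluating.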
I eventually ended up solving this problem by using RLex (http://raa.ruby-lang.org/project/ruby-lex/), the Ruby version of lex, with the following grammar:
```lex
%{
#define NUM 257
#define OPTOK 258
#define IDENT 259
#define OPETOK 260
#define CLSTOK 261
#define CLTOK 262
#define FLOAT 263
#define FIXNUM 264
#define WORD 265
#define STRING_DOUBLE_QUOTE 266
#define STRING_SINGLE_QUOTE 267
#define TAG_START 268
#define TAG_END 269
#define TAG_SELF_CONTAINED 270
#define ERB_BLOCK_START 271
#define ERB_BLOCK_END 272
#define ERB_STRING_START 273
#define ERB_STRING_END 274
#define TAG_NO_TEXT_START 275
#define TAG_NO_TEXT_END 276
#define WHITE_SPACE 277
%}

digit               [0-9]
blank               [ ]
letter              [A-Za-z]
name1               [A-Za-z_]
name2               [A-Za-z_0-9]
valid_tag_character [A-Za-z0-9"'=@_():/ ]
ignore_tags         style|script

%%

{blank}+"\n"          { return [ WHITE_SPACE, yytext ] }
"\n"{blank}+          { return [ WHITE_SPACE, yytext ] }
{blank}+"\n"{blank}+  { return [ WHITE_SPACE, yytext ] }
"\r"                  { return [ WHITE_SPACE, yytext ] }
"\n"                  { return[ yytext[0], yytext[0..0] ] };
"\t"                  { return[ yytext[0], yytext[0..0] ] };
^{blank}+             { return [ WHITE_SPACE, yytext ] }
{blank}+$             { return [ WHITE_SPACE, yytext ] };
""                    { return [ TAG_NO_TEXT_START, yytext ] }
""                    { return [ TAG_NO_TEXT_END, yytext ] }
""                    { return [ TAG_SELF_CONTAINED, yytext ] }
""                    { return [ TAG_SELF_CONTAINED, yytext ] }
""                    { return [ TAG_START, yytext ] }
""                    { return [ TAG_END, yytext ] }
""                    { return [ ERB_BLOCK_END, yytext ] }
""                    { return [ ERB_STRING_END, yytext ] }
{letter}+             { return [ WORD, yytext ] }
\".*\"                { return [ STRING_DOUBLE_QUOTE, yytext ] }
'.*'                  { return [ STRING_SINGLE_QUOTE, yytext ] }
.                     { return [ yytext[0], yytext[0..0] ] }

%%
```
This is not a complete grammar, but it worked for my purposes (locating and re-emitting text). I combined that grammar with this small piece of code:
```ruby
# MakeYourOwnCallbackHandler stands for any object that responds to the
# callback methods used below; Erblex is the lexer class built from the
# grammar above, and the token constants match its #define values.
text_handler = MakeYourOwnCallbackHandler.new
l = Erblex.new

File.open(file_name, "r") do |file|
  l.yyin = file
  loop do
    a, v = l.yylex
    break if a == 0 # 0 marks the end of input

    if a < WORD
      # Single characters (and the numeric tokens numbered below WORD)
      text_handler.character(v.to_s, a)
    else
      case a
      when WORD                then text_handler.text(v.to_s)
      when TAG_START           then text_handler.start_tag(v.to_s)
      when TAG_END             then text_handler.end_tag(v.to_s)
      when WHITE_SPACE         then text_handler.white_space(v.to_s)
      when ERB_BLOCK_START     then text_handler.erb_block_start(v.to_s)
      when ERB_BLOCK_END       then text_handler.erb_block_end(v.to_s)
      when ERB_STRING_START    then text_handler.erb_string_start(v.to_s)
      when ERB_STRING_END      then text_handler.erb_string_end(v.to_s)
      when TAG_NO_TEXT_START   then text_handler.ignorable_tag_start(v.to_s)
      when TAG_NO_TEXT_END     then text_handler.ignorable_tag_end(v.to_s)
      when STRING_DOUBLE_QUOTE then text_handler.string_double_quote(v.to_s)
      when STRING_SINGLE_QUOTE then text_handler.string_single_quote(v.to_s)
      when TAG_SELF_CONTAINED  then text_handler.tag_self_contained(v.to_s)
      end
    end
  end
end
```
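The loop only needs a handler object that responds to one method per token type. A minimal sketch of such a handler follows; the class name `TextCollector` and its behavior (copying every token to `output` and recording the code inside each `<%= ... %>` tag in `erb_expressions`) are my own assumptions, not something the original answer provides.

```ruby
# Hypothetical handler for the lexer loop. It re-emits every token
# unchanged and records the Ruby code found inside each <%= ... %> tag.
class TextCollector
  attr_reader :output, :erb_expressions

  def initialize
    @output = +""
    @erb_expressions = []
    @current_erb = nil # code buffer while inside <%= ... %>, nil otherwise
  end

  # Tokens with no special meaning here are copied straight to the output.
  %i[text start_tag end_tag white_space erb_block_start erb_block_end
     ignorable_tag_start ignorable_tag_end string_double_quote
     string_single_quote tag_self_contained].each do |name|
    define_method(name) { |value| emit(value) }
  end

  # Single characters also arrive with their token code, unused here.
  def character(value, _code)
    emit(value)
  end

  def erb_string_start(value)
    emit(value)
    @current_erb = +"" # start buffering after the "<%=" delimiter
  end

  def erb_string_end(value)
    @erb_expressions << @current_erb.strip if @current_erb
    @current_erb = nil # stop buffering before the "%>" delimiter
    emit(value)
  end

  private

  # Append a token to the output and, inside <%= ... %>, to the code buffer.
  def emit(value)
    @output << value
    @current_erb << value if @current_erb
  end
end
```

Passed in as `text_handler`, an object like this would leave a copy of the scanned file in `output`; how faithful that copy is depends on the grammar feeding it, which the answer notes is incomplete.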