
Library to parse ERB files

I am attempting to parse, not evaluate, Rails ERB files in a Hpricot/Nokogiri-like manner. The files contain HTML fragments intermixed with dynamic content generated using ERB (standard Rails view files). I am looking for a library that will not only parse the surrounding content, much the way Hpricot or Nokogiri will, but will also treat the ERB symbols (<%, <%=, etc.) as though they were HTML/XML tags.

Ideally I would get back a DOM-like structure in which the ERB symbols (<%, <%=, etc.) are included as their own node types.

I know that it is possible to hack something together using regular expressions, but I am looking for something more reliable: I am developing a tool that must run on a very large view code base where both the HTML content and the ERB content are important.

For example, content such as:

blah blah blah
<div>My Great Text <%= my_dynamic_expression %></div>

Would return a tree structure like:

root
 - text_node (blah blah blah)
 - element (div)
    - text_node (My Great Text )
        - erb_node (<%=)
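
To make the desired structure concrete, here is a minimal Ruby sketch that produces a tree of this shape for the example. It is a hand-rolled scanner built on StringScanner from the standard library, not the library being asked for. `Node` and `parse_template` are hypothetical names, and the sketch assumes properly nested tags, no void elements such as `<br>`, and no ERB inside tag attributes. The ERB node is attached as a sibling of the text node under the div.

```ruby
require "strscan"

# Hypothetical node type used only for this illustration.
Node = Struct.new(:type, :value, :children)

# Build a tree of :text, :element, :erb_output and :erb_code nodes.
# Assumption: tags are well nested and ERB never appears inside a tag.
def parse_template(source)
  root = Node.new(:root, nil, [])
  stack = [root]
  scanner = StringScanner.new(source)

  until scanner.eos?
    if scanner.scan(/<%(=?)(.*?)%>/m)
      # "<%= ... %>" emits output, "<% ... %>" only runs code.
      type = scanner[1] == "=" ? :erb_output : :erb_code
      stack.last.children << Node.new(type, scanner[2].strip, [])
    elsif scanner.scan(/<\/[A-Za-z][A-Za-z0-9]*\s*>/)
      # Closing tag: return to the parent element (never pop the root).
      stack.pop if stack.size > 1
    elsif scanner.scan(/<([A-Za-z][A-Za-z0-9]*)[^>]*>/)
      element = Node.new(:element, scanner[1], [])
      stack.last.children << element
      # Self-closing tags such as <img/> cannot have children.
      stack.push(element) unless scanner.matched.end_with?("/>")
    else
      # Plain text up to the next "<", or a stray "<" on its own.
      text = scanner.scan(/[^<]+/) || scanner.getch
      stack.last.children << Node.new(:text, text, [])
    end
  end

  root
end

tree = parse_template("blah blah blah\n<div>My Great Text <%= my_dynamic_expression %></div>")
```

Walking `tree.children` then gives one text node followed by the div element, whose own children are the text node and the ERB output node.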
Douglas Sellers asked Apr 06 '10 23:04



1 Answer

I eventually solved this problem using RLex (http://raa.ruby-lang.org/project/ruby-lex/), the Ruby version of lex, with the following grammar:

%{

#define NUM 257

#define OPTOK 258
#define IDENT 259
#define OPETOK 260
#define CLSTOK 261
#define CLTOK 262
#define FLOAT 263
#define FIXNUM 264
#define WORD 265
#define STRING_DOUBLE_QUOTE 266
#define STRING_SINGLE_QUOTE 267

#define TAG_START 268
#define TAG_END 269
#define TAG_SELF_CONTAINED 270
#define ERB_BLOCK_START 271
#define ERB_BLOCK_END 272
#define ERB_STRING_START 273
#define ERB_STRING_END 274
#define TAG_NO_TEXT_START 275
#define TAG_NO_TEXT_END 276
#define WHITE_SPACE 277
%}

digit   [0-9]
blank   [ ]
letter  [A-Za-z]
name1   [A-Za-z_]
name2   [A-Za-z_0-9]
valid_tag_character [A-Za-z0-9"'=@_():/ ] 
ignore_tags style|script
%%

{blank}+"\n"                  { return [ WHITE_SPACE, yytext ] } 
"\n"{blank}+                  { return [ WHITE_SPACE, yytext ] } 
{blank}+"\n"{blank}+                  { return [ WHITE_SPACE, yytext ] } 

"\r"                  { return [ WHITE_SPACE, yytext ] } 
"\n"            { return[ yytext[0], yytext[0..0] ] };
"\t"            { return[ yytext[0], yytext[0..0] ] };

^{blank}+       { return [ WHITE_SPACE, yytext ] }

{blank}+$       { return [ WHITE_SPACE, yytext ] };

""   { return [ TAG_NO_TEXT_START, yytext ] }
""  { return [ TAG_NO_TEXT_END, yytext ] }
""                   { return [ TAG_SELF_CONTAINED, yytext ] }
""  { return [ TAG_SELF_CONTAINED, yytext ] }
""    { return [ TAG_START, yytext ] }
""   { return [ TAG_END, yytext ] }

""  { return [ ERB_BLOCK_END, yytext ] }
""  { return [ ERB_STRING_END, yytext ] }


{letter}+       { return [ WORD, yytext ] }


\".*\"          { return [ STRING_DOUBLE_QUOTE, yytext ] }
'.*'                    { return [ STRING_SINGLE_QUOTE, yytext ] }
.           { return [ yytext[0], yytext[0..0] ] }

%%

This is not a complete grammar, but for my purposes (locating and re-emitting text) it worked. I combined that grammar with this small piece of code:

    # Any object that responds to the callbacks used below.
    text_handler = MakeYourOwnCallbackHandler.new

    l = Erblex.new
    l.yyin = File.open(file_name, "r")

    loop do
      # yylex returns a [token_id, text] pair, or 0 at end of input.
      a, v = l.yylex
      break if a == 0

      if a < WORD
        # Token ids below WORD come from the single-character rules.
        text_handler.character(v.to_s, a)
      else
        case a
        when WORD
          text_handler.text(v.to_s)
        when TAG_START
          text_handler.start_tag(v.to_s)
        when TAG_END
          text_handler.end_tag(v.to_s)
        when WHITE_SPACE
          text_handler.white_space(v.to_s)
        when ERB_BLOCK_START
          text_handler.erb_block_start(v.to_s)
        when ERB_BLOCK_END
          text_handler.erb_block_end(v.to_s)
        when ERB_STRING_START
          text_handler.erb_string_start(v.to_s)
        when ERB_STRING_END
          text_handler.erb_string_end(v.to_s)
        when TAG_NO_TEXT_START
          text_handler.ignorable_tag_start(v.to_s)
        when TAG_NO_TEXT_END
          text_handler.ignorable_tag_end(v.to_s)
        when STRING_DOUBLE_QUOTE
          text_handler.string_double_quote(v.to_s)
        when STRING_SINGLE_QUOTE
          text_handler.string_single_quote(v.to_s)
        when TAG_SELF_CONTAINED
          text_handler.tag_self_contained(v.to_s)
        end
      end
    end
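
The loop expects a handler object that responds to each of those callbacks. What follows is a minimal sketch of one, assuming the goal is only to re-emit the text unchanged while noting where ERB delimiters occur. `ReEmittingHandler` and `erb_markers` are hypothetical names chosen for illustration and are not part of RLex; only the callback names come from the loop.

```ruby
# Hypothetical handler: rebuilds the input text token by token and
# records the offset at which each ERB delimiter was seen.
class ReEmittingHandler
  attr_reader :output, :erb_markers

  def initialize
    @output = +""      # unfrozen buffer holding the re-emitted text
    @erb_markers = []  # [callback_name, offset_in_output] pairs
  end

  # Single characters passed through by the lexer's catch-all rules.
  def character(text, _token_id)
    @output << text
  end

  # Callbacks that simply copy their text to the output.
  %i[
    text start_tag end_tag white_space
    ignorable_tag_start ignorable_tag_end
    string_double_quote string_single_quote tag_self_contained
  ].each do |name|
    define_method(name) { |text| @output << text }
  end

  # ERB delimiters: remember where they occur, then copy them too.
  %i[erb_block_start erb_block_end erb_string_start erb_string_end].each do |name|
    define_method(name) do |text|
      @erb_markers << [name, @output.length]
      @output << text
    end
  end
end
```

An instance of this class could stand in for `MakeYourOwnCallbackHandler.new`; a real tool would replace the copy-through callbacks with whatever transformation it needs.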
Douglas Sellers answered Sep 22 '22 07:09