Surprisingly valid Ruby syntax: % everywhere

Question

In Ruby 2.7 and 3.1 this script does the same thing whether or not the % signs are there:

def count(str)
  state = :start
  tbr = []
  str.each_char do
%  %case state
    when :start
      tbr << 0
  %  %state = :symbol
 %  when :symbol
      tbr << 1
 %  % state = :start
 %  end
  end
  tbr
end

p count("Foobar")

How is this parsed? You can add more % or remove some and it will still work, but not in every combination. I found this example through trial and error.

I was teaching someone Ruby and noticed only after their script was working that they had a random % in the margin. I pushed it a little further to see how many it would accept.

Jörg W Mittag · Accepted Answer

Syntax

Percent String Literal

This is a Percent String Literal receiving the message %.

A Percent String Literal has the form:

% character
opening-delimiter
string content
closing-delimiter

If the opening-delimiter is one of <, [, (, or {, then the closing-delimiter must be the corresponding >, ], ), or }. Otherwise, the opening-delimiter can be any arbitrary character and the closing-delimiter must be the same character.

So, `%  ` (that is, % SPACE SPACE) is a Percent String Literal with SPACE as the delimiter and no content, i.e. it is equivalent to "".
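A quick way to check the delimiter rules is to evaluate a few literals. This is a small sketch: the string contents are arbitrary examples, and the SPACE-delimited literal is passed to `eval` as a string only so that its significant spaces cannot be lost as trailing whitespace.

```ruby
# Paired delimiters close with their partner; any other delimiter
# character closes with itself.
values = [
  %(round),
  %[square],
  %{curly},
  %<angle>,
  %|pipe|,
  eval("%  ") # % SPACE SPACE: SPACE is the delimiter, the content is empty
]

p values # prints ["round", "square", "curly", "angle", "pipe", ""]
```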

Operator Message Send `a % b`

a % b

is equivalent to

a.%(b)

That is, the message % is sent to the result of evaluating the expression a, with the result of evaluating the expression b as its single argument.
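Because operators are just message sends, the operator form and the explicit method-call form can be compared directly. A minimal sketch with arbitrary values:

```ruby
# a % b is sugar for a.%(b): the same method runs either way.
int_sugar    = 10 % 3
int_explicit = 10.%(3)

fmt_sugar    = "%03d" % 7
fmt_explicit = "%03d".%(7)
fmt_send     = "%03d".public_send(:%, 7) # sending the message by name

p [int_sugar, int_explicit]           # prints [1, 1]
p [fmt_sugar, fmt_explicit, fmt_send] # prints ["007", "007", "007"]
```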

Which means

%  % b

is (roughly) equivalent to

"".%(b)
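To see that `%  % b` really reduces to `"".%(b)`, both forms can be evaluated side by side. Again the source goes through `eval` only to keep the SPACE delimiters intact, and `42` is an arbitrary stand-in for b.

```ruby
# "%  " is an empty string literal; the % after it is the operator.
via_literal = eval("%  % 42")
via_method  = "".%(42)

p via_literal               # prints ""
p via_literal == via_method # prints true
```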

Argument List

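So, what's b then? Well, it's the expression following the % operator (not to be confused with the % sigil of the Percent String Literal). For the first pair of % signs, b is the entire case...end expression; for the pairs inside the when branches, b is an assignment such as state = :symbol.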
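Ruby accepts an assignment as the right-hand operand of a binary operator, so a line like `%  %state = :symbol` assigns first and then formats. A sketch mirroring that line, with the empty literal spelled as `""` and a hypothetical `result` variable to capture the value:

```ruby
state = :start

# Parsed as "".%(state = :symbol): the assignment is the operand of %.
result = "" % state = :symbol

p result # prints ""
p state  # prints :symbol
```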

The entire code is (roughly) equivalent to this:

def count(str)
  state = :start
  tbr = []
  str.each_char do
"".%(case state
    when :start
      tbr << 0
  "".%(state = :symbol)
 ""when :symbol
      tbr << 1
 "".%(state = :start)
 ""end)
  end
  tbr
end

p count("Foobar")
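One way to convince yourself that the rewrite above is faithful is to run a tidied-up copy of it next to a version with no % at all. This is a sketch: `count_desugared` and `count_plain` are hypothetical names, and the desugared body is reindented but otherwise follows the rewrite line by line.

```ruby
# Every "%  " literal written as "", every % operator written as .%()
def count_desugared(str)
  state = :start
  tbr = []
  str.each_char do
    "".%(case state
         when :start
           tbr << 0
           "".%(state = :symbol)
           ""
         when :symbol
           tbr << 1
           "".%(state = :start)
           ""
         end)
  end
  tbr
end

# The same state machine with every % sign removed.
def count_plain(str)
  state = :start
  tbr = []
  str.each_char do
    case state
    when :start
      tbr << 0
      state = :symbol
    when :symbol
      tbr << 1
      state = :start
    end
  end
  tbr
end

p count_desugared("Foobar") # prints [0, 1, 0, 1, 0, 1]
p count_plain("Foobar")     # prints [0, 1, 0, 1, 0, 1]
```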

AST

You can figure this out yourself by just asking Ruby:

# ruby --dump=parsetree_with_comment test.rb
###########################################################
## Do NOT use this node dump for any purpose other than  ##
## debug and research.  Compatibility is not guaranteed. ##
###########################################################

# @ NODE_SCOPE (id: 62, line: 1, location: (1,0)-(17,17))
# | # new scope
# | # format: [nd_tbl]: local table, [nd_args]: arguments, [nd_body]: body
# +- nd_tbl (local table): (empty)
# +- nd_args (arguments):
# |   (null node)

[…]

#     |           |       +- nd_body (body):
#     |           |           @ NODE_OPCALL (id: 48, line: 5, location: (5,0)-(12,7))*
#     |           |           | # method invocation
#     |           |           | # format: [nd_recv] [nd_mid] [nd_args]
#     |           |           | # example: foo + bar
#     |           |           +- nd_mid (method id): :%
#     |           |           +- nd_recv (receiver):
#     |           |           |   @ NODE_STR (id: 12, line: 5, location: (5,0)-(5,3))
#     |           |           |   | # string literal
#     |           |           |   | # format: [nd_lit]
#     |           |           |   | # example: 'foo'
#     |           |           |   +- nd_lit (literal): ""
#     |           |           +- nd_args (arguments):
#     |           |               @ NODE_LIST (id: 47, line: 5, location: (5,4)-(12,7))
#     |           |               | # list constructor
#     |           |               | # format: [ [nd_head], [nd_next].. ] (length: [nd_alen])
#     |           |               | # example: [1, 2, 3]
#     |           |               +- nd_alen (length): 1
#     |           |               +- nd_head (element):
#     |           |               |   @ NODE_CASE (id: 46, line: 5, location: (5,4)-(12,7))
#     |           |               |   | # case statement
#     |           |               |   | # format: case [nd_head]; [nd_body]; end
#     |           |               |   | # example: case x; when 1; foo; when 2; bar; else baz; end
#     |           |               |   +- nd_head (case expr):
#     |           |               |   |   @ NODE_DVAR (id: 13, line: 5, location: (5,9)-(5,14))
#     |           |               |   |   | # dynamic variable reference
#     |           |               |   |   | # format: [nd_vid](dvar)
#     |           |               |   |   | # example: 1.times { x = 1; x }
#     |           |               |   |   +- nd_vid (local variable): :state

[…]

Some of the interesting places here are the node at (id: 12, line: 5, location: (5,0)-(5,3)) which is the first string literal, and (id: 48, line: 5, location: (5,0)-(12,7)) which is the first % message send:

#     |           |       +- nd_body (body):
#     |           |           @ NODE_OPCALL (id: 48, line: 5, location: (5,0)-(12,7))*
#     |           |           | # method invocation
#     |           |           | # format: [nd_recv] [nd_mid] [nd_args]
#     |           |           | # example: foo + bar
#     |           |           +- nd_mid (method id): :%
#     |           |           +- nd_recv (receiver):
#     |           |           |   @ NODE_STR (id: 12, line: 5, location: (5,0)-(5,3))
#     |           |           |   | # string literal
#     |           |           |   | # format: [nd_lit]
#     |           |           |   | # example: 'foo'
#     |           |           |   +- nd_lit (literal): ""

Note: this is just the simplest way of obtaining a parse tree, and unfortunately it contains a lot of internal minutiae that are not really relevant to figuring out what is going on. Other tools, such as the parser gem and its companion ast gem, produce far more readable results:

# ruby-parse count.rb
(begin
  (def :count
    (args
      (arg :str))
    (begin
      (lvasgn :state
        (sym :start))
      (lvasgn :tbr
        (array))
      (block
        (send
          (lvar :str) :each_char)
        (args)
        (send
          (dstr) :%
          (case
            (lvar :state)
            (when
              (sym :start)
              (begin
                (send
                  (lvar :tbr) :<<
                  (int 0))
                (send
                  (dstr) :%
                  (lvasgn :state
                    (sym :symbol)))
                (dstr)))
            (when
              (sym :symbol)
              (begin
                (send
                  (lvar :tbr) :<<
                  (int 1))
                (send
                  (dstr) :%
                  (lvasgn :state
                    (sym :start)))
                (dstr))) nil)))
      (lvar :tbr)))
  (send nil :p
    (send nil :count
      (str "Foobar"))))
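If you would rather inspect the tree from inside Ruby without installing a gem, CRuby's standard library ships Ripper, whose `Ripper.sexp` returns the parse as nested arrays. The following sketch parses a single line of the question; the array indices assume Ripper's `[:binary, left, operator, right]` shape for binary operators.

```ruby
require "ripper"

# One line from the question: empty %-literal, % operator, assignment.
code = "%  %state = :symbol"

# Ripper.sexp returns nil on a syntax error, else [:program, [statements...]].
sexp = Ripper.sexp(code)
stmt = sexp[1][0]

pp stmt
puts "node:     #{stmt[0]}"    # the binary operator call
puts "receiver: #{stmt[1][0]}" # the empty string literal
puts "operator: #{stmt[2]}"
puts "operand:  #{stmt[3][0]}" # the assignment
```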

Semantics

So far, all we have talked about is the Syntax, i.e. the grammatical structure of the code. But what does it mean?

The method String#% performs string formatting à la C's printf family of functions, using the receiver of the % message as the format string. Since the format string here is the empty string, there is nothing to format, and the result of the message send is the empty string as well.
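For contrast, here is String#% doing real formatting next to the empty format string from the question. The values are arbitrary examples.

```ruby
# The receiver is the format string; an Array argument supplies several values.
report  = "%s has %d chars" % ["Foobar", 6]
rounded = "%05.1f" % 3.14159

# An empty format string has no directives, so whatever is passed in never
# appears in the result (surplus arguments are ignored by default).
empty1 = "" % :symbol
empty2 = "" % [:start, :symbol]

p [report, rounded] # prints ["Foobar has 6 chars", "003.1"]
p [empty1, empty2]  # prints ["", ""]
```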

If Ruby were a purely functional, lazy, non-strict language, the result would be equivalent to this:

def count(str)
  state = :start
  tbr = []
  str.each_char do
"".%(case state
    when :start
      tbr << 0
  ""
 ""when :symbol
      tbr << 1
 ""
 ""end)
  end
  tbr
end

p count("Foobar")

which in turn is equivalent to this

def count(str)
  state = :start
  tbr = []
  str.each_char do
"".%(case state
    when :start
      tbr << 0
  ""
 when :symbol
      tbr << 1
 ""
 end)
  end
  tbr
end

p count("Foobar")

which is equivalent to this

def count(str)
  state = :start
  tbr = []
  str.each_char do
"".%(case state
    when :start
  ""
 when :symbol
 ""
 end)
  end
  tbr
end

p count("Foobar")

which is equivalent to this

def count(str)
  state = :start
  tbr = []
  str.each_char do
"".%(case state
    when :start, :symbol
 ""
 end)
  end
  tbr
end

p count("Foobar")

which is equivalent to this

def count(str)
  state = :start
  tbr = []
  str.each_char do
""
  end
  tbr
end

p count("Foobar")

which is equivalent to this

def count(str)
  state = :start
  tbr = []
  tbr
end

p count("Foobar")

which is equivalent to this

def count(str)
  []
end

p count("Foobar")

Clearly, that is not what is happening, and the reason is that Ruby isn't a purely functional, lazy, non-strict language. The arguments passed to the % message sends are irrelevant to the results of those sends, but they are nevertheless evaluated (because Ruby is strict and eager), and evaluating them has side effects (because Ruby is not purely functional): the variables are still re-assigned and the tbr result array is still mutated.
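The eager evaluation is easy to observe in isolation: the result of each % send is thrown away, yet the side effects of computing its argument remain. A sketch with a hypothetical `evaluations` counter and a stand-in `tbr` array:

```ruby
evaluations = 0
tbr = []

# Ruby evaluates the argument before calling String#%, so both side
# effects happen even though each result is just "".
r1 = "" % (evaluations += 1)
r2 = "" % (tbr << 0) # the argument is tbr itself; "" ignores its contents

p [r1, r2]    # prints ["", ""]
p evaluations # prints 1
p tbr         # prints [0]
```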

If this code were written in a more Ruby-like style, with less mutation, fewer side effects and functional transformations instead, then arbitrarily replacing results with empty strings would immediately break it. The only reason there is no visible effect here is the abundant use of side effects and mutation.
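To illustrate, here is a hypothetical functional rewrite (not from the question) in which each element is the block's return value rather than something pushed onto an array. Routing that value through `"".%(...)`, which is what the stray % signs amount to, changes the result completely.

```ruby
# Hypothetical functional version: no mutation, the block's value is the element.
def count_functional(str)
  str.each_char.with_index.map { |_char, i| i.even? ? 0 : 1 }
end

# The same method with the block's value passed through "".%(...).
def count_functional_noisy(str)
  str.each_char.with_index.map { |_char, i| "".%(i.even? ? 0 : 1) }
end

p count_functional("Foobar")       # prints [0, 1, 0, 1, 0, 1]
p count_functional_noisy("Foobar") # prints ["", "", "", "", "", ""]
```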


Tags:

ruby

Max

