Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ruby forgets local variables during a while loop?

I'm processing a record-based text file: so I'm looking for a starting string which constitutes the start of a record: there is no end-of-record marker, so I use the start of the next record to delimit the last record.

So I have built a simple program to do this, but I see something which suprises me: it looks like Ruby is forgetting local variables exist - or have I found a programming error ? [although I don't think I have : if I define the variable 'message' before my loop I don't see the error].

Here's a simplified example with example input data and error message in comments:

flag=false
# message=nil # this is will prevent the issue.
while line=gets do
    if line =~/hello/ then
        if flag==true then
            puts "#{message}"
        end
        message=StringIO.new(line);
        puts message
        flag=true
    else
        message << line
    end
end

# Input File example:
# hello this is a record
# this is also part of the same record
# hello this is a new record
# this is still record 2
# hello this is record 3 etc etc
# 
# Error when running: [nb, first iteration is fine]
# <StringIO:0x2e845ac>
# hello
# test.rb:5: undefined local variable or method `message' for main:Object (NameError)
#
like image 350
monojohnny Avatar asked Oct 31 '09 15:10

monojohnny


2 Answers

From the Ruby Programming Language:

alt text http://bks0.books.google.com/books?id=jcUbTcr5XWwC&printsec=frontcover&img=1&zoom=5&sig=ACfU3U1rnYKha_p7vEkpPm1Ow3o9RAM0nQ

Blocks and Variable Scope

Blocks define a new variable scope: variables created within a block exist only within that block and are undefined outside of the block. Be cautious, however; the local variables in a method are available to any blocks within that method. So if a block assigns a value to a variable that is already defined outside of the block, this does not create a new block-local variable but instead assigns a new value to the already-existing variable. Sometimes, this is exactly the behavior we want:

total = 0   
data.each {|x| total += x }  # Sum the elements of the data array
puts total                   # Print out that sum

Sometimes, however, we do not want to alter variables in the enclosing scope, but we do so inadvertently. This problem is a particular concern for block parameters in Ruby 1.8. In Ruby 1.8, if a block parameter shares the name of an existing variable, then invocations of the block simply assign a value to that existing variable rather than creating a new block-local variable. The following code, for example, is problematic because it uses the same identifier i as the block parameter for two nested blocks:

1.upto(10) do |i|         # 10 rows
  1.upto(10) do |i|       # Each has 10 columns
    print "#{i} "         # Print column number
  end
  print " ==> Row #{i}\n" # Try to print row number, but get column number
end

Ruby 1.9 is different: block parameters are always local to their block, and invocations of the block never assign values to existing variables. If Ruby 1.9 is invoked with the -w flag, it will warn you if a block parameter has the same name as an existing variable. This helps you avoid writing code that runs differently in 1.8 and 1.9.

Ruby 1.9 is different in another important way, too. Block syntax has been extended to allow you to declare block-local variables that are guaranteed to be local, even if a variable by the same name already exists in the enclosing scope. To do this, follow the list of block parameters with a semicolon and a comma-separated list of block local variables. Here is an example:

x = y = 0            # local variables
1.upto(4) do |x;y|   # x and y are local to block
                     # x and y "shadow" the outer variables
  y = x + 1          # Use y as a scratch variable
  puts y*y           # Prints 4, 9, 16, 25
end
[x,y]                # => [0,0]: block does not alter these

In this code, x is a block parameter: it gets a value when the block is invoked with yield. y is a block-local variable. It does not receive any value from a yield invocation, but it has the value nil until the block actually assigns some other value to it. The point of declaring these block-local variables is to guarantee that you will not inadvertently clobber the value of some existing variable. (This might happen if a block is cut-and-pasted from one method to another, for example.) If you invoke Ruby 1.9 with the -w option, it will warn you if a block-local variable shadows an existing variable.

Blocks can have more than one parameter and more than one local variable, of course. Here is a block with two parameters and three local variables:

hash.each {|key,value; i,j,k| ... }
like image 60
JRL Avatar answered Oct 19 '22 04:10

JRL


Contrary to some of the other answers, while loops don't actually create a new scope. The problem you're seeing is more subtle.

Prerequisite knowledge: a brief scoping demo

To help show the contrast, blocks passed to a method call DO create a new scope, such that a newly assigned local variable inside the block disappears after the block exits:

### block example - provided for contrast only ###
[0].each {|e| blockvar = e }
p blockvar  # NameError: undefined local variable or method

But while loops (like your case) are different, because a variable defined in the loop will persist:

arr = [0]
while arr.any?
  whilevar = arr.shift
end
p whilevar  # prints 0

Summary of the "problem"

The reason you get the error in your case is because the line that uses message:

puts "#{message}"

appears before any code that assigns message.

It's the same reason this code raises an error if a wasn't defined beforehand:

# Note the single (not double) equal sign.
# At first glance it looks like this should print '1',
#  because the 'a' is assigned before (time-wise) the puts.
puts a if a = 1

Not Scoping, but Parsing-visibility

The so-called "problem" - i.e. local-variable visibility within a single scope - is due to ruby's parser. Since we're only considering a single scope, scoping rules have nothing to do with the problem. At the parsing stage, the parser decides at which source locations a local variable is visible, and this visibility does not change during execution.

When determining if a local variable is defined (i.e. defined? returns true) at any point in the code, the parser checks the current scope to see if any code has assigned it before, even if that code has never run (the parser can't know anything about what has or hasn't run at the parsing stage). "Before" meaning: on a line above, or on the same line and to the left-hand side.

An exercise to determine if a local is defined (i.e. visible)

Note that the following only applies to local variables, not methods. (Determining whether a method is defined in a scope is more complex, because it involves searching included modules and ancestor classes.)

A concrete way to see the local variable behavior is to open your file in a text editor. Suppose also that by repeatedly pressing the left-arrow key, you can move your cursor backward through the entire file. Now suppose you're wondering whether a certain usage of message will raise the NameError. To do this, position your cursor at the place you're using message, then keep pressing left-arrow until you either:

  1. reach the beginning of the current scope (you must understand ruby's scoping rules in order to know when this happens)
  2. reach code that assigns message

If you've reached an assignment before reaching the scope boundary, that means your usage of message won't raise NameError. If you don't reach any assignment, the usage will raise NameError.

Other considerations

In the case a variable assignment appears in the code but isn't run, the variable is initialized to nil:

# a is not defined before this
if false
  # never executed, but makes the binding defined/visible to the else case
  a = 1
else
  p a  # prints nil
end 

While loop test case

Here's a small test case to demonstrate the oddness of the above behavior when it happens in a while loop. The affected variable here is dest_arr.

arr = [0,1]
while n = arr.shift
  p( n: n, dest_arr_defined: (defined? dest_arr) )

  if n == 0
    dest_arr = [n]
  else
    dest_arr << n
    p( dest_arr: dest_arr )
  end
end

which outputs:

{:n=>0, :dest_arr_defined=>nil}
{:n=>1, :dest_arr_defined=>nil}
{:dest_arr=>[0, 1]}

The salient points:

  • The first iteration is intuitive, dest_arr is initialized to [0].
  • But we need to pay close attention in the second iteration (when n is 1):
    • At the beginning, dest_arr is undefined!
    • But when the code reaches the else case, dest_arr is visible again, because the interpreter sees that it was assigned 2 lines above.
    • Notice also, that dest_arr is only hidden at the start of the loop; its value is never lost (as evidenced by the final contents being [0, 1]).

This also explains why assigning your local before the while loop fixes the problem. The assignment doesn't need to be executed; it only needs to appear in the source code.

Lambda example

f1 = ->{ f2 }
f2 = ->{ f1 }
p f2.call()

# The following fails because the body of f1 tries to access f2 before an assignment for f2 was seen by the parser.
p f1.call()  # undefined local variable or method `f2'.

Fix this by putting an f2 assignment before f1's body. Remember, the assignment doesn't actually need to be executed!

f2 = nil  # Could be replaced by: if false; f2 = nil; end
f1 = ->{ f2 }
f2 = ->{ f1 }
p f2.call()
p f1.call()  # ok

Method-masking gotcha

Things get really hairy if you have a local variable with the same name as a method:

def dest_arr
  :whoops
end

arr = [0,1]
while n = arr.shift
  p( n: n, dest_arr: dest_arr )

  if n == 0
    dest_arr = [n]
  else
    dest_arr << n
    p( dest_arr: dest_arr )
  end
end

Outputs:

{:n=>0, :dest_arr=>:whoops}
{:n=>1, :dest_arr=>:whoops}
{:dest_arr=>[0, 1]}

A local variable assignment in a scope will "mask"/"shadow" a method call of the same name. (You can still call the method by using explicit parentheses or an explicit receiver.) So this is similar to the previous while loop test, except that instead of becoming undefined above the assignment code, the dest_arr method becomes "unmasked"/"unshadowed" so that the method is callable w/o parentheses. But any code after the assignment will see the local variable.

Some best-practices we can derive from all this

  • Don't name local variables the same as method names in the same scope
  • Don't put the initial assignment of a local variable inside the body of a while or for loop, or anything that causes execution to jump around within a scope (calling lambdas or Continuation#call can do this too). Put the assignment before the loop.
like image 28
Kelvin Avatar answered Oct 19 '22 05:10

Kelvin