I'm processing a record-based text file: so I'm looking for a starting string which constitutes the start of a record: there is no end-of-record marker, so I use the start of the next record to delimit the last record.
So I have built a simple program to do this, but I see something which suprises me: it looks like Ruby is forgetting local variables exist - or have I found a programming error ? [although I don't think I have : if I define the variable 'message' before my loop I don't see the error].
Here's a simplified example with example input data and error message in comments:
flag=false
# message=nil # this is will prevent the issue.
while line=gets do
if line =~/hello/ then
if flag==true then
puts "#{message}"
end
message=StringIO.new(line);
puts message
flag=true
else
message << line
end
end
# Input File example:
# hello this is a record
# this is also part of the same record
# hello this is a new record
# this is still record 2
# hello this is record 3 etc etc
#
# Error when running: [nb, first iteration is fine]
# <StringIO:0x2e845ac>
# hello
# test.rb:5: undefined local variable or method `message' for main:Object (NameError)
#
From the Ruby Programming Language:
alt text http://bks0.books.google.com/books?id=jcUbTcr5XWwC&printsec=frontcover&img=1&zoom=5&sig=ACfU3U1rnYKha_p7vEkpPm1Ow3o9RAM0nQ
Blocks define a new variable scope: variables created within a block exist only within that block and are undefined outside of the block. Be cautious, however; the local variables in a method are available to any blocks within that method. So if a block assigns a value to a variable that is already defined outside of the block, this does not create a new block-local variable but instead assigns a new value to the already-existing variable. Sometimes, this is exactly the behavior we want:
total = 0
data.each {|x| total += x } # Sum the elements of the data array
puts total # Print out that sum
Sometimes, however, we do not want to alter variables in the enclosing scope, but we do so inadvertently. This problem is a particular concern for block parameters in Ruby 1.8. In Ruby 1.8, if a block parameter shares the name of an existing variable, then invocations of the block simply assign a value to that existing variable rather than creating a new block-local variable. The following code, for example, is problematic because it uses the same identifier i as the block parameter for two nested blocks:
1.upto(10) do |i| # 10 rows
1.upto(10) do |i| # Each has 10 columns
print "#{i} " # Print column number
end
print " ==> Row #{i}\n" # Try to print row number, but get column number
end
Ruby 1.9 is different: block parameters are always local to their block, and invocations of the block never assign values to existing variables. If Ruby 1.9 is invoked with the -w flag, it will warn you if a block parameter has the same name as an existing variable. This helps you avoid writing code that runs differently in 1.8 and 1.9.
Ruby 1.9 is different in another important way, too. Block syntax has been extended to allow you to declare block-local variables that are guaranteed to be local, even if a variable by the same name already exists in the enclosing scope. To do this, follow the list of block parameters with a semicolon and a comma-separated list of block local variables. Here is an example:
x = y = 0 # local variables
1.upto(4) do |x;y| # x and y are local to block
# x and y "shadow" the outer variables
y = x + 1 # Use y as a scratch variable
puts y*y # Prints 4, 9, 16, 25
end
[x,y] # => [0,0]: block does not alter these
In this code, x is a block parameter: it gets a value when the block is invoked with yield. y is a block-local variable. It does not receive any value from a yield invocation, but it has the value nil until the block actually assigns some other value to it. The point of declaring these block-local variables is to guarantee that you will not inadvertently clobber the value of some existing variable. (This might happen if a block is cut-and-pasted from one method to another, for example.) If you invoke Ruby 1.9 with the -w option, it will warn you if a block-local variable shadows an existing variable.
Blocks can have more than one parameter and more than one local variable, of course. Here is a block with two parameters and three local variables:
hash.each {|key,value; i,j,k| ... }
Contrary to some of the other answers, while
loops don't actually create a new scope. The problem you're seeing is more subtle.
Prerequisite knowledge: a brief scoping demo
To help show the contrast, blocks passed to a method call DO create a new scope, such that a newly assigned local variable inside the block disappears after the block exits:
### block example - provided for contrast only ###
[0].each {|e| blockvar = e }
p blockvar # NameError: undefined local variable or method
But while
loops (like your case) are different, because a variable defined in the loop will persist:
arr = [0]
while arr.any?
whilevar = arr.shift
end
p whilevar # prints 0
Summary of the "problem"
The reason you get the error in your case is because the line that uses message
:
puts "#{message}"
appears before any code that assigns message
.
It's the same reason this code raises an error if a
wasn't defined beforehand:
# Note the single (not double) equal sign.
# At first glance it looks like this should print '1',
# because the 'a' is assigned before (time-wise) the puts.
puts a if a = 1
Not Scoping, but Parsing-visibility
The so-called "problem" - i.e. local-variable visibility within a single scope - is due to ruby's parser. Since we're only considering a single scope, scoping rules have nothing to do with the problem. At the parsing stage, the parser decides at which source locations a local variable is visible, and this visibility does not change during execution.
When determining if a local variable is defined (i.e. defined?
returns true) at any point in the code, the parser checks the current scope to see if any code has assigned it before, even if that code has never run (the parser can't know anything about what has or hasn't run at the parsing stage). "Before" meaning: on a line above, or on the same line and to the left-hand side.
An exercise to determine if a local is defined (i.e. visible)
Note that the following only applies to local variables, not methods. (Determining whether a method is defined in a scope is more complex, because it involves searching included modules and ancestor classes.)
A concrete way to see the local variable behavior is to open your file in a text editor. Suppose also that by repeatedly pressing the left-arrow key, you can move your cursor backward through the entire file. Now suppose you're wondering whether a certain usage of message
will raise the NameError
. To do this, position your cursor at the place you're using message
, then keep pressing left-arrow until you either:
message
If you've reached an assignment before reaching the scope boundary, that means your usage of message
won't raise NameError
. If you don't reach any assignment, the usage will raise NameError
.
Other considerations
In the case a variable assignment appears in the code but isn't run, the variable is initialized to nil
:
# a is not defined before this
if false
# never executed, but makes the binding defined/visible to the else case
a = 1
else
p a # prints nil
end
While loop test case
Here's a small test case to demonstrate the oddness of the above behavior when it happens in a while loop. The affected variable here is dest_arr
.
arr = [0,1]
while n = arr.shift
p( n: n, dest_arr_defined: (defined? dest_arr) )
if n == 0
dest_arr = [n]
else
dest_arr << n
p( dest_arr: dest_arr )
end
end
which outputs:
{:n=>0, :dest_arr_defined=>nil}
{:n=>1, :dest_arr_defined=>nil}
{:dest_arr=>[0, 1]}
The salient points:
dest_arr
is initialized to [0]
.n
is 1
):
dest_arr
is undefined!else
case, dest_arr
is visible again, because the interpreter sees that it was assigned 2 lines above.dest_arr
is only hidden at the start of the loop; its value is never lost (as evidenced by the final contents being [0, 1]
).This also explains why assigning your local before the while
loop fixes the problem. The assignment doesn't need to be executed; it only needs to appear in the source code.
Lambda example
f1 = ->{ f2 }
f2 = ->{ f1 }
p f2.call()
# The following fails because the body of f1 tries to access f2 before an assignment for f2 was seen by the parser.
p f1.call() # undefined local variable or method `f2'.
Fix this by putting an f2
assignment before f1
's body. Remember, the assignment doesn't actually need to be executed!
f2 = nil # Could be replaced by: if false; f2 = nil; end
f1 = ->{ f2 }
f2 = ->{ f1 }
p f2.call()
p f1.call() # ok
Method-masking gotcha
Things get really hairy if you have a local variable with the same name as a method:
def dest_arr
:whoops
end
arr = [0,1]
while n = arr.shift
p( n: n, dest_arr: dest_arr )
if n == 0
dest_arr = [n]
else
dest_arr << n
p( dest_arr: dest_arr )
end
end
Outputs:
{:n=>0, :dest_arr=>:whoops}
{:n=>1, :dest_arr=>:whoops}
{:dest_arr=>[0, 1]}
A local variable assignment in a scope will "mask"/"shadow" a method call of the same name. (You can still call the method by using explicit parentheses or an explicit receiver.) So this is similar to the previous while
loop test, except that instead of becoming undefined above the assignment code, the dest_arr
method becomes "unmasked"/"unshadowed" so that the method is callable w/o parentheses. But any code after the assignment will see the local variable.
Some best-practices we can derive from all this
while
or for
loop, or anything that causes execution to jump around within a scope (calling lambdas or Continuation#call
can do this too). Put the assignment before the loop.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With