I have a snippet of code that goes like this:
sent_messages = messages.lazy.reject { |m| message_is_spam?(m) }
  .each { |m| send_message(m) }
# Do something with sent_messages...
Some context: the message_is_spam? method returns true if the recipient of the message was messaged within the last 5 minutes. When messages contains several messages for the same recipient, a later message is considered spam only after the first one has been sent. To make sure the later message is treated as spam, I lazily reject spam messages and send the remaining ones.
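To make the ordering concrete, here is a minimal sketch of the difference between the eager and the lazy pipeline. The Message struct, the deliver helper and the sent_to array are hypothetical stand-ins for the real message_is_spam? and send_message, which are not shown here.

```ruby
# Hypothetical stand-in for the real message objects.
Message = Struct.new(:recipient, :body)

messages = [
  Message.new("ann", "hi"),
  Message.new("ann", "hi again"),
  Message.new("bob", "yo")
]

# Returns the recipients that were actually messaged, in order.
def deliver(messages, lazy:)
  sent_to = []
  source = lazy ? messages.lazy : messages
  # Simplified spam check: the recipient was already messaged.
  pipeline = source.reject { |m| sent_to.include?(m.recipient) }
  # Simplified send: just record the recipient.
  pipeline.each { |m| sent_to << m.recipient }
  sent_to
end

eager_recipients = deliver(messages, lazy: false)
lazy_recipients  = deliver(messages, lazy: true)

p eager_recipients # => ["ann", "ann", "bob"]
p lazy_recipients  # => ["ann", "bob"]
```

In the eager version every spam check runs before anything is sent, so the second message to the same recipient slips through. In the lazy version each message is checked and sent before the next one is looked at.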
I expect .each to return an array containing all items, but instead I get nil. Calling .each always returns an array, except in this one scenario:
[].each {} # => []
[].lazy.each {} # => []
[].select {}.each {} # => []
[].lazy.select {}.each {} # => nil
To add to the confusion, JRuby returns [] in all of the examples above.
Why does .each return nil when called like this? I can't find anything in the docs about it, and it's difficult to figure out what's going on in the C code.
I've already figured out a way to bypass this issue completely: if I select at most one message per recipient (messages.uniq_by(&:recipient)), the operation no longer needs to be lazy. Nonetheless, this still surprises me.
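For readers without uniq_by available (it comes from older versions of ActiveSupport), the same workaround can be sketched in plain Ruby with Array#uniq and a block. The Msg struct and the sample data are hypothetical.

```ruby
# Hypothetical stand-in for the real message objects.
Msg = Struct.new(:recipient, :body)

inbox = [
  Msg.new("ann", "hi"),
  Msg.new("ann", "hi again"),
  Msg.new("bob", "yo")
]

# Array#uniq with a block keeps the first element for each block value,
# so only the first message per recipient survives.
first_per_recipient = inbox.uniq { |m| m.recipient }

p first_per_recipient.map(&:body) # => ["hi", "yo"]
```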
One of the purposes of Enumerator::Lazy is to avoid having a huge (or possibly infinite) array in memory. This could explain why Enumerator#each doesn't return the desired array.
Instead of risking running out of memory with a huge array, methods like Lazy#reject pass nil as an alternative value (the one that each returns afterwards):
return lazy_add_method(obj, 0, 0, Qnil, Qnil, &lazy_reject_funcs);
In comparison, Enumerable#lazy returns:
VALUE result = lazy_to_enum_i(obj, sym_each, 0, 0, lazyenum_size);
I suspect that the distinct arguments:
Qnil for reject
sym_each for lazy
are the reason why:
[].lazy.each {} returns []
[].lazy.select{}.each {} returns nil
Still, it doesn't seem consistent for each to return either an array or nil.
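Since the return value of each differs between these cases (and between implementations), one way to sidestep the question is not to use it at all and to collect the items explicitly. This is only a sketch: the letters array and the `m == "b"` check are placeholders for the real messages and spam test.

```ruby
letters = %w(a b c)
delivered = []

# Ignore whatever each returns; record every item that reaches the block.
letters.lazy.reject { |m| m == "b" }.each { |m| delivered << m }

p delivered # => ["a", "c"]
```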
A more verbose alternative for your code could be :
messages = %w(a b c)
messages_to_send = messages.lazy.reject{|x| puts "Is '#{x}' spam?"}
messages_to_send.each{ |m| puts "Send '#{m}'" }
# Is 'a' spam?
# Send 'a'
# Is 'b' spam?
# Send 'b'
# Is 'c' spam?
# Send 'c'
Lazy#reject returns a Lazy Enumerator, so the second message_is_spam? will be executed after the first send_message.
There's one problem though: calling to_a on the lazy enumerator will run the reject block again:
sent_messages = messages_to_send.to_a
# Is 'a' spam?
# Is 'b' spam?
# Is 'c' spam?
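The re-evaluation can be checked by counting how often the reject block runs. In this sketch the checks counter and the always-false block are illustrative only.

```ruby
checks = 0

# The block never rejects anything; it only counts its own invocations.
to_send = %w(a b c).lazy.reject { |m| checks += 1; false }

to_send.each { |m| m }      # first traversal
checks_after_each = checks

collected = to_send.to_a    # second traversal, runs the block again
checks_after_to_a = checks

p checks_after_each # => 3
p checks_after_to_a # => 6
p collected         # => ["a", "b", "c"]
```

Every traversal of the lazy enumerator starts from the source again, so a spam check with side effects, or one that depends on what was already sent, may give different answers the second time.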
map and modified method
You could also return m at the end of send_message and use Lazy#map:
sent_messages = messages.lazy.reject { |m| message_is_spam?(m) }
  .map { |m| send_message(m) }.to_a
map should reliably return the desired Enumerator::Lazy object. Calling Enumerable#to_a ensures that sent_messages is an array.
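A runnable sketch of this variant, under the assumption that a message is spam when its recipient has already been messaged. The Note struct, the recently_messaged array and the two lambdas are hypothetical replacements for the real message_is_spam? and send_message.

```ruby
# Hypothetical stand-in for the real message objects.
Note = Struct.new(:recipient, :body)

recently_messaged = []

# Assumed spam rule: the recipient was already messaged.
message_is_spam = ->(m) { recently_messaged.include?(m.recipient) }

# Modified send: records the recipient and returns the message itself.
send_message = lambda do |m|
  recently_messaged << m.recipient
  m
end

notes = [
  Note.new("ann", "hi"),
  Note.new("ann", "hi again"),
  Note.new("bob", "yo")
]

sent_notes = notes.lazy
  .reject { |m| message_is_spam.call(m) }
  .map { |m| send_message.call(m) }
  .to_a

p sent_notes.map(&:body) # => ["hi", "yo"]
```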
map and explicit return
If you don't want to modify send_message, you could return m explicitly at the end of each map iteration:
messages = %w(a b c)
sent_messages = messages.lazy.reject{ |m| puts "Is '#{m}' spam?" }
  .map{ |m| puts "Send '#{m}'"; m }.to_a
# Is 'a' spam?
# Send 'a'
# Is 'b' spam?
# Send 'b'
# Is 'c' spam?
# Send 'c'
p sent_messages
# ["a", "b", "c"]
Yet another alternative would be to redefine your logic without lazy:
sent_messages = messages.map do |m|
  next if message_is_spam?(m)
  send_message(m)
  m
end.compact
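On Ruby 2.7 or later, the map followed by compact could probably be folded into a single Enumerable#filter_map call. The sketch below uses plain recipient strings and a recently array as hypothetical stand-ins for the messages and the spam check.

```ruby
recently = []
queue = %w(ann ann bob)

# filter_map drops falsy block results, so `next` (which yields nil)
# skips a message, and returning the item keeps it.
sent = queue.filter_map do |recipient|
  next if recently.include?(recipient) # simplified spam check
  recently << recipient                # simplified send
  recipient
end

p sent # => ["ann", "bob"]
```

Because the check and the send happen in the same iteration, the second message to a recipient is already seen as spam without any laziness.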