 

Why does `.each` return nil when called on a lazy enum after `.select`?

Tags:

enums

ruby

I have a snippet of code that goes like this:

sent_messages = messages.lazy.reject { |m| message_is_spam?(m) }
                             .each   { |m| send_message(m) }
# Do something with sent_messages...

Some context: the message_is_spam? method returns true if the recipient of the message was messaged within the last 5 minutes. When messages contains several messages for the same recipient, the later message will be considered spam only after the first message is sent. To ensure the later message is considered spam, I lazily reject spam messages and send the rest, so that each spam check runs only after the previous message has been sent.
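To make that ordering concrete, here is a minimal sketch comparing an eager and a lazy pipeline. Message, Mailer, and the recipient names are hypothetical stand-ins for the question's real types; only the "last 5 minutes" rule comes from the question.

```ruby
# Hypothetical message type and helpers standing in for the question's code.
Message = Struct.new(:recipient, :body)

class Mailer
  SPAM_WINDOW = 5 * 60 # seconds, per the question's "last 5 minutes"

  attr_reader :sent

  def initialize
    @sent = []
    @last_sent_to = {} # recipient => Time of the last message sent to them
  end

  # Spam if the recipient was messaged within the last 5 minutes.
  def message_is_spam?(m)
    last = @last_sent_to[m.recipient]
    !last.nil? && Time.now - last < SPAM_WINDOW
  end

  def send_message(m)
    @last_sent_to[m.recipient] = Time.now
    @sent << m
  end
end

messages = [
  Message.new("alice", "hi"),
  Message.new("alice", "hi again"),
  Message.new("bob", "hello")
]

# Eager: Array#reject checks every message before anything is sent,
# so the second message to alice is not yet seen as spam.
eager_mailer = Mailer.new
messages.reject { |m| eager_mailer.message_is_spam?(m) }
        .each   { |m| eager_mailer.send_message(m) }

# Lazy: each message is checked and then sent before the next check runs.
lazy_mailer = Mailer.new
messages.lazy.reject { |m| lazy_mailer.message_is_spam?(m) }
             .each   { |m| lazy_mailer.send_message(m) }

p eager_mailer.sent.map(&:body) # => ["hi", "hi again", "hello"]
p lazy_mailer.sent.map(&:body)  # => ["hi", "hello"]
```

Only the lazy version suppresses the duplicate, which is why the question needs laziness in the first place.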

I expect .each to return an array containing all items, but instead I get nil. .each always returns an array, except in this one scenario:

[].each {}                # => []
[].lazy.each {}           # => []
[].select {}.each {}      # => []
[].lazy.select {}.each {} # => nil

To add to the confusion, JRuby returns [] in all of the examples above.
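For reference, this is what Array#each returns in plain Ruby: the receiver itself, not a new array built from the block's results. Since the lazy cases above differ between MRI and JRuby, this sketch relies only on the Array behavior.

```ruby
numbers = [1, 2, 3]

# Array#each returns its receiver (the very same object), not a new array
# built from the block's return values.
returned = numbers.each { |n| n * 10 }
p returned                 # => [1, 2, 3]
p returned.equal?(numbers) # => true

# To collect the block's results, use map instead.
p numbers.map { |n| n * 10 } # => [10, 20, 30]
```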

Why does .each return nil when called like this? I can't find anything in the docs about it, and it's difficult to figure out what's going on in the C code.

I've already figured out a way to completely bypass this issue; if I select up to 1 message per recipient (messages.uniq_by(&:recipient)), the operation no longer needs to be lazy. Nonetheless, this still surprises me.
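A sketch of that non-lazy workaround in plain Ruby. As far as I know, uniq_by is not a core Ruby method (it came from older versions of ActiveSupport); core Array#uniq accepts a block and does the same thing. Message and both helpers are hypothetical stand-ins.

```ruby
# Hypothetical message type; the question doesn't show its real class.
Message = Struct.new(:recipient, :body)

messages = [
  Message.new("alice", "hi"),
  Message.new("alice", "hi again"),
  Message.new("bob", "hello")
]

# Hypothetical stand-ins for the question's helpers.
def message_is_spam?(_m)
  false # e.g. nobody in this batch was messaged in the last 5 minutes
end

def send_message(m)
  puts "Send '#{m.body}' to #{m.recipient}"
end

# Array#uniq with a block keeps the first message for each recipient,
# so no later message in this batch can become spam mid-loop.
to_send = messages.uniq(&:recipient).reject { |m| message_is_spam?(m) }

# Array#each returns its receiver, so sent_messages is the filtered array.
sent_messages = to_send.each { |m| send_message(m) }
p sent_messages.map(&:body) # => ["hi", "hello"]
```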

asked Apr 19 '17 15:04 by Joep




1 Answer

Possible explanation

One of the purposes of Enumerator::Lazy is to avoid having a huge (or possibly infinite) array in memory. This could explain why Enumerator#each doesn't return the desired array.

Instead of risking running out of memory with a huge array, methods like Lazy#reject seem to pass nil as the alternative value (the one returned by each afterwards):

return lazy_add_method(obj, 0, 0, Qnil, Qnil, &lazy_reject_funcs);

In comparison, Enumerable#lazy returns:

VALUE result = lazy_to_enum_i(obj, sym_each, 0, 0, lazyenum_size);

I suspect that the distinct arguments:

  • Qnil for reject
  • sym_each for lazy

are the reason why:

  • [].lazy.each {} returns []
  • [].lazy.select{}.each {} returns nil.

Still, it seems inconsistent for each to return an array in one case and nil in the other.
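To illustrate the memory argument (not the nil return value itself), a lazy chain can run over an infinite source, which an eager, array-building chain never could:

```ruby
# An endless source: building this as an array would never finish.
naturals = (1..Float::INFINITY).lazy

# Lazy#reject and Lazy#map only describe the work and return another
# Enumerator::Lazy; first(3) pulls just enough elements through the chain.
p naturals.reject(&:even?).class # => Enumerator::Lazy

odd_squares = naturals.reject(&:even?).map { |n| n * n }.first(3)
p odd_squares # => [1, 9, 25]
```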

Alternatives

each

A more verbose alternative for your code could be:

messages = %w(a b c)
messages_to_send = messages.lazy.reject{|x| puts "Is '#{x}' spam?"}
messages_to_send.each{ |m| puts "Send '#{m}'" }
# Is 'a' spam?
# Send 'a'
# Is 'b' spam?
# Send 'b'
# Is 'c' spam?
# Send 'c'

Lazy#reject returns an Enumerator::Lazy, so the second message_is_spam? call is executed after the first send_message.

There's one problem, though: calling to_a on the lazy enumerator runs the reject block again:

sent_messages = messages_to_send.to_a
# Is 'a' spam?
# Is 'b' spam?
# Is 'c' spam?
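If you still want sent_messages as an array, one option (not part of the original answer) is to collect each message inside the each block, so no second pass is needed. A minimal sketch where both blocks are stand-ins and checks is a hypothetical log of spam checks:

```ruby
messages = %w(a b c)
checks = []        # records every time the stand-in spam check runs
sent_messages = [] # filled in while sending, instead of via to_a

# Stand-in for message_is_spam?(m): logs the check and rejects nothing.
to_send = messages.lazy.reject { |m| checks << m; false }

# Stand-in for send_message(m): records the message as it is sent.
to_send.each { |m| sent_messages << m }

p sent_messages # => ["a", "b", "c"]
p checks        # => ["a", "b", "c"], each message checked exactly once
```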

map and modified method

You could also return m at the end of send_message and use Lazy#map:

sent_messages = messages.lazy.reject { |m| message_is_spam?(m) }
                             .map { |m| send_message(m) }.to_a

Lazy#map returns an Enumerator::Lazy that yields each block's return value (here, m), and calling to_a forces it, so sent_messages is an array.
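A runnable sketch of this variant with duplicate recipients. The helpers are hypothetical stand-ins (a recipient counts as spammed once anything was sent to them; the 5-minute window is left out), and messages are plain recipient strings for brevity.

```ruby
require "set"

# Hypothetical record of recipients that already got a message.
SENT_TO = Set.new

def message_is_spam?(m)
  SENT_TO.include?(m)
end

# Modified to return m, so Lazy#map passes the message through.
def send_message(m)
  SENT_TO << m
  m
end

messages = %w(a a b)
sent_messages = messages.lazy.reject { |m| message_is_spam?(m) }
                             .map { |m| send_message(m) }.to_a
p sent_messages # => ["a", "b"]
```

The second "a" is rejected because its spam check runs only after the first "a" went through send_message.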

map and explicit return

If you don't want to modify send_message, you could return m explicitly at the end of each map iteration:

messages = %w(a b c)

sent_messages = messages.lazy.reject{ |m| puts "Is '#{m}' spam?" }
                             .map{ |m| puts "Send '#{m}'"; m }.to_a   
# Is 'a' spam?
# Send 'a'
# Is 'b' spam?
# Send 'b'
# Is 'c' spam?
# Send 'c'

p sent_messages
# ["a", "b", "c"]

Modified logic

Yet another alternative would be to rewrite your logic without lazy:

sent_messages = messages.map do |m|
  next if message_is_spam?(m)
  send_message(m)
  m
end.compact
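On Ruby 2.7 or later, Enumerable#filter_map combines the map and compact steps: it keeps only truthy block results, so next (which yields nil) drops spam. This is a swapped-in variant, not the answer's code, and the helpers below are hypothetical stand-ins without the 5-minute window.

```ruby
require "set"

# Hypothetical stand-ins for the question's helpers.
SENT_TO = Set.new

def message_is_spam?(m)
  SENT_TO.include?(m)
end

def send_message(m)
  SENT_TO << m
end

messages = %w(a a b)

# The block runs in order, so the second "a" is checked after the first
# one was sent; `next` returns nil, which filter_map discards.
sent_messages = messages.filter_map do |m|
  next if message_is_spam?(m)
  send_message(m)
  m
end

p sent_messages # => ["a", "b"]
```

Unlike compact, this also avoids silently dropping a message that happens to be nil for another reason, since only the block's result is filtered.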
answered Dec 21 '22 23:12 by Eric Duminil