
Why is EventMachine's defer slower than a Ruby Thread?

I have two scripts that use Mechanize to fetch Google's index page. I assumed EventMachine would be faster than Ruby threads, but it's not.

EventMachine code costs: "0.24s user 0.08s system 2% cpu 12.682 total"

Ruby Thread code costs: "0.22s user 0.08s system 5% cpu 5.167 total "

Am I using EventMachine in the wrong way?

EventMachine:

require 'rubygems'
require 'mechanize'
require 'eventmachine'

trap("INT") {EM.stop}

EM.run do 
  num = 0
  operation = proc {
    agent = Mechanize.new
    sleep 1
    agent.get("http://google.com").body.to_s.size
  }
  callback = proc { |result|
    sleep 1
    puts result
    num+=1
    EM.stop if num == 9
  }

  10.times do 
    EventMachine.defer operation, callback
  end
end

Ruby Thread:

require 'rubygems'
require 'mechanize'


threads = []
10.times do 
  threads << Thread.new do 
    agent = Mechanize.new
    sleep 1
    puts agent.get("http://google.com").body.to_s.size
    sleep 1
  end
end


threads.each do |aThread| 
  aThread.join
end
asked by allenwei, Jun 17 '10 23:06


4 Answers

All of the answers in this thread are missing one key point: your callbacks are being run inside the reactor thread instead of in a separate deferred thread. Running Mechanize requests in a defer call is the right way to keep from blocking the loop, but you have to be careful that your callback does not also block the loop.

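When you run EM.defer operation, callback, the operation runs on a Ruby thread from EventMachine's thread pool, which does the work, and then the callback is run on the main reactor thread. Therefore, the sleep 1 in operation runs in parallel, but the sleep 1 in each callback runs serially. This explains most of the roughly 7.5-second difference between your timings (12.7s versus 5.2s): the nine callbacks that run before EM.stop each block the loop for a full second in turn, whereas the threaded version overlaps all of its second sleeps.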
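To make that scheduling concrete, here is a toy model in plain Ruby. It is an illustrative assumption, not EventMachine's real implementation: each operation runs on its own thread, but its result is pushed onto a queue that a single "reactor" loop drains, running the callbacks one at a time. The names toy_defer, run_reactor and CALLBACKS are hypothetical, and the sleeps are shortened to 0.2 seconds.

```ruby
require 'thread'

# Hypothetical stand-in for the reactor's queue of finished operations.
CALLBACKS = Queue.new

# Run the operation on a background thread; when it finishes, hand the
# result and its callback back to the "reactor" through the queue.
def toy_defer(operation, callback)
  Thread.new do
    result = operation.call
    CALLBACKS << [callback, result]
  end
end

# Single-threaded "reactor": pops finished operations and runs their
# callbacks one after another, so a blocking callback stalls the rest.
def run_reactor(expected)
  expected.times do
    callback, result = CALLBACKS.pop
    callback.call(result)
  end
end

operation = proc { sleep 0.2; :done }     # these sleeps overlap
callback  = proc { |_result| sleep 0.2 }  # these sleeps run serially

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
10.times { toy_defer(operation, callback) }
run_reactor(10)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

# Roughly 0.2s of overlapping operations plus 10 x 0.2s of serial callbacks.
puts format('%.1fs', elapsed)
```

Scaled back up to one-second sleeps, the same model predicts the "1 second parallel, 10 seconds serial" split described for the simplified EventMachine code.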

Here's a simplified version of the code you are running.

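require 'eventmachine'

EM.run {
  times = 0

  # Runs on a thread-pool thread, so the ten sleeps overlap.
  work = proc { sleep 1 }

  # Runs on the reactor thread, so the ten sleeps happen one after another.
  callback = proc {
    sleep 1
    EM.stop if (times += 1) >= 10
  }

  10.times { EM.defer work, callback }
}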

This takes about 12 seconds, which is 1 second for the parallel sleeps, 10 seconds for the serial sleeps, and 1 second for overhead.

To run the callback code in parallel, you have to spawn new threads for it using a proxy callback that uses EM.defer like so:

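require 'eventmachine'

EM.run {
  times = 0

  work = proc { sleep 1 }

  # Now used as a deferred operation, so it runs on a pool thread and the
  # sleeps overlap. Note that times and EM.stop are now touched from
  # several pool threads at once.
  callback = proc {
    sleep 1
    EM.stop if (times += 1) >= 10
  }

  # Runs on the reactor thread when work finishes and immediately hands
  # callback off to the pool instead of running it inline.
  proxy_callback = proc { EM.defer callback }

  10.times { EM.defer work, proxy_callback }
}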

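However, this can cause problems if your callback needs to interact with the event loop (for example, to stop the reactor or write to a connection), because it now runs on a separate, deferred thread rather than the reactor thread. If that happens, move the problem code into a second callback passed to the EM.defer call inside proxy_callback; that callback runs back on the reactor thread.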

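require 'eventmachine'

EM.run {
  times = 0

  work = proc { sleep 1 }

  # Deferred thread: only the blocking part lives here.
  callback = proc { sleep 1 }

  # Passed as the inner defer's callback, so it runs back on the reactor
  # thread, where updating times and calling EM.stop is safe.
  do_eventmachine_stuff = proc {
    EM.stop if (times += 1) >= 10
  }

  proxy_callback = proc { EM.defer callback, do_eventmachine_stuff }

  10.times { EM.defer work, proxy_callback }
}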

This version ran in about 3 seconds, which accounts for 1 second of sleeping for operation in parallel, 1 second of sleeping for callback in parallel and 1 second for overhead.

answered by Benjamin Manns, Nov 13 '22 06:11


Yep, you're using it wrong. EventMachine works by making asynchronous IO calls that return immediately and notify the "reactor" (the event loop started by EM.run) when they are completed. You have two blocking calls that defeat the purpose of the system: sleep and Mechanize's agent.get. You have to use special asynchronous/non-blocking libraries to derive any value from EventMachine.

answered by Ben Hughes, Nov 13 '22 07:11


You should use a non-blocking HTTP client such as em-http-request: http://github.com/igrigorik/em-http-request

answered by Edgar Gonzalez, Nov 13 '22 06:11


EventMachine "defer" actually spawns Ruby threads from a threadpool it manages to handle your request. Yes, EventMachine is designed for non-blocking IO operations, but the defer command is an exception - it's designed to allow you to do long running operations without blocking the reactor.

So it's going to be a little slower than naked threads, because really it's just launching threads with the added overhead of EventMachine's threadpool manager.

You can read more about defer here: http://eventmachine.rubyforge.org/EventMachine.html#M000486

That said, fetching pages is a great use of EventMachine, but as other posters have said, you need to use a non-blocking IO library, and then use next_tick or similar to start your tasks, rather than defer, which breaks your task out of the reactor loop.

answered by Joshua, Nov 13 '22 05:11