
Read file in EventMachine asynchronously

I've been playing with Ruby's EventMachine for some time now and I think I understand its basics.

However, I am not sure how to read a large file (120 MB) efficiently. My goal is to read the file line by line and write each line into a Cassandra database (the same should apply to MySQL, PostgreSQL, MongoDB, etc.; the Cassandra client supports EM explicitly). The simple snippet below blocks the reactor, right?

require 'rubygems'
require 'cassandra'
require 'thrift_client/event_machine'

EM.run do
  Fiber.new do
    rm = Cassandra.new('RankMetrics', "127.0.0.1:9160", :transport => Thrift::EventMachineTransport, :transport_wrapper => nil)
    rm.clear_keyspace!
    begin
      file = File.new("us_100000.txt", "r")
      while (line = file.gets)
        rm.insert(:Domains, "#{line.downcase}", {'domain' => "#{line}"})
      end
      file.close
    rescue => err
      puts "Exception: #{err}"
      err
    end
    EM.stop
  end.resume
end

But what's the right way to read a file asynchronously?

ctp asked Oct 14 '11 18:10


2 Answers

EventMachine has no support for asynchronous file IO. The best way to achieve what you're trying to do is to read a few lines on each tick and send them off to the database. The most important thing is not to read chunks that are too large, since that would block the reactor.

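EM.run do
  io = File.open('path/to/file')
  read_chunk = proc do
    # Read up to 10 lines; a short (or empty) chunk means we hit end of file.
    lines = []
    10.times do
      break unless (line = io.gets)
      lines << line
    end
    if lines.empty?
      # Nothing left to read and no DB calls pending from this chunk.
      io.close
      EM.stop
    else
      pending = lines.size
      lines.each do |line|
        # send_to_db is a placeholder for your DB call; it must run the block when done
        send_to_db(line) do
          pending -= 1
          # schedule the next chunk only once every call from this chunk has finished
          EM.next_tick(read_chunk) if pending == 0
        end
      end
    end
  end
  EM.next_tick(read_chunk)
end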
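
The chunk-reading step above can also be pulled out and tried on its own, without a reactor. This is only a minimal sketch: `read_batch` is a hypothetical helper name (not part of EventMachine), and a `StringIO` stands in for the real file. It returns at most `size` lines per call and an empty array once the file is exhausted, which is the signal to stop scheduling further ticks.

```ruby
require 'stringio'

# Hypothetical helper (not an EventMachine API): read at most `size` lines
# from `io`. An empty array means end of file was reached.
def read_batch(io, size)
  batch = []
  size.times do
    line = io.gets
    break if line.nil?          # end of file: return what we have so far
    batch << line.chomp         # drop the trailing newline before storing
  end
  batch
end

# A StringIO stands in for File.open('path/to/file') in this sketch.
io = StringIO.new("a.com\nb.com\nc.com\n")
first  = read_batch(io, 2)      # a full batch
second = read_batch(io, 2)      # a short batch, the file ran out
third  = read_batch(io, 2)      # an empty batch, nothing left to read
```

Keeping the batch size small bounds how long a single tick can hold the reactor.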

See also the related question "What is the best way to read files in an EventMachine-based app?"

Theo answered Nov 12 '22 21:11


If you haven't already, take a look at EM::FileStreamer. For one thing, FileStreamer uses a C++-based 'fast file reader'. Couldn't you stream the file over a local socket or pipe and handle sending to the DB in a separate process that's listening on the other end?
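
To make the pipe idea concrete, here is a rough sketch in plain Ruby. It does not use EM::FileStreamer. The assumptions are loud ones: a thread stands in for the separate process, the `stored` array stands in for the database, and the two domain names are made-up sample input.

```ruby
# Assumption: a thread plays the role of the separate consumer process and
# `stored` plays the role of the database. Neither is an EventMachine API.
reader, writer = IO.pipe
stored = []

consumer = Thread.new do
  # each_line blocks until data arrives and returns once the write end closes
  reader.each_line { |line| stored << line.chomp.downcase }
  reader.close
end

# Producer side: in the real setup these lines would come from the big file.
%w[Example.com Ruby-Lang.org].each { |domain| writer.puts(domain) }
writer.close      # closing the write end signals end of input
consumer.join     # wait until the consumer has drained the pipe
```

With a blocking writer, a full pipe buffer makes the producer wait, so a slow consumer naturally throttles how fast the file is read.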

There is also a non-Fiber-based example of handling synchronous DB connections gracefully in ThreadedResource, in case that's helpful; it specifically mentions Cassandra. It sounds like your Cassandra library is Fiber-based, though.

Eric G answered Nov 12 '22 21:11