Line-oriented streaming in Ruby (like grep)

Tags:

ruby

pipe

By default Ruby opens $stdin and $stdout in buffered mode. This means you can't use Ruby to perform a grep-like operation that filters text line by line. Is there any way to force Ruby to use line-oriented mode? I've seen various solutions, including popen3 (which works in buffered mode only) and pty (which doesn't handle $stdout and $stderr separately, and I need it to).

How do I do this? Python seems to have the same limitation.

asked Aug 03 '11 15:08 by Peter



3 Answers

It looks like your best bet is to use STDOUT.syswrite and STDIN.sysread. The following seemed to perform reasonably well, despite being ugly code:

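STDIN.sync = true
STDOUT.syswrite "Looking for #{ARGV[0]}\n"
pattern = Regexp.new(ARGV[0], Regexp::IGNORECASE)  # compile the search term once

def next_line
  # mybuff aliases @overflow, so every byte read is kept in @overflow
  mybuff = (@overflow ||= String.new)
  until mybuff.include?("\n")
    mybuff << STDIN.sysread(8)
  end
  # A limit of 2 keeps everything after the first newline (including later
  # newlines), so the next call doesn't glue two lines together
  out, @overflow = mybuff.split("\n", 2)
  out
rescue EOFError => e
  false  # NB: There's a bug here, see below
end

line = next_line
while line
  STDOUT.syswrite "#{line}\n" if line =~ pattern
  line = next_line
end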

Note: I'm not sure you need #sync with #sysread, but if you do, you should probably sync STDOUT too. Also, this reads only 8 bytes at a time into mybuff, which is highly inefficient and CPU-heavy; experiment with a larger value. Lastly, this code is hacky and needs a refactor, but it works: I tested it using ls -l ~/* | ruby rgrep.rb doc (where 'doc' is the search term).


Second note: Apparently I was so busy trying to get it to perform well that I failed to get it to perform correctly! As Dmitry Shevkoplyas has noted, if there is text in @overflow when EOFError is raised, that text will be lost. I believe replacing the rescue clause with the following should fix the problem:

rescue EOFError => e
  return false unless @overflow && @overflow.length > 0
  output = @overflow
  @overflow = ""
  output
end
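Putting the sysread loop and the corrected rescue together, the reader can be pulled out of the script so it works on any IO object. This is a minimal sketch, not the answerer's code: the class name SysLineReader, the grep_stream helper, and the 4096-byte default chunk size are all assumptions (the note above only says 8 bytes is too small). It also returns nil rather than false at the end of input.

```ruby
# Hypothetical helper (not from the answer): the sysread loop above,
# wrapped so it works on any IO and keeps a final unterminated line.
class SysLineReader
  # chunk_size is an assumed default; the answer used 8 bytes.
  def initialize(io, chunk_size = 4096)
    @io = io
    @chunk_size = chunk_size
    @buffer = String.new # binary buffer, like the strings sysread returns
    @eof = false
  end

  # Returns the next line without its "\n", or nil once input is exhausted.
  def next_line
    # Keep reading raw chunks until a full line is buffered or input ends.
    until @eof || @buffer.include?("\n")
      begin
        @buffer << @io.sysread(@chunk_size)
      rescue EOFError
        @eof = true
      end
    end

    if @buffer.include?("\n")
      # A limit of 2 keeps everything after the first newline for later calls.
      line, @buffer = @buffer.split("\n", 2)
      line
    elsif @buffer.empty?
      nil
    else
      # Input ended without a trailing newline: hand back what is left.
      line = @buffer
      @buffer = String.new
      line
    end
  end
end

# Hypothetical grep-like driver: copy matching lines to output unbuffered.
def grep_stream(input, output, pattern)
  reader = SysLineReader.new(input)
  while (line = reader.next_line)
    output.syswrite("#{line}\n") if line.match?(pattern)
  end
end

# Assumed script wiring, mirroring the answer's test command:
#   ls -l ~/* | ruby rgrep.rb doc
# grep_stream($stdin, $stdout, Regexp.new(ARGV[0], Regexp::IGNORECASE))
```

On a pipe, sysread returns whatever is currently available (up to chunk_size bytes) instead of waiting for a full chunk, so a larger chunk_size should cut down on system calls without holding back lines from a slow producer.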

(if you found that helpful, please upvote Dmitry's answer!)

answered Nov 02 '22 22:11 by user208769


You can always turn on autoflush for any stream you want:

STDOUT.sync = true

This makes every write go out immediately instead of waiting in Ruby's output buffer.

Most languages have this feature, though each calls it something a little different.
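A minimal sketch of a grep-like filter built on autoflush alone, assuming the lag comes from output sitting in Ruby's write buffer when stdout is a pipe. On the input side, IO#each_line should yield each line once its newline has been read, without waiting for Ruby's read buffer to fill. The method name grep_lines and the script name in the comment are hypothetical.

```ruby
# Hypothetical filter (name and signature are assumptions): turn on
# autoflush for the output stream, then copy matching lines through.
def grep_lines(input, output, pattern)
  output.sync = true # each write is flushed right away, even into a pipe
  input.each_line do |line|
    output.write(line) if line.match?(pattern)
  end
end

# Assumed script usage:
#   tail -f app.log | ruby grep_sync.rb ERROR | cat
# grep_lines($stdin, $stdout, Regexp.new(ARGV[0]))
```

Because the streams are passed in, $stderr can be handled separately from $stdout, which the question says it needs.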

answered Nov 02 '22 22:11 by tadman


You can call $stdout.flush after printing each line, and call $stdin.readline to fetch one line at a time (readline raises EOFError at the end of input).
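A short sketch of this approach, assuming a filter that reads from one IO and writes to another; the method name grep_readline is hypothetical. It uses IO#readline, rescues the EOFError that marks the end of input, and flushes after each matching line.

```ruby
# Hypothetical helper (name is an assumption): read one line at a time
# with IO#readline and flush explicitly after each line written.
def grep_readline(input, output, pattern)
  loop do
    line = input.readline # raises EOFError at end of input
    next unless line.match?(pattern)

    output.print(line)
    output.flush # don't wait for the write buffer to fill
  end
rescue EOFError
  nil # input exhausted
end

# Assumed script usage:
# grep_readline($stdin, $stdout, Regexp.new(ARGV[0]))
```

Compared with setting sync = true, this flushes only when a matching line has been written, which may suit filters that build each output line from several writes.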

answered Nov 02 '22 23:11 by Ehren Murdick