By default Ruby opens $stdin and $stdout in buffered mode. This means you can't use Ruby to perform a grep-like operation filtering text. Is there any way to force Ruby to use line-oriented mode? I've seen various solutions including popen3 (which does buffered-mode only) and pty (which doesn't separately handle $stdout and $stderr, which I require).
How do I do this? Python seems to have the same lack.
To summarize, streaming files works by asking the operating system's kernel to open a file, then read bytes from it bit by bit. When reading a file per line in Ruby, data is taken from the file 512 bytes at a time and split up in "lines" after that.
The trick is with the === method (triple equals) in Ruby. Grep calls this method on whatever argument you pass to it. And it turns out that classes, regular expressions & ranges all implement ===.
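As a minimal sketch of that behaviour, the snippet below passes a regular expression, a range and a class to Enumerable#grep. The sample arrays are made up for illustration; the only thing relied on is that grep keeps the elements for which `argument === element` is true.

```ruby
# Hypothetical sample data, for illustration only.
lines = ["alpha 1", "beta 22", "gamma 333"]

# Regexp argument: keeps elements where /\d{2,}/ === element.
with_regexp = lines.grep(/\d{2,}/)

# Range argument: keeps elements where (1..3) === element.
with_range = [0, 1, 2, 5].grep(1..3)

# Class argument: keeps elements where Integer === element.
with_class = [1, "two", 3.0, 4].grep(Integer)

p with_regexp
p with_range
p with_class
```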
It looks like your best bet is to use STDIN.sysread and STDOUT.syswrite. The following seemed to have reasonably good performance, despite being ugly code:
    STDIN.sync = true
    STDOUT.syswrite "Looking for #{ARGV[0]}\n"

    # Returns the next line from STDIN (without its newline), or false at EOF.
    def next_line
      mybuff = @overflow || ""
      # Unbuffered reads, 8 bytes at a time, until a full line is available
      until mybuff[/\n/]
        mybuff += STDIN.sysread(8)
      end
      # Split off the first line only; the limit of 2 keeps empty lines intact
      out, @overflow = mybuff.split("\n", 2)
      out
    rescue EOFError => e
      false # NB: There's a bug here, see below
    end

    line = next_line
    while line
      STDOUT.syswrite "#{line}\n" if line =~ /#{ARGV[0]}/i
      line = next_line
    end
Note: Not sure you need #sync with #sysread, but if so you should probably sync STDOUT too. Also, it reads 8 bytes at a time into mybuff, which is highly inefficient / CPU heavy; you should experiment with this value. Lastly, this code is hacky and needs a refactor, but it works. I tested it using ls -l ~/* | ruby rgrep.rb doc (where 'doc' is the search term).
Second note: Apparently, I was so busy trying to get it to perform well, I failed to get it to perform correctly! As Dmitry Shevkoplyas has noted, if there is text in @overflow when EOFError is raised, that text will be lost. The same goes for anything already read into mybuff during that call. I believe if you replace the rescue clause with the following, it should fix the problem:

    rescue EOFError => e
      # mybuff holds @overflow plus anything read since; return it once as the final line
      return false if mybuff.empty?
      @overflow = ""
      mybuff
    end
(if you found that helpful, please upvote Dmitry's answer!)
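For what it's worth, here is one way the refactor mentioned above could look. This is only a sketch under my own assumptions: the LineReader class name and the chunk_size parameter (defaulting to 4096) are invented for illustration and are not part of the original answer. It keeps the same idea of unbuffered sysread calls plus a carry-over buffer, and returns nil rather than false at end of input.

```ruby
# Hypothetical helper: reads lines from an IO using unbuffered sysread calls.
class LineReader
  def initialize(io, chunk_size = 4096)
    @io = io
    @chunk_size = chunk_size
    @buffer = String.new # bytes read so far that are not yet returned
  end

  # Returns the next line without its trailing newline, or nil at end of input.
  def next_line
    # Keep reading until the buffer contains at least one complete line.
    until (idx = @buffer.index("\n"))
      @buffer << @io.sysread(@chunk_size) # raises EOFError at end of input
    end
    @buffer.slice!(0..idx).chomp
  rescue EOFError
    # Hand back a final line that had no trailing newline, exactly once.
    return nil if @buffer.empty?
    rest = @buffer
    @buffer = String.new
    rest
  end
end
```

You would then loop with something like `reader = LineReader.new(STDIN)` and `while (line = reader.next_line)`, writing matches with STDOUT.syswrite as before.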
You can always turn on autoflush on any stream you want:
    STDOUT.sync = true
This will have the effect of committing any writes immediately.
Most languages have this feature, but they always call it something a little different.
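A rough sketch of how that could be used for a grep-like filter follows. The grep_stream method name and its keyword arguments are my own invention for illustration. It assumes that each_line yields each line once its newline has arrived, and it relies on sync = true so that matches are written through without an explicit flush.

```ruby
# Hypothetical helper: copies lines matching pattern from input to output.
def grep_stream(pattern, input: $stdin, output: $stdout)
  output.sync = true # write through immediately instead of buffering
  input.each_line do |line|
    output.write(line) if line.match?(pattern)
  end
end
```

In a script you might call it as `grep_stream(/#{ARGV[0]}/i)` and pipe text into it.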
You can call $stdout.flush after you've printed your line, and call $stdin.readline to fetch one line.
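Put together, that might look like the sketch below. The filter_lines method name and its parameters are hypothetical; the calls it relies on are IO#readline, which raises EOFError at end of input, and IO#flush.

```ruby
# Hypothetical helper: line-at-a-time filter using readline and explicit flush.
def filter_lines(pattern, input, output)
  loop do
    line = input.readline # raises EOFError once input is exhausted
    next unless line =~ pattern
    output.print line
    output.flush # push the matching line out right away
  end
rescue EOFError
  nil # end of input: nothing more to filter
end
```

For the grep-like case in the question you could call `filter_lines(/#{ARGV[0]}/i, $stdin, $stdout)`.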