I've got some data stored in a file where each block of interest is stored in a paragraph like so:
hello
there

kind

people
of

stack
overflow
I have tried reading each paragraph with the following code, but it does not work:
paragraphs = File.open("hundreds_of_gigs").lazy.to_enum.grep(/.*\n\n/) do |p|
puts p
end
With the regex I am trying to say: "match anything that ends with two newlines"
What am I doing wrong?
Any lazy way of solving this appreciated. The terser the method, the better.
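IO#readline("\n\n") will do what you want. File is a subclass of IO and has all its methods, even though they are not listed on the File rubydoc page.
Your grep never matches because the enumerator yields one line at a time, split at each "\n", so no single element ever contains "\n\n".
readline reads line by line, where a line ends at the given separator.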
E.g.:
f = File.open("your_file")
f.readline("\n\n") # => "hello\nthere\n\n"
f.readline("\n\n") # => "kind\n\n"
f.readline("\n\n") # => "people\nof\n\n"
f.readline("\n\n") # => "stack\noverflow\n\n"
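Each call to readline lazily reads the next "line" (here, a paragraph), starting from the top of the file. Once the end of the file is reached, a further call raises EOFError.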
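If you would rather loop than call readline by hand, here is a minimal sketch using IO#gets, which takes the same separator argument but returns nil at the end of the file instead of raising. The Tempfile and its contents are stand-ins for your real file, and the chomp("\n\n") is my own addition to strip the trailing separator:

```ruby
require "tempfile"

# Hypothetical sample file mirroring the data in the question.
sample = Tempfile.new("paragraphs")
sample.write("hello\nthere\n\nkind\n\npeople\nof\n\nstack\noverflow\n\n")
sample.close

paragraphs = []
File.open(sample.path) do |f|
  # gets returns nil at end of file, so the loop stops cleanly;
  # readline would raise EOFError at the same point.
  while (paragraph = f.gets("\n\n"))
    paragraphs << paragraph.chomp("\n\n")
  end
end

p paragraphs
```

Only one paragraph is read from disk per iteration. The array is there just to show the result; with a file of hundreds of gigabytes you would process each paragraph inside the loop instead of collecting them.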
Or you can use IO#each_line("\n\n") to iterate over the file.
E.g.:
File.open("your_file").each_line("\n\n") do |line|
  puts line
end
# line is, in turn:
# "hello\nthere\n\n"
# "kind\n\n"
# "people\nof\n\n"
# "stack\noverflow\n\n"
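Since you asked for something lazy and terse, here is a sketch of the same idea as a lazy chain. It uses File.foreach, which accepts the same separator and returns an Enumerator when called without a block. The Tempfile, its contents, and the chomp step are assumptions for illustration, not part of your setup:

```ruby
require "tempfile"

# Hypothetical sample file standing in for the large file in the question.
sample = Tempfile.new("paragraphs")
sample.write("hello\nthere\n\nkind\n\npeople\nof\n\nstack\noverflow\n\n")
sample.close

# Without a block, File.foreach returns an Enumerator; .lazy keeps the chain
# from reading more paragraphs than the consumer actually asks for.
paragraphs = File.foreach(sample.path, "\n\n")
                 .lazy
                 .map { |paragraph| paragraph.chomp("\n\n") }

first_two = paragraphs.first(2)
p first_two
```

Nothing is read until first(2) pulls on the chain, and reading stops once two paragraphs have been produced. If your file may contain runs of more than one blank line between paragraphs, passing "" as the separator (Ruby's paragraph mode) might be a better fit than "\n\n", but check that against your data.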