I am looking for a way to split this array of strings:
["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can", "parse", "this",
"text", "?", "Without", "any", "errors", "!"]
into groups, each terminated by a punctuation mark:
[
["this", "is", "a", "test", "."],
["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
["Without", "any", "errors", "!"]
]
Is there a simple method to do this? Or is the sanest approach to iterate over the array, adding each element to a temporary array and appending that temporary array to the container array whenever punctuation is found?
I was thinking of using slice or map, but I can't figure out whether either one can do this.
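For reference, the manual approach described in the question could be sketched as follows. The method name group_sentences and the TERMINATORS constant are hypothetical, and the sketch assumes each punctuation mark is its own token, as in the example array.

```ruby
# Hypothetical set of tokens that end a group.
TERMINATORS = %w[. ? !].freeze

# Hypothetical helper: collect tokens into a temporary array and close the
# group whenever a terminating punctuation token is seen.
def group_sentences(tokens)
  groups = []
  current = []
  tokens.each do |token|
    current << token
    if TERMINATORS.include?(token)
      groups << current
      current = []
    end
  end
  groups << current unless current.empty? # keep trailing words with no terminator
  groups
end

words = ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can", "parse", "this",
         "text", "?", "Without", "any", "errors", "!"]
p group_sentences(words)
```

It works, but the answers below show that Ruby's standard library can do the same grouping in one call.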
Check out Enumerable#slice_after:
x.slice_after { |e| '.?!'.include?(e) }.to_a
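A self-contained version of the one-liner above, assuming x holds the array from the question (slice_after needs Ruby 2.2 or later). It also shows the pattern form of slice_after, which tests each element with ===; the anchored regex is an illustrative choice, not part of the original answer.

```ruby
x = ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can", "parse", "this",
     "text", "?", "Without", "any", "errors", "!"]

# Block form: end a chunk after any element for which the block is truthy.
# '.?!'.include?(e) is a substring test, so it would also be true for tokens
# such as "?!" or an empty string; with single-character tokens that is harmless.
by_block = x.slice_after { |e| '.?!'.include?(e) }.to_a

# Pattern form: the regex is matched with ===, and the \A...\z anchors
# accept only a token that is exactly one terminating character.
by_pattern = x.slice_after(/\A[.?!]\z/).to_a

p by_block
p by_block == by_pattern #=> true
```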
@ndn has given the best answer to this question, but I will suggest another approach that may have application to other problems.
Arrays such as the one you have given are generally obtained by splitting strings on whitespace or punctuation. For example:
s = "this is a test. I wonder if I can parse this text? Without any errors!"
s.scan(/\w+|[.?!]/)
#=> ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can",
# "parse", "this", "text", "?", "Without", "any", "errors", "!"]
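If you do start from the string, the tokenizing step can feed straight into slice_after. A minimal sketch, assuming the sentence s above; note that \w+ silently drops any other characters, such as commas or apostrophes ("don't" would become "don" and "t").

```ruby
s = "this is a test. I wonder if I can parse this text? Without any errors!"

# Tokenize into words and single punctuation marks, then cut after each terminator.
tokens = s.scan(/\w+|[.?!]/)
sentences = tokens.slice_after(/\A[.?!]\z/).to_a
p sentences
```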
When this is the case, you may find it more convenient to work with the string directly. Here, for example, you could first use String#split with a regex to break the string s into sentences:
r1 = /
(?<=[.?!]) # positive lookbehind: the match must be preceded by one of the punctuation characters
\s* # match >= 0 whitespace characters, which split then removes
/x # extended/free-spacing regex definition mode
a = s.split(r1)
#=> ["this is a test.", "I wonder if I can parse this text?",
# "Without any errors!"]
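The lookbehind in r1 matters because String#split discards whatever text its regex matches. A small comparison, assuming the same s; the variable names consumed and kept are only for illustration.

```ruby
s = "this is a test. I wonder if I can parse this text? Without any errors!"

# Consuming the punctuation: it is part of the match, so split throws it away.
consumed = s.split(/[.?!]\s*/)
#=> ["this is a test", "I wonder if I can parse this text", "Without any errors"]

# Lookbehind: the match begins just after the terminator, so only the
# whitespace is removed and each sentence keeps its punctuation.
kept = s.split(/(?<=[.?!])\s*/)
#=> ["this is a test.", "I wonder if I can parse this text?", "Without any errors!"]
```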
and then split each sentence into words and punctuation marks:
r2 = /
\s+ # match >= 1 whitespace characters
| # or
(?=[.?!]) # use a positive lookahead to match a zero-width string
# followed by one of the punctuation characters
/x
b = a.map { |sentence| sentence.split(r2) }
#=> [["this", "is", "a", "test", "."],
# ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
# ["Without", "any", "errors", "!"]]
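The two splits can be wrapped in a single reusable method. This is a sketch: SENTENCE_BREAK, TOKEN_BREAK and sentences_to_tokens are hypothetical names, and the two regexes are r1 and r2 written on one line each.

```ruby
# Hypothetical constants holding the r1 and r2 patterns.
SENTENCE_BREAK = /(?<=[.?!])\s*/
TOKEN_BREAK    = /\s+|(?=[.?!])/

# Split text into sentences, then each sentence into words and punctuation.
def sentences_to_tokens(text)
  text.split(SENTENCE_BREAK).map { |sentence| sentence.split(TOKEN_BREAK) }
end

text = "this is a test. I wonder if I can parse this text? Without any errors!"
p sentences_to_tokens(text)

# For this input, the string-based route and the scan + slice_after route agree.
p sentences_to_tokens(text) == text.scan(/\w+|[.?!]/).slice_after(/\A[.?!]\z/).to_a
```

One design difference between the two routes: splitting on whitespace keeps other characters inside the tokens (a word like "errors," would stay one token, comma included), whereas the \w+ in the scan version drops them.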