
Split an array of strings into an array of arrays of strings

I am looking for a way to split this array of strings:

["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can", "parse", "this",
"text", "?", "Without", "any", "errors", "!"]

into groups, each terminated by a punctuation mark:

[
  ["this", "is", "a", "test", "."],
  ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
  ["Without", "any", "errors", "!"]
]

Is there a simple method to do this? Or is the most sane approach to iterate over the array, adding each element to a temporary array, and appending that temporary array to the container array whenever a punctuation mark is found?

I was thinking of using slice or map, but I can't figure out if it is possible or not.
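
For reference, the iterate-and-collect approach described above might look roughly like this. It is only a sketch: the method name group_by_punctuation and the TERMINATORS constant are made-up names for illustration, and it assumes each punctuation mark is its own array element.

```ruby
# Hypothetical helper: collect words into a temporary array and flush it
# to the result whenever a terminating punctuation mark is seen.
TERMINATORS = %w[. ? !].freeze # assumed set of sentence terminators

def group_by_punctuation(words)
  groups  = []
  current = []
  words.each do |word|
    current << word
    next unless TERMINATORS.include?(word)

    groups << current
    current = []
  end
  groups << current unless current.empty? # keep trailing words with no terminator
  groups
end

words = ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can",
         "parse", "this", "text", "?", "Without", "any", "errors", "!"]
result = group_by_punctuation(words)
```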

randy newfield asked Dec 10 '22 19:12


2 Answers

Check out Enumerable#slice_after:

x.slice_after { |e| '.?!'.include?(e) }.to_a
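
A self-contained version of the same idea, with x replaced by the array from the question, might look like the sketch below. The variable names words, terminators and sentences are assumptions for illustration. It tests membership in an array rather than in the string '.?!' because String#include? is a substring test, so an empty-string element would also count as a terminator.

```ruby
words = %w[this is a test . I wonder if I can parse this text ? Without any errors !]

# End the current group after every element that is exactly one terminator.
terminators = %w[. ? !]
sentences = words.slice_after { |e| terminators.include?(e) }.to_a
#=> [["this", "is", "a", "test", "."],
#    ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
#    ["Without", "any", "errors", "!"]]
```

slice_after also accepts a pattern in place of a block, so words.slice_after(/\A[.?!]\z/).to_a should give the same result here.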
ndnenkov answered Feb 20 '23 13:02


@ndn has given the best answer to this question, but I will suggest another approach that may have application to other problems.

Arrays such as the one you have given are generally obtained by splitting strings on whitespace or punctuation. For example:

s = "this is a test. I wonder if I can parse this text? Without any errors!"
s.scan(/\w+|[.?!]/)
  #=> ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can",
  #    "parse", "this", "text", "?", "Without", "any", "errors", "!"] 

When this is the case, you may find it more convenient to work on the string directly. Here, for example, you could first use String#split with a regex to break the string s into sentences:

r1 = /
     (?<=[.?!]) # positive lookbehind: the match must be preceded by one of
                # the punctuation characters (nothing is captured)
     \s*        # match >= 0 whitespace characters to remove spaces
     /x         # extended/free-spacing regex definition mode

a = s.split(r1)
  #=> ["this is a test.", "I wonder if I can parse this text?",
  #    "Without any errors!"] 

and then split up the sentences:

r2 = /
     \s+       # match >= 1 whitespace characters
     |         # or
     (?=[.?!]) # use a positive lookahead to match a zero-width string
               # followed by one of the punctuation characters
     /x

b = a.map { |sentence| sentence.split(r2) }
  #=> [["this", "is", "a", "test", "."],
  #    ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
  #    ["Without", "any", "errors", "!"]]
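
If you need this in more than one place, the two steps could be wrapped in a single method, roughly as sketched below. The method name sentence_tokens and the constant names are hypothetical. The regexes are the compact equivalents of r1 and r2 above, and the sketch assumes single terminators (runs such as "..." are not handled specially).

```ruby
SENTENCE_BREAK = /(?<=[.?!])\s*/ # after a terminator, swallowing any spaces
TOKEN_BREAK    = /\s+|(?=[.?!])/ # on whitespace, or just before a terminator

# Hypothetical helper: split text into sentences, then each sentence into tokens.
def sentence_tokens(text)
  text.split(SENTENCE_BREAK).map { |sentence| sentence.split(TOKEN_BREAK) }
end

text = "this is a test. I wonder if I can parse this text? Without any errors!"
tokens = sentence_tokens(text)
```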
Cary Swoveland answered Feb 20 '23 12:02