
Split string into a list, but keeping the split pattern

Tags: string, split, ruby

Currently I am splitting a string by a pattern, like this:

outcome_array=the_text.split(pattern_to_split_by)

The problem is that the pattern I split by is always omitted from the result.

How do I get the result to include the split pattern itself?

john-jones asked Aug 05 '11 14:08





1 Answer

Thanks to Mark Wilkins for the inspiration, but here's a shorter bit of code for doing it:

irb(main):015:0> s = "split on the word on okay?"
=> "split on the word on okay?"
irb(main):016:0> b=[]; s.split(/(on)/).each_slice(2) { |s| b << s.join }; b
=> ["split on", " the word on", " okay?"]

or:

s.split(/(on)/).each_slice(2).map(&:join)

See below the fold for an explanation.


Here's how this works. First, we split on "on", but wrap it in parentheses to make it into a match group. When there's a match group in the regular expression passed to split, Ruby will include that group in the output:

s.split(/(on)/)
# => ["split ", "on", " the word ", "on", " okay?"]
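
As a quick check of the capture-group behavior described above, here is a minimal sketch that compares split with and without the parentheses. It reuses the sample string from this answer; the variable names without_group and with_group are only illustrative.

```ruby
s = "split on the word on okay?"

# Without a capture group, the delimiter is dropped from the result.
without_group = s.split(/on/)

# With a capture group, each match is kept as its own array element.
with_group = s.split(/(on)/)

p without_group  # => ["split ", " the word ", " okay?"]
p with_group     # => ["split ", "on", " the word ", "on", " okay?"]
```

Note that the whitespace around each "on" stays with the neighboring pieces, since only "on" itself is the delimiter.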

Now we want to join each instance of "on" with the preceding string. each_slice(2) helps by passing two elements at a time to its block. Let's invoke each_slice(2) on its own to see what it produces. Since each_slice returns an Enumerator when invoked without a block, we'll apply to_a to the Enumerator so we can see what it will enumerate over:

s.split(/(on)/).each_slice(2).to_a
# => [["split ", "on"], [" the word ", "on"], [" okay?"]]
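
To see each_slice in isolation, here is a small sketch on a made-up array (the pieces array is an assumption for illustration, not part of the original example). It shows that calling each_slice without a block returns an Enumerator, and that a leftover element ends up in a shorter final slice.

```ruby
# Hypothetical input: text pieces alternating with a "-" delimiter.
pieces = ["a", "-", "b", "-", "c"]

# Without a block, each_slice returns an Enumerator rather than an Array.
enum = pieces.each_slice(2)

p enum.class  # => Enumerator
p enum.to_a   # => [["a", "-"], ["b", "-"], ["c"]]
```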

We're getting close. Now all we have to do is join the words together. And that gets us to the full solution above. I'll unwrap it into individual lines to make it easier to follow:

b = []
s.split(/(on)/).each_slice(2) do |pair|  # pair avoids shadowing the outer s
  b << pair.join
end
b
# => ["split on", " the word on", " okay?"]

But there's a nifty way to eliminate the temporary b and shorten the code considerably:

s.split(/(on)/).each_slice(2).map do |a|
  a.join
end

map passes each element of its input array to the block; the result of the block becomes the new element at that position in the output array. In MRI >= 1.8.7, you can shorten it even more, to the equivalent:

s.split(/(on)/).each_slice(2).map(&:join)
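
If this is needed in more than one place, the one-liner could be wrapped in a small helper. The sketch below is only one way to do it: the method name split_keeping_delimiter is hypothetical, and it assumes the delimiter is a plain string (escaped with Regexp.escape) rather than an arbitrary regular expression.

```ruby
# Hypothetical helper, not part of the original answer: splits text on a
# literal string delimiter and keeps each delimiter attached to the piece
# that precedes it.
def split_keeping_delimiter(text, delimiter)
  # Escape the delimiter so characters like "." are matched literally,
  # and wrap it in a capture group so split keeps the matches.
  pattern = /(#{Regexp.escape(delimiter)})/
  text.split(pattern).each_slice(2).map(&:join)
end

p split_keeping_delimiter("split on the word on okay?", "on")
# => ["split on", " the word on", " okay?"]
p split_keeping_delimiter("a.b.c", ".")
# => ["a.", "b.", "c"]
```

The helper deliberately adds exactly one capture group of its own, because each_slice(2) relies on the split result alternating strictly between text and delimiter.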
David Grayson answered Oct 16 '22 10:10