How to backreference in Ruby regular expression (regex) with gsub when I use grouping?

Tags:

regex

reference

ruby

gsub

backreference

I would like to patch some text data extracted from web pages. Sample:

t="First sentence. Second sentence.Third sentence."

There is no space after the period at the end of the second sentence. This tells me that the third sentence was on a separate line (after a br tag) in the original document.

I want to use a regexp to insert a "\n" character in the proper places and patch my text. My regex:

t2=t.gsub(/([.\!?])([A-Z1-9])/,$1+"\n"+$2)

But unfortunately it doesn't work: "NoMethodError: undefined method `+' for nil:NilClass". How can I properly backreference the matched groups? It was so easy in Microsoft Word; I just had to use the \1 and \2 symbols.

asked Aug 22 '12 02:08

Konstantin

3 Answers

You can backreference in the substitution string with \1 (to refer to capture group 1) and \2 (capture group 2). Inside a double-quoted string, the backslash itself must be escaped:

t = "First sentence. Second sentence.Third sentence!Fourth sentence?Fifth sentence."
t.gsub(/([.!?])([A-Z1-9])/, "\\1\n\\2") # => "First sentence. Second sentence.\nThird sentence!\nFourth sentence?\nFifth sentence."
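The quoting matters because Ruby processes a double-quoted literal before gsub ever sees it. The following is a minimal sketch of that difference, using the sample string from the question; the variable names are only illustrative.

```ruby
t = "First sentence. Second sentence.Third sentence."
pattern = /([.!?])([A-Z1-9])/

# In a double-quoted literal, "\1" is an octal escape (the byte 0x01),
# so gsub would never see a backreference.
puts "\1".bytes == [1] # => true

# Escaping the backslash passes the two characters \ and 1 through to gsub.
double_quoted = t.gsub(pattern, "\\1\n\\2")

# Single-quoted literals keep the backslash as-is; only the newline
# needs a double-quoted string.
single_quoted = t.gsub(pattern, '\1' + "\n" + '\2')

puts double_quoted == single_quoted # => true
puts double_quoted
```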

answered Oct 14 '22 00:10

Joshua Cheek

If you are using gsub(regex, replacement), then use '\1', '\2', ... to refer to the captured groups. Either write the replacement in single quotes or, in a double-quoted string, escape the backslash as in Joshua's answer. The substitution of '\1' with the matched text is done inside gsub, not by Ruby's interpretation of the string literal.
If you are using gsub(regex) { replacement }, then use $1, $2, ...
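As a rough illustration of the block form, here is a sketch using the question's sample string. The code in the question fails because `$1+"\n"+$2` is evaluated as an ordinary argument before gsub runs, at which point no match has happened and `$1` is nil; inside a block, `$1` and `$2` are set for each match.

```ruby
t = "First sentence. Second sentence.Third sentence."

# Block form: the block is called once per match, with $1 and $2
# already set to that match's capture groups.
block_result = t.gsub(/([.!?])([A-Z1-9])/) { "#{$1}\n#{$2}" }

puts block_result
```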

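But in your case it is easier to avoid capture groups altogether and match the zero-width position between the two characters with a lookbehind and a lookahead: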

t2 = t.gsub(/(?<=[.\!?])(?=[A-Z1-9])/, "\n")
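To check that the lookaround version behaves like the capture-group version, a small comparison can be sketched as follows, using the longer sample string from Joshua's answer. The variable names are illustrative only.

```ruby
t = "First sentence. Second sentence.Third sentence!Fourth sentence?Fifth sentence."

# Zero-width match: the position after [.!?] and before [A-Z1-9].
# Nothing is consumed, so there is nothing to put back.
with_lookaround = t.gsub(/(?<=[.!?])(?=[A-Z1-9])/, "\n")

# Capture-group version for comparison.
with_groups = t.gsub(/([.!?])([A-Z1-9])/, '\1' + "\n" + '\2')

puts with_lookaround == with_groups # => true
```

Note that either pattern will also split a decimal such as "3.14" after the point, since a digit from 1 to 9 counts as a sentence start here.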

answered Oct 14 '22 00:10

sawa

If you got here because Rubocop is complaining "Avoid the use of Perl-style backrefs." about $1, $2, etc., you can do this instead:

some_id = $1
# or
some_id = Regexp.last_match[1] if Regexp.last_match

some_id = $5
# or
some_id = Regexp.last_match[5] if Regexp.last_match
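A self-contained sketch of the same idea follows. The input string and the pattern are made up for illustration, and `Regexp.last_match(1)` is, as far as I know, the form Rubocop itself suggests; it returns nil when there was no match, so the trailing guard is not needed.

```ruby
line = "user_id=42" # hypothetical input, not from the question

# Regexp.last_match(1) is the non-Perl-style spelling of $1.
if line =~ /user_id=(\d+)/
  some_id = Regexp.last_match(1)
end

# Alternatively, keep the MatchData object and index it directly.
match = /user_id=(\d+)/.match(line)
other_id = match && match[1]

puts some_id  # => 42
puts other_id # => 42
```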

It'll also want you to do

%r{//}.match(some_string)

instead of

some_string[//]

Lame (Rubocop)

answered Oct 14 '22 00:10

Ben Wiseley
