How to use ruby gsub Regexp with many matches?

I have a CSV file whose quoted fields contain unescaped double quotes:

test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good

I need to replace every double quote that is not immediately preceded or followed by a comma with two double quotes (""):

test,first,line,"you are a ""kind"" man",thanks
again,second,li,"my ""boss"" is you",good

so " is replaced by ""

I tried

x.gsub(/([^,])"([^,])/, "#{$1}\"\"#{$2}") 

but it didn't work.

asked Feb 01 '12 15:02 by Mahmoud Khaled


1 Answer

Your regex needs to be a little more bold, in case the quotes occur at the start of the first value, or at the end of the last value:

csv = <<ENDCSV
test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good
more,""Someone" said that you're "cute"",yay
"watch out for this",and,also,"this test case"
ENDCSV

puts csv.gsub(/(?<!^|,)"(?!,|$)/,'""')
#=> test,first,line,"you are a ""kind"" man",thanks
#=> again,second,li,"my ""boss"" is you",good
#=> more,"""Someone"" said that you're ""cute""",yay
#=> "watch out for this",and,also,"this test case"

The regex above uses negative lookbehind and negative lookahead assertions (anchors). Lookbehind requires Ruby 1.9 or later.

  • (?<!^|,) — immediately preceding this spot there must not be either a start of line (^) or a comma
  • " — find a double quote
  • (?!,|$) — immediately following this spot there must not be either a comma or end of line ($)
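The three parts above can be put together in a small helper. This is only a sketch: the names INNER_QUOTE and escape_inner_quotes are made up for illustration, and the pattern is the one from this answer.

```ruby
# Hypothetical names; the pattern itself is the one explained above.
INNER_QUOTE = /(?<!^|,)"(?!,|$)/

def escape_inner_quotes(text)
  # Double every quote that is not at a field boundary
  # (start of line, end of line, or next to a comma).
  text.gsub(INNER_QUOTE, '""')
end

line = 'li,"my "boss" is you",good'
puts escape_inner_quotes(line)
#=> li,"my ""boss"" is you",good
```

Note that the pattern decides purely by neighbouring characters, so a quote that sits next to a comma inside a field (as in "say "hi", please") is treated as a field boundary and left alone.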

As a bonus, since you didn't actually capture the characters on either side, you don't need to worry about using \1 correctly in your replacement string.

For more information, see the section "Anchors" in the official Ruby regex documentation.


However, for the case where you do need to use captured text in your replacement, you can use any of the following:

"hello".gsub /([aeiou])/, '<\1>'            #=> "h<e>ll<o>"
"hello".gsub /([aeiou])/, "<\\1>"           #=> "h<e>ll<o>"
"hello".gsub(/([aeiou])/){ |m| "<#{$1}>" }  #=> "h<e>ll<o>"

You can't use String interpolation in the replacement string, as you did:

"hello".gsub /([aeiou])/, "<#{$1}>"  #=> "h<previousmatch>ll<previousmatch>" 

…because that string interpolation happens once, before the gsub has been run. Using the block form of gsub re-invokes the block for each match, at which point the global $1 has been appropriately populated and is available for use.
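A minimal sketch of the timing difference, using two made-up method names. It relies on $1 starting out as nil inside a method that has not run a match yet, so the interpolated version inserts an empty string here rather than some earlier match.

```ruby
# Hypothetical helpers, for illustration only.
def wrap_with_interpolation(str)
  # "<#{$1}>" is built once, before gsub runs; $1 is nil at that point.
  str.gsub(/([aeiou])/, "<#{$1}>")
end

def wrap_with_block(str)
  # The block runs once per match, after $1 has been set for that match.
  str.gsub(/([aeiou])/) { "<#{$1}>" }
end

puts wrap_with_interpolation("hello")  #=> h<>ll<>
puts wrap_with_block("hello")          #=> h<e>ll<o>
```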


Edit: For Ruby 1.8 (why on earth are you using that?) you can use:

puts csv.gsub(/([^,\n\r])"([^,\n\r])/,'\1""\2') 
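One caveat with the capturing version: it consumes the character after each quote, so two inner quotes separated by a single character are not both replaced. The sketch below shows this on a made-up row and compares it with a variant that uses a lookahead for the trailing character. The variant is an assumption of mine, not part of the original answer, and it is shown here on a current Ruby rather than tested on 1.8.

```ruby
CAPTURING = /([^,\n\r])"([^,\n\r])/    # pattern from the edit above
LOOKAHEAD = /([^,\n\r])"(?=[^,\n\r])/  # assumed variant, not from the answer

row = 'x,"a"b"c",y'  # made-up example row

capturing_result = row.gsub(CAPTURING, '\1""\2')
lookahead_result = row.gsub(LOOKAHEAD, '\1""')

puts capturing_result  #=> x,"a""b"c",y    (quote before c is missed)
puts lookahead_result  #=> x,"a""b""c",y
```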

answered Sep 19 '22 20:09 by Phrogz