
gsub ASCII code characters from a string in Ruby

Tags: ruby

I am using Nokogiri to screen-scrape some HTML. In some cases I get weird characters back. I have tracked down the byte values of these characters with the following code:

  @parser.leads[0].phone_numbers[0].each_byte do |c|
    puts "char=#{c}"
  end

The characters in question have byte values of 194 and 160.

I want to somehow strip these characters out while parsing.

I have tried the following code, but it does not work:

  @parser.leads[0].phone_numbers[0].gsub(/160.chr/,'').gsub(/194.chr/,'')

Can anyone tell me how to achieve this?

dagda1 asked Aug 13 '10 04:08


2 Answers

I found this question while trying to strip out invisible characters when "trimming" a string.

s.strip did not work for me, and I found that the invisible character had the ord number 194.

None of the methods above worked for me, but then I found the question "Convert non-breaking spaces to spaces in Ruby", which says:

Use /\u00a0/ to match non-breaking spaces: s.gsub(/\u00a0/, ' ') converts all non-breaking spaces to regular spaces.

Use /[[:space:]]/ to match all whitespace, including Unicode whitespace like non-breaking spaces. This is unlike /\s/, which matches only ASCII whitespace.

So glad I found that! Now I'm using:

s.gsub(/[[:space:]]/,'')

This doesn't answer the question of how to gsub specific character codes, but if you're just trying to remove whitespace it seems to work pretty well.
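For what it's worth, the byte pair 194, 160 from the question is the UTF-8 encoding of a single character, U+00A0 (non-breaking space), which is why the patterns quoted above apply. Here is a minimal sketch, assuming Ruby 1.9 or later with UTF-8 strings; the sample string `raw` is made up for illustration:

```ruby
# Hypothetical sample: a phone number wrapped in non-breaking spaces (U+00A0).
raw = "\u00a0555 1234\u00a0"

# In UTF-8, U+00A0 is encoded as the two bytes 194 and 160.
first_bytes = raw.bytes.first(2)

# Replace only non-breaking spaces with regular spaces.
nbsp_replaced = raw.gsub(/\u00a0/, ' ')

# Remove all whitespace, including Unicode whitespace such as U+00A0.
all_removed = raw.gsub(/[[:space:]]/, '')

# \s only covers ASCII whitespace, so the non-breaking spaces survive.
ascii_removed = raw.gsub(/\s/, '')

puts first_bytes.inspect
puts nbsp_replaced.inspect
puts all_removed.inspect
puts ascii_removed.inspect
```

Note that the `[[:space:]]` version also removes the ordinary space inside the number, which may or may not be what you want.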

cwd answered Oct 02 '22 11:10


Your problem is that you want to make a method call, but you are creating a Regexp instead. /160.chr/ searches for the string "160" followed by any character and then the string "chr", and /194.chr/ does the same with "194" in place of "160".

Instead, do gsub(160.chr, '').
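To illustrate the difference, here is a minimal sketch. It assumes a reasonably recent Ruby (2.4 or later for `match?`) and UTF-8 input. Because strings carry an encoding since Ruby 1.9, it uses `160.chr(Encoding::UTF_8)` to build U+00A0 rather than plain `160.chr`, which by default yields a single raw byte. The `phone` string is made up for illustration:

```ruby
# /160.chr/ is a regexp literal: "160", then any one character, then "chr".
literal = /160.chr/
puts literal.match?("160-chr")   # matches this literal text
puts literal.match?("\u00a0")    # does not match a non-breaking space

# Hypothetical sample input with a trailing non-breaking space (U+00A0).
phone = "555-1234\u00a0"

# Integer#chr with an encoding argument builds the character from its code point.
nbsp = 160.chr(Encoding::UTF_8)

# Pass the string itself (not a regexp literal) as the pattern.
cleaned = phone.gsub(nbsp, '')
puts cleaned.inspect
```

If a Regexp is preferred, the same character can be interpolated with `/#{Regexp.escape(nbsp)}/`.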

Chuck answered Oct 02 '22 12:10