I am using Nokogiri to screen-scrape some HTML. In some cases I get weird characters back. I tracked down the byte values of these characters with the following code:
@parser.leads[0].phone_numbers[0].each_byte do |c|
  puts "char=#{c}"
end
The characters in question have byte values of 194 and 160.
I want to somehow strip these characters out while parsing.
I have tried the following code but it does not work.
@parser.leads[0].phone_numbers[0].gsub(/160.chr/,'').gsub(/194.chr/,'')
Can anyone tell me how to achieve this?
In Ruby, you can remove characters from a string with the String#delete method. It returns a new string with the specified characters removed and leaves the original unchanged; delete! modifies the string in place.
gsub! is a String method in Ruby that performs the same substitution as gsub, but in place: every occurrence of the pattern in the string is replaced with the second argument. It returns the string itself, or nil if no substitutions were performed. If no block and no replacement is given, an enumerator is returned instead.
gsub(pattern, replacement) returns a copy of the string in which all occurrences of the pattern have been replaced. The pattern can be a Regexp or a String, and the replacement can be a string, a hash, or a block.
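To make the difference between these methods concrete, here is a minimal sketch. The phone number is made up, and it assumes Ruby 1.9 or later, where "\u00a0" in a string literal is a non-breaking space.

```ruby
# Hypothetical scraped value containing a non-breaking space (U+00A0).
phone = "555\u00a01234"

# delete returns a new string; the receiver is left unchanged.
cleaned = phone.delete("\u00a0")

# gsub also returns a new string.
spaced = phone.gsub("\u00a0", " ")

# gsub! modifies the receiver and returns nil when nothing matched.
copy = phone.dup
no_match = copy.gsub!("x", "")
copy.gsub!("\u00a0", "")

puts cleaned
puts spaced
puts no_match.inspect
```

Because gsub! can return nil, chaining further calls onto its result is risky; gsub is the safer choice in a method chain.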
I found this question while trying to strip out invisible characters when "trimming" a string.
s.strip
did not work for me, and I found that the invisible character had the ord number 194.
None of the methods above worked for me, but then I found the "Convert non-breaking spaces to spaces in Ruby" question, which says:
Use /\u00a0/ to match non-breaking spaces: s.gsub(/\u00a0/, ' ') converts all non-breaking spaces to regular spaces.
Use /[[:space:]]/ to match all whitespace, including Unicode whitespace like non-breaking spaces. This is unlike /\s/, which matches only ASCII whitespace.
So glad I found that! Now I'm using:
s.gsub(/[[:space:]]/,'')
This doesn't answer the question of how to gsub specific character codes, but if you're just trying to remove whitespace it seems to work pretty well.
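A small sketch of the behaviour described above, assuming Ruby 1.9 or later and a UTF-8 string. The sample value is invented; the last line is one possible way to trim only the ends instead of removing every space.

```ruby
# Hypothetical value padded with non-breaking spaces (U+00A0).
s = "\u00a0555 1234\u00a0"

# strip only removes ASCII whitespace (and null), so nothing changes here.
stripped = s.strip

# \s is ASCII-only; the POSIX bracket class also covers Unicode whitespace.
ascii_match = ("\u00a0" =~ /\s/)
posix_match = ("\u00a0" =~ /[[:space:]]/)

# Remove every whitespace character, including the one inside the number.
no_spaces = s.gsub(/[[:space:]]/, '')

# Trim only leading and trailing whitespace, Unicode-aware.
trimmed = s.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')

puts no_spaces
puts trimmed
```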
Your problem is that you want to do a method call but instead you're creating a Regexp. You're searching and replacing strings consisting of the string "160" followed by any character and then the string "chr", and then doing the same except with "160" replaced with "194".
Instead, do gsub(160.chr, '').
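One caveat worth checking: in a UTF-8 string, the bytes 194 and 160 together are the encoding of a single non-breaking space (U+00A0), not two separate characters. The sketch below assumes Ruby 1.9 or later and a UTF-8 string from the parser; the sample value is made up. It matches the character itself rather than its individual bytes.

```ruby
# Hypothetical scraped value; "\u00a0" is a non-breaking space.
raw = "555\u00a01234"

# each_byte / bytes shows the two bytes the question reported.
byte_values = raw.bytes

# Bytes 194 and 160 form exactly one UTF-8 character.
nbsp = [194, 160].pack("C*").force_encoding("UTF-8")

# /160.chr/ is a regex literal: "160", any one character, then "chr".
literal_match = ("160xchr" =~ /160.chr/)
phone_match = (raw =~ /160.chr/)

# Matching the character directly removes it.
cleaned = raw.gsub("\u00a0", "")

puts byte_values.inspect
puts cleaned
```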