I can't remove whitespace from a string parsed by Nokogiri

I can't remove the trailing whitespace from a string.

My HTML is:

<p class='your-price'>
Cena pro Vás: <strong>139&nbsp;<small>Kč</small></strong>
</p>

My code is:

#encoding: utf-8
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
site  = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")

val = price.first.text  # => "139 "
val.strip               # => "139 "
val.gsub(" ", "")       # => "139 "

gsub, strip, etc. don't work. Why, and how do I fix this?

val.class      # => String
val.dump       # => "\"139\\u{a0}\""   <- note the \u{a0}
val.encoding   # => #<Encoding:UTF-8>

__ENCODING__               # => #<Encoding:UTF-8>
Encoding.default_external  # => #<Encoding:UTF-8>

I'm using Ruby 1.9.3, so Unicode shouldn't be a problem.

asked Jan 02 '13 18:01

A.D.

1 Answer

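strip only removes ASCII whitespace, and gsub(" ", "") only matches the ASCII space (U+0020). The character you've got here is a Unicode non-breaking space (U+00A0), which is what the &nbsp; in your HTML decodes to and what the \u{a0} in your val.dump output shows.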
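
To make the difference visible, here is a small sketch that inspects the codepoints. It assumes a hard-coded string standing in for price.first.text, so it runs without Mechanize or network access:

```ruby
# Stand-in for price.first.text from the question (an assumption for this sketch).
val = "139\u00a0"

# Print the codepoints: the last one is 160 (0xA0), not 32 (0x20, the ASCII space).
p val.codepoints.to_a

# Both comparisons print true: neither call changes the string,
# which matches the behaviour reported in the question.
p val.strip == val
p val.gsub(" ", "") == val
```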

Removing the character is easy. You can use gsub by providing a regex with the character code:

gsub(/\u00a0/, '')

You could also call

gsub(/[[:space:]]/, '')

to remove all Unicode whitespace. For details, check the Regexp documentation.
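
Put together, the two approaches might look like the sketch below. The helper names (remove_nbsp, remove_all_space, unicode_strip) are hypothetical and not part of Ruby, Nokogiri or Mechanize, and the input strings are hard-coded stand-ins for the scraped text:

```ruby
# Hypothetical helpers for illustration only.

# Removes only non-breaking spaces (U+00A0).
def remove_nbsp(str)
  str.gsub(/\u00a0/, '')
end

# Removes every character in the POSIX [[:space:]] class, which for UTF-8
# strings covers U+00A0 as well as ASCII spaces, tabs and newlines.
def remove_all_space(str)
  str.gsub(/[[:space:]]/, '')
end

# Trims Unicode whitespace from both ends only, leaving inner whitespace alone.
def unicode_strip(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
end

val = "139\u00a0"  # stand-in for price.first.text

puts remove_nbsp(val).dump
puts remove_all_space(" 1\u00a0390\t").dump
puts unicode_strip("\u00a0Cena pro V\u00e1s\u00a0").dump

# Once the non-breaking space is gone, the price converts cleanly to an Integer.
puts remove_nbsp(val).to_i
```

If the goal is only to trim the ends of the string, something like unicode_strip is probably the safer choice, since it keeps any whitespace that legitimately appears inside the text.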

answered Sep 28 '22 09:09

toniedzwiedz


Tags: ruby, whitespace, nokogiri, mechanize, mechanize-ruby
