Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

I can't remove whitespaces from a string parsed by Nokogiri

I can't remove whitespaces from a string.

My HTML is:

<p class='your-price'>
Cena pro Vás: <strong>139&nbsp;<small>Kč</small></strong>
</p>

My code is:

#encoding: utf-8
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
site  = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")

val = price.first.text  => "139 "
val.strip               => "139 "
val.gsub(" ", "")       => "139 "

gsub, strip, etc. don't work. Why, and how do I fix this?

val.class      => String
val.dump       => "\"139\\u{a0}\""      !
val.encoding   => #<Encoding:UTF-8>

__ENCODING__               => #<Encoding:UTF-8>
Encoding.default_external  => #<Encoding:UTF-8>

I'm using Ruby 1.9.3 so Unicode shouldn't be problem.

like image 968
A.D. Avatar asked Jan 02 '13 18:01

A.D.


People also ask

Which function is used for removing Whitespaces from beginning?

The trim() function removes whitespace and other predefined characters from both sides of a string. Related functions: ltrim() - Removes whitespace or other predefined characters from the left side of a string.

Which method can be used to remove any whitespace from both ends of a string?

trim() The trim() method removes whitespace from both ends of a string and returns a new string, without modifying the original string. Whitespace in this context is all the whitespace characters (space, tab, no-break space, etc.)

What is nokogiri in rails?

Nokogiri is an open source software library to parse HTML and XML in Ruby. It depends on libxml2 and libxslt to provide its functionality.


1 Answers

strip only removes ASCII whitespace and the character you've got here is a Unicode non-breaking space.

Removing the character is easy. You can use gsub by providing a regex with the character code:

gsub(/\u00a0/, '')

You could also call

gsub(/[[:space:]]/, '')

to remove all Unicode whitespace. For details, check the Regexp documentation.

like image 72
toniedzwiedz Avatar answered Sep 28 '22 09:09

toniedzwiedz