I finally managed to parse parts of a website:
get '/' do
  url = '<website>'
  data = Nokogiri::HTML(open(url))
  @rows = data.css("td[valign=top] table tr")
  erb :muster
end
Now I am trying to extract a certain row in my view, so I put this in my ERB template:
<%= @rows[2] %>
It does return the markup, but it has problems with UTF-8. I expect:
<td class="class_name"> </td>
but instead it renders:
<td class="class_name">�</td>
How do I ensure UTF-8 throughout Nokogiri parsing, ERB rendering, and HTML generation?
See: http://www.nokogiri.org/tutorials/parsing_an_html_xml_document.html#encoding
It looks like in your case, the document is declaring that it's encoded using ISO-8859-1:
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
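To see why a mismatch like this produces the � character, here is a small standard-library sketch. It assumes the blank cell holds a non-breaking space (U+00A0), which is only a guess since the question does not say which character it is; the variable names are made up for illustration.

```ruby
# A non-breaking space (U+00A0) is the single byte 0xA0 in ISO-8859-1.
latin1 = [0xA0].pack('C').force_encoding(Encoding::ISO_8859_1)

# Relabelling the same byte as UTF-8 does not convert it. A lone 0xA0 is
# not a valid UTF-8 sequence, so a UTF-8 page has nothing to display for it.
mislabelled = latin1.dup.force_encoding(Encoding::UTF_8)
puts mislabelled.valid_encoding?   # false

# Transcoding rewrites the bytes: U+00A0 becomes 0xC2 0xA0 in UTF-8.
utf8 = latin1.encode(Encoding::UTF_8)
puts utf8.valid_encoding?          # true
puts utf8.bytes.map { |b| format('%02X', b) }.join(' ')   # C2 A0
```

The point is that the bytes and the encoding label have to agree at every step: what the site sends, what the parser assumes, and what your own page declares.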
You can do the following to force Nokogiri to treat the stream as UTF-8:
data = Nokogiri::HTML(open(url), nil, Encoding::UTF_8.to_s)