I'm trying to fetch a page encoded in ISO-8859-1 by clicking on a link, so the code is similar to this:
page_result = page.link_with( :text => 'link_text' ).click
So far the result comes back with the wrong encoding, so I see characters like:
'T�tulo:' instead of 'Título:'
I've tried several approaches, including:
Stating the encoding in the first request using the agent like:
@page_search = @agent.get(
:url => 'http://www.server.com',
:headers => { 'Accept-Charset' => 'ISO-8859-1' } )
Stating the encoding for the page itself:
page_result.encoding = 'ISO-8859-1'
But I must be doing something wrong: a simple puts always shows the wrong characters.
Do you know how to state the encoding?
Thanks in advance,
Added: Executable example:
require 'rubygems'
require 'mechanize'
WWW::Mechanize::Util::CODE_DIC[:SJIS] = "ISO-8859-1"
@agent = WWW::Mechanize.new
@page = @agent.get(
:url => 'http://www.mcu.es/webISBN/tituloSimpleFilter.do?cache=init&layout=busquedaisbn&language=es',
:headers => { 'Accept-Charset' => 'utf-8' } )
puts @page.body
You can just set the encoding on the page:
agent.page.encoding = 'utf-8'
Hope it helps!
The previous answer is correct, but in my code it looks slightly different:
agent = Mechanize.new
page = agent.get('http://example.com')
page.encoding = 'windows-1251'
page.search('p').each do |para|
puts para.text
end
Sorry, it was my mistake: I come from a Java background, where strings are converted to UTF-16 internally, and I forgot that Ruby doesn't do this. Mechanize was retrieving the page correctly, but I needed to convert the data via iconv.
Mental note: Ruby stores strings without converting their encoding.
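For anyone on a newer Ruby, here is a minimal sketch of that conversion step. It uses String#encode instead of iconv, since Iconv was removed from the standard library in Ruby 2.0. The byte values are made-up sample data standing in for what an ISO-8859-1 server would send for "Título:", and the variable names are only illustrative; none of this comes from the original code.

```ruby
# Bytes of "Título:" as an ISO-8859-1 server would send them (assumed sample data).
raw_bytes = [0x54, 0xED, 0x74, 0x75, 0x6C, 0x6F, 0x3A].pack('C*')

# pack('C*') returns a binary (ASCII-8BIT) string; label it with its real encoding.
# force_encoding only changes the label, the bytes stay the same.
latin1 = raw_bytes.force_encoding('ISO-8859-1')

# encode transcodes the bytes, so 0xED becomes the two-byte UTF-8 sequence for "í".
utf8 = latin1.encode('UTF-8')

puts utf8           # prints "Título:" on a UTF-8 terminal
puts utf8.bytesize  # 8, one byte more than the 7 ISO-8859-1 bytes
```

The same two steps should apply to a body fetched with Mechanize, provided the bytes really are ISO-8859-1: label the string with the encoding the server used, then transcode it to whatever your terminal or output file expects.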