I'm having a bit of trouble with Mechanize.
When I submit a form with Mechanize, I land on a page that contains only a meta refresh and no links.
My question is: how do I follow the meta refresh?
I have tried enabling meta refresh, but then I get a socket error. Sample code:
require 'mechanize'
agent = WWW::Mechanize.new
agent.get("http://euroads.dk")
form = agent.page.forms.first
form.username = "username"
form.password = "password"
form.submit
page = agent.get("http://www.euroads.dk/system/index.php?showpage=login")
agent.page.body
The response:
<html>
<head>
<META HTTP-EQUIV=\"Refresh\" CONTENT=\"0;URL=index.php?showpage=m_frontpage\">
</head>
</html>
Then I try:
redirect_url = page.parser.at('META[HTTP-EQUIV=\"Refresh\"]')[
"0;URL=index.php?showpage=m_frontpage\"][/url=(.+)/, 1]
But I get:
NoMethodError: Undefined method '[]' for nil:NilClass
Internally, Mechanize uses Nokogiri to handle parsing of the HTML into a DOM. You can get at the Nokogiri document so you can use either XPath or CSS accessors to dig around in a returned page.
This is how to get the redirect URL with Nokogiri only:
require 'nokogiri'
html = <<EOT
<html>
<head>
<meta http-equiv="refresh" content="2;url=http://www.example.com/">
</meta>
</head>
<body>
foo
</body>
</html>
EOT
doc = Nokogiri::HTML(html)
redirect_url = doc.at('meta[http-equiv="refresh"]')['content'][/url=(.+)/, 1]
redirect_url # => "http://www.example.com/"
doc.at('meta[http-equiv="refresh"]')['content'][/url=(.+)/, 1]
breaks down to: Find the first occurrence (at) of the CSS accessor for the <meta> tag with an http-equiv attribute of refresh. Take the content attribute of that tag and return the string following url=.
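One caveat worth checking: the pattern /url=(.+)/ is case-sensitive, while the page in the question writes URL= in capitals. A minimal plain-Ruby sketch (no Nokogiri involved; the content string is copied from the question's response) shows what the i flag changes:

```ruby
# Content attribute value copied from the page shown in the question.
content = '0;URL=index.php?showpage=m_frontpage'

# Case-sensitive pattern: "URL=" does not match "url=", so this is nil.
case_sensitive = content[/url=(.+)/, 1]

# Adding the i flag makes the match case-insensitive.
case_insensitive = content[/url=(.+)/i, 1]

puts case_sensitive.inspect
puts case_insensitive.inspect
```

If the first form returns nil for your page, the i flag is probably the fix you want.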
This is some Mechanize code for a typical use. Because you gave no sample code to base mine on, you'll have to work from this:
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.examples.com/')
redirect_url = page.parser.at('meta[http-equiv="refresh"]')['content'][/url=(.+)/, 1]
page = agent.get(redirect_url)
EDIT: Your code has this at() call:
at('META[HTTP-EQUIV=\"Refresh\"]')
Notice that you are escaping the double-quotes inside a single-quoted string. That results in a backslash followed by a double-quote in the string, which is NOT what my sample uses, and is my first guess for why you're getting that error. Nokogiri can't find the tag because there is no <meta http-equiv=\"Refresh\"...>.
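To see the escaping issue in isolation, here is a small plain-Ruby sketch comparing the selector string from the question with the same string written without backslashes. The variable names are only for illustration:

```ruby
# Selector exactly as written in the question: in a single-quoted Ruby
# string, \" is NOT an escape, so both characters stay in the string.
escaped = 'META[HTTP-EQUIV=\"Refresh\"]'

# The same selector without the stray backslashes.
unescaped = 'META[HTTP-EQUIV="Refresh"]'

puts escaped
puts unescaped
puts escaped.include?('\\')              # does it contain a backslash?
puts escaped.length - unescaped.length   # number of extra characters
```

As far as I know, attribute values in a selector are also compared case-sensitively, so for the page in the question the value would likely need to be written as Refresh rather than refresh.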
EDIT: Mechanize has a built-in way to handle meta-refresh, by setting:
agent.follow_meta_refresh = true
It also has a method to parse the meta tag and return the content. From the docs:
parse(content, uri)
Parses the delay and url from the content attribute of a meta tag. Parse requires the uri of the current page to infer a url when no url is specified. If a block is given, the parsed delay and url will be passed to it for further processing. Returns nil if the delay and url cannot be parsed.
# <meta http-equiv="refresh" content="5;url=http://example.com/" />
uri = URI.parse('http://current.com/')
Meta.parse("5;url=http://example.com/", uri) # => ['5', 'http://example.com/']
Meta.parse("5;url=", uri) # => ['5', 'http://current.com/']
Meta.parse("5", uri) # => ['5', 'http://current.com/']
Meta.parse("invalid content", uri) # => nil
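To make the documented behaviour concrete, here is a rough plain-Ruby sketch that mimics the four examples above. parse_refresh is a hypothetical helper written for illustration, not Mechanize's own implementation, and its regex only handles the simple delay;url=target form:

```ruby
require 'uri'

# Hypothetical helper (NOT part of Mechanize): returns [delay, url],
# or nil when the content does not look like a refresh value.
def parse_refresh(content, base_uri)
  match = content.match(/\A\s*(\d+)\s*(?:;\s*url\s*=\s*(.*?)\s*)?\z/i)
  return nil unless match

  delay  = match[1]
  target = match[2]

  # With no target, fall back to the current page; otherwise resolve
  # the target (which may be relative) against the current page.
  url = if target.nil? || target.empty?
          base_uri.to_s
        else
          URI.join(base_uri.to_s, target).to_s
        end

  [delay, url]
end

uri = URI.parse('http://current.com/')
p parse_refresh('5;url=http://example.com/', uri)
p parse_refresh('5', uri)
p parse_refresh('invalid content', uri)

# The relative, upper-case URL= form from the question:
login = URI.parse('http://www.euroads.dk/system/index.php?showpage=login')
p parse_refresh('0;URL=index.php?showpage=m_frontpage', login)
```

In real code I would still prefer the built-in Mechanize support; the sketch is only meant to show what the parsing amounts to.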
Mechanize treats meta refresh elements just like links without text. Thus, your code can be as simple as this:
page = agent.get("http://www.euroads.dk/system/index.php?showpage=login")
page.meta_refresh.first.click