I am trying to extract the content of webpage A. In Groovy I've tried the following:
......
String urlStr = "url-of-webpage-A"
String pageText = urlStr.toURL().text
//println pageText
.....
The above code retrieves the text of webpage A as long as A doesn't redirect to another webpage B. If A redirects to B, pageText ends up holding the content of B instead. Is there a way to check in code (Groovy or Java) whether webpage A redirects to another page?
PS: The above code is not part of any server-side logic. I am running it on the client side, inside a desktop application.
In Java you can use URL.openConnection() to get an HttpURLConnection (you'll need to cast). On it you can call setInstanceFollowRedirects(false).
Then you can call getResponseCode() and check whether it is HTTP_MOVED_PERM (301), HTTP_MOVED_TEMP (302) or HTTP_SEE_OTHER (303). All of these indicate a redirect (so do 307 and 308, for which HttpURLConnection has no named constants).
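The steps above can be sketched in Java as follows. This is a minimal sketch assuming the URL uses http or https (otherwise the HttpURLConnection cast fails); the class RedirectCheck and its methods isRedirect and redirects are illustrative names, not part of any library. It treats 307 and 308 as redirects too, written as literals because HttpURLConnection defines no constants for them.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectCheck {

    /** True for the 3xx status codes that send the client to another URL. */
    static boolean isRedirect(int code) {
        return code == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || code == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || code == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || code == 307                                 // Temporary Redirect (no constant)
            || code == 308;                                // Permanent Redirect (no constant)
    }

    /** Requests url without following redirects and reports whether the server redirected. */
    static boolean redirects(String url) throws IOException {
        // The cast works for http/https URLs only
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setInstanceFollowRedirects(false); // report the 3xx instead of following it
        try {
            return isRedirect(con.getResponseCode());
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java RedirectCheck <url>");
            return;
        }
        System.out.println(args[0] + (redirects(args[0]) ? " redirects" : " does not redirect"));
    }
}
```

Run it as `java RedirectCheck http://example.com/some-page`; only the response of the URL you pass is inspected, never the page it points to.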
If you need to know where you're being redirected, call getHeaderField("Location") to read the Location header.
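One caveat: a server may send a relative Location value such as /login, so it is usually worth resolving it against the URL you requested before using it. A small sketch, assuming the resolution rules of the URL(URL context, String spec) constructor are what you want; LocationResolver, resolveLocation and redirectTarget are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class LocationResolver {

    /** Resolves a possibly relative Location value against the page that sent it. */
    static String resolveLocation(String pageUrl, String location) throws MalformedURLException {
        return new URL(new URL(pageUrl), location).toString();
    }

    /** Returns the absolute redirect target of url, or null if there is no Location header. */
    static String redirectTarget(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setInstanceFollowRedirects(false); // we want A's own headers, not B's
        try {
            String location = con.getHeaderField("Location"); // null if absent
            return location == null ? null : resolveLocation(url, location);
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Offline examples of how Location values resolve against a page URL
        String page = "http://example.com/dir/a";
        System.out.println(resolveLocation(page, "/b"));                      // http://example.com/b
        System.out.println(resolveLocation(page, "b"));                       // http://example.com/dir/b
        System.out.println(resolveLocation(page, "https://other.example/x")); // https://other.example/x
    }
}
```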
In Groovy, you could do what Joachim suggests like this:
String location = "url-of-webpage-A"
boolean wasRedirected = false
String pageContent = null
int hops = 0
// Give up after 10 hops so a redirect loop can't run forever
while( location && hops++ < 10 ) {
  new URL( location ).openConnection().with { con ->
    // We'll do redirects ourselves
    con.instanceFollowRedirects = false
    // Get the location to jump to (null unless the server sent a redirect)
    String redirectTo = con.getHeaderField( "Location" )
    // Location may be relative, so resolve it against the current URL
    location = redirectTo ? new URL( con.getURL(), redirectTo ).toString() : null
    if( location ) {
      wasRedirected = true
    }
    // Read the HTML and close the inputstream
    pageContent = con.inputStream.withReader { it.text }
  }
}
println "wasRedirected:$wasRedirected contentLength:${pageContent.length()}"
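For comparison, a Java sketch of the same manual redirect loop. It caps the number of hops (maxHops) so a redirect cycle cannot loop forever, treats any 3xx response carrying a Location header as a redirect, and resolves relative Location values with new URL(url, location). RedirectFollower, fetch, Result and the tiny com.sun.net.httpserver demo server (paths /a, /b, /c and /loop) are hypothetical, included only so the sketch runs without network access.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RedirectFollower {

    /** The last URL fetched, its body, and whether at least one redirect was followed. */
    static final class Result {
        final String finalUrl;
        final String body;
        final boolean wasRedirected;
        Result(String finalUrl, String body, boolean wasRedirected) {
            this.finalUrl = finalUrl;
            this.body = body;
            this.wasRedirected = wasRedirected;
        }
    }

    /** Follows at most maxHops redirects by hand, then returns the final page. */
    static Result fetch(String start, int maxHops) throws IOException {
        URL url = new URL(start);
        boolean wasRedirected = false;
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setInstanceFollowRedirects(false); // we do redirects ourselves
            try {
                int code = con.getResponseCode();
                String location = con.getHeaderField("Location");
                if (code >= 300 && code < 400 && location != null) {
                    url = new URL(url, location); // resolve relative Location values
                    wasRedirected = true;
                    continue;
                }
                try (InputStream in = con.getInputStream()) {
                    String body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    return new Result(url.toString(), body, wasRedirected);
                }
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("more than " + maxHops + " redirects from " + start);
    }

    /** Hypothetical local server: /a -> /b -> c (relative) -> page, and /loop -> /loop. */
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        redirect(server, "/a", "/b");
        redirect(server, "/b", "c");
        redirect(server, "/loop", "/loop");
        server.createContext("/c", ex -> {
            byte[] page = "page C".getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(200, page.length);
            ex.getResponseBody().write(page);
            ex.close();
        });
        server.start();
        return server;
    }

    private static void redirect(HttpServer server, String from, String to) {
        server.createContext(from, ex -> {
            ex.getResponseHeaders().set("Location", to);
            ex.sendResponseHeaders(302, -1); // -1: no response body
            ex.close();
        });
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startDemoServer();
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        try {
            Result r = fetch(base + "/a", 5);
            System.out.println("wasRedirected:" + r.wasRedirected + " finalUrl:" + r.finalUrl + " body:" + r.body);
        } finally {
            server.stop(0);
        }
    }
}
```

A 4xx or 5xx response makes getInputStream() throw an IOException here, which is usually what you want from a scraper; catch it and read getErrorStream() instead if you need the error page.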
If you don't want to follow the redirect and only want the content of the first page, you simply need:
String location = "url-of-webpage-A"
String pageContent = new URL( location ).openConnection().with { con ->
  // Don't follow redirects; we only want page A itself
  con.instanceFollowRedirects = false
  // Get the location A points to (null if it doesn't redirect)
  location = con.getHeaderField( "Location" )
  // Read the HTML and close the inputstream
  con.inputStream.withReader { it.text }
}
if( location ) {
  println "Page wanted to redirect to $location"
}
println "Content was:"
println pageContent