Do I gain something when I transform my $url like this: $url = URI->new( $url )?
#!/usr/bin/env perl
use warnings; use strict;
use 5.012;
use URI;
use XML::LibXML;
my $url = 'http://stackoverflow.com/';
$url = URI->new( $url );
my $doc = XML::LibXML->load_html( location => $url, recover => 2 );
my @nodes = $doc->getElementsByTagName( 'a' );
say scalar @nodes;
The URI constructor cleans up the URI for you; for example, it correctly escapes characters that are not valid in a URI (see URI::Escape).
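As a rough sketch of that clean-up, the snippet below passes a URL containing a space to the constructor and prints the result. The example.com URL is made up for illustration and is not from the question.

```perl
use strict;
use warnings;
use URI;

# Hypothetical URL with a space in the path; a raw space is not valid in a URI.
my $raw = 'http://example.com/some page';
my $uri = URI->new($raw);

# The object stringifies to the percent-encoded form.
print "$uri\n";    # http://example.com/some%20page
```

Note that $raw itself is left untouched; only the new object carries the escaped form.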
The URI module has several benefits:
The benefit that you get with the little bit of code that you show is minimal, but as you continue to work on the problem, perhaps spidering the site, URI becomes more handy as you select what to do next.
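To illustrate the spidering case, here is a minimal sketch, assuming you have already pulled some href values out of the page. The base URL and the hrefs are invented for the example; new_abs and host are methods of the URI module.

```perl
use strict;
use warnings;
use URI;

# Assumed page the links were found on.
my $base = URI->new('http://stackoverflow.com/questions/123');

# Hypothetical href values, as a spider might find them in <a> tags.
my @hrefs = ( '/tags', '../users', 'http://example.com/elsewhere' );

# Resolve every link against the base so relative links become absolute.
my @resolved = map { URI->new_abs( $_, $base ) } @hrefs;

for my $abs (@resolved) {
    # Comparing hosts is one simple way to decide whether to follow a link.
    my $where = $abs->host eq $base->host ? 'same site' : 'external';
    print "$abs ($where)\n";
}
```

Doing the same with plain strings would mean hand-rolling the rules for relative paths, which is where the object starts to pay off.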
I'm surprised nobody has mentioned it yet, but $url = URI->new( $url ); doesn't clean up your $url and hand it back to you; it creates a new object of class URI (or, rather, of one of its subclasses), which can then be passed to other code that requires a URI object. That's not particularly important in this case, since XML::LibXML appears to be happy to accept locations as either strings or objects, but some other modules require you to give them a URI object and will reject URLs presented as plain strings.
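A small sketch of that difference, using the URL from the question: before the call $url is a plain string, afterwards it is an object that still behaves like a string when interpolated.

```perl
use strict;
use warnings;
use URI;

my $url = 'http://stackoverflow.com/';
print ref($url) ? "object\n" : "plain string\n";    # plain string

$url = URI->new($url);

# ref() reports the scheme-specific subclass, which inherits from URI.
my $class = ref $url;
print "$class\n";                                   # URI::http
print "isa URI\n" if $url->isa('URI');

# Stringification is overloaded, so the object can still be used as text.
print "as string: $url\n";
```

That overloaded stringification is presumably why code that only expects a string keeps working when handed the object.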