Is there a way to find all links within a specific div using WWW::Mechanize?
I tried find_all_links but couldn't find a way to restrict it to a single div. For example:
<div class="sometag">
<ul class="tags">
<li><a href="/a.html">A</a></li>
<li><a href="/b.html">B</a></li>
</ul>
</div>
A handy tool for extracting information from HTML is HTML::Grabber. It uses jQuery-style selectors to reference elements in the HTML, so you might do something like this:
use HTML::Grabber;
# Your mechanize stuff here ...
my $dom = HTML::Grabber->new( html => $mech->content );
my @links;
$dom->find('div.sometag a')->each(sub {
    push @links, $_->attr('href');
});
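If HTML::Grabber is not installed, roughly the same "div.sometag a" selection can be sketched with HTML::TreeBuilder and its look_down method. This is only an illustration under a few assumptions: the HTML is inlined from the question rather than taken from $mech->content, the extra div.other is a made-up element added to show that links outside the target div are skipped, and the class attribute is matched as an exact string (so class="sometag foo" would not match).

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup from the question; in real use this would be $mech->content.
# The second div is hypothetical, added only to show that it is ignored.
my $html = <<'HTML';
<div class="sometag">
<ul class="tags">
<li><a href="/a.html">A</a></li>
<li><a href="/b.html">B</a></li>
</ul>
</div>
<div class="other"><a href="/c.html">C</a></div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @links;
# Find each <div class="sometag">, then collect the href of every <a> below it.
for my $div ($tree->look_down(_tag => 'div', class => 'sometag')) {
    push @links,
        map { $_->attr('href') }
        $div->look_down(_tag => 'a', href => qr/./);
}

$tree->delete;    # free the parse tree explicitly

print "$_\n" for @links;
```

The hrefs come back exactly as written in the page (relative here); if absolute URLs are needed, URI->new_abs($href, $mech->uri) is one way to resolve them.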
Web::Scraper is useful for scraping.
use strict;
use warnings;
use URI;
use WWW::Mechanize;
use Web::Scraper;
my $mech = WWW::Mechanize->new;
$mech->env_proxy;
# If you want to login, do it with mechanize.
# In the sample HTML the "tags" class is on the <ul>, not the <li>.
my $staff = scrape { process 'div.sometag ul.tags a', 'links[]' => '@href' };
# pass mechanize to scraper as useragent.
$staff->user_agent($mech);
my $res = $staff->scrape( URI->new("http://example.com/") );
for my $link (@{$res->{links}}) {
    warn $link;
}
Sorry, I didn't test this code.