<div class="box notranslate" id="venueHours">
<h5 class="translate">Hours</h5>
<div class="status closed">Currently closed</div>
<div class="hours">
<div class="timespan">
<div class="openTime">
<div class="days">Mon,Tue,Wed,Thu,Sat</div>
<span class="hours"> 10:00 AM–6:00 PM</span>
</div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Fri</div>
<span class="hours"> 10:00 AM–9:00 PM</span></div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Sun</div>
<span class="hours"> 10:00 AM–5:00 PM</span>
</div>
</div>
</div>
</div>
I'm trying to capture the contents of every <div class="days"> and <span class="hours"> element. I think I could do this with regular expressions, but I'd also like to learn any fun or more professional ways to capture specific div blocks like these. Thanks.
In addition to the HTML parsing libraries mentioned elsewhere, other modules have DOM capability too. See for example Web::Query and Mojolicious' Mojo::DOM.
Here is an example using Mojo::DOM and CSS3 selectors:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.10.0;
use Mojo::DOM;
my $dom = Mojo::DOM->new(<<'HTML');
<div class="box notranslate" id="venueHours">
<h5 class="translate">Hours</h5>
<div class="status closed">Currently closed</div>
<div class="hours">
<div class="timespan">
<div class="openTime">
<div class="days">Mon,Tue,Wed,Thu,Sat</div>
<span class="hours"> 10:00 AM–6:00 PM</span>
</div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Fri</div>
<span class="hours"> 10:00 AM–9:00 PM</span></div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Sun</div>
<span class="hours"> 10:00 AM–5:00 PM</span>
</div>
</div>
</div>
</div>
HTML
say "div days:";
say $_->text for $dom->find('div.days')->each;
say "\nspan hours:";
say $_->text for $dom->find('span.hours')->each;
Or equivalently:
say "div days:";
say for $dom->find('div.days')->map(sub{$_->text})->each;
say "\nspan hours:";
say for $dom->find('span.hours')->map(sub{$_->text})->each;
Output:
div days:
Mon,Tue,Wed,Thu,Sat
Fri
Sun
span hours:
10:00 AM–6:00 PM
10:00 AM–9:00 PM
10:00 AM–5:00 PM
Or, to get the times alongside their corresponding days, you can use the children of each openTime div:
say "Open Times:";
say for $dom->find('div.openTime')
->map(sub{$_->children->each})
->map(sub{$_->text})
->each;
Output:
Open Times:
Mon,Tue,Wed,Thu,Sat
10:00 AM–6:00 PM
Fri
10:00 AM–9:00 PM
Sun
10:00 AM–5:00 PM
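If you would rather keep each set of days together with its hours instead of getting a flat list, one possible approach is to look both elements up inside each div.openTime with `at`. This is only a sketch: the helper name `open_times` is made up for illustration, the markup is a shortened copy of the question's HTML, and the explicit trimming is there because, as far as I know, newer Mojolicious releases no longer strip surrounding whitespace from `text`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;      # the sample markup contains an en dash
use 5.010;
use Mojo::DOM;

binmode STDOUT, ':encoding(UTF-8)';

# Shortened copy of the markup from the question.
my $html = <<'HTML';
<div class="hours">
<div class="timespan">
<div class="openTime">
<div class="days">Mon,Tue,Wed,Thu,Sat</div>
<span class="hours"> 10:00 AM–6:00 PM</span>
</div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Fri</div>
<span class="hours"> 10:00 AM–9:00 PM</span></div>
</div>
</div>
HTML

# Hypothetical helper: returns a list of [days, hours] pairs.
sub open_times {
    my ($dom) = @_;
    my @pairs;
    for my $open ($dom->find('div.openTime')->each) {
        my $days  = $open->at('div.days');
        my $hours = $open->at('span.hours');
        next unless $days && $hours;    # skip incomplete blocks

        # Trim explicitly, since text() may keep surrounding whitespace.
        my @pair = ($days->text, $hours->text);
        s/^\s+|\s+$//g for @pair;
        push @pairs, \@pair;
    }
    return @pairs;
}

say "$_->[0] => $_->[1]" for open_times(Mojo::DOM->new($html));
```

With the sample markup above this should print one "days => hours" line per openTime block, which is easier to load into a hash or feed to other code than the interleaved list.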
Edit: Daxim has posted the analogous Web::Query code as a comment, so I will repost it here for better formatting. I haven't tried it, but I trust his code generally. Assuming the HTML is in a variable $html:
use Web::Query qw();
my $w = Web::Query->new_from_html($html);
say "div days:";
say for $w->find('div.days')->text;
say "\nspan hours:";
say for $w->find('span.hours')->text;
say "Open Times:";
$w->find('div.openTime')->each(sub { say for $_->find('*')->text });
Use a module written specifically for this task, such as HTML::Parser, HTML::Tree, or the like.
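As a rough illustration of that suggestion, here is a minimal sketch using HTML::TreeBuilder from the HTML::Tree distribution. The helper name `open_times_tree` and the shortened sample markup are assumptions made for the example; `look_down` matches the class attribute as an exact string, so this relies on each element carrying a single class, as in the question's HTML.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;      # the sample markup contains an en dash
use 5.010;
use HTML::TreeBuilder;

binmode STDOUT, ':encoding(UTF-8)';

# Shortened copy of the markup from the question.
my $html = <<'HTML';
<div class="hours">
<div class="timespan">
<div class="openTime">
<div class="days">Fri</div>
<span class="hours"> 10:00 AM–9:00 PM</span></div>
</div>
<div class="timespan">
<div class="openTime">
<div class="days">Sun</div>
<span class="hours"> 10:00 AM–5:00 PM</span>
</div>
</div>
</div>
HTML

# Hypothetical helper: returns a list of [days, hours] pairs.
sub open_times_tree {
    my ($markup) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($markup);
    my @pairs;
    for my $open ($tree->look_down(_tag => 'div', class => 'openTime')) {
        # In scalar context look_down returns the first match only.
        my $days  = $open->look_down(_tag => 'div',  class => 'days');
        my $hours = $open->look_down(_tag => 'span', class => 'hours');
        next unless $days && $hours;    # skip incomplete blocks
        push @pairs, [ $days->as_trimmed_text, $hours->as_trimmed_text ];
    }
    $tree->delete;    # release the parse tree explicitly
    return @pairs;
}

say "$_->[0] => $_->[1]" for open_times_tree($html);
```

Compared with the CSS-selector style of Mojo::DOM or Web::Query, `look_down` takes attribute/value criteria, which is more verbose but needs nothing beyond HTML::Tree.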