
HTML parsing in Perl

Tags:

html

html-parsing

perl

I'm trying to parse the following HTML structure in Perl. I need to select all of the dd elements that have the class "message" and also an id. All I want the script to do is loop through those dd elements and print the id of each one, but it needs to ignore the first dd element, because that one is static and will not change.

It can use any Perl module, as long as it can be installed from CPAN to keep things easy for me. I don't have much experience with Perl or with parsing HTML, so any pointers would be very helpful.

Thanks :)

HTML Structure:

<pre><code>
<html>
<head>
</head>
<body>
 .....other elements
    <div id="messages">
        <div class="header"></div>
        <dl>
            <dd class="message unread mc-friend mc-message">This is just a random message, do not parse</dd>
            <dd id="msg2" class="message unread mc-message">
                Hello
            </dd>
            <dd id="msg3" class="message unread mc-message">
                Hello
            </dd>
        </dl>
    </div>
</body>
</html>
</code></pre>

751

asked Jan 04 '11 20:01

Jack

1 Answer

Something like this, quick and easy:

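#!/usr/bin/perl
use strict;
use warnings;

use Mojo::DOM;

my $html = "Your HTML goes here";

# Parse the document into a DOM tree
my $dom = Mojo::DOM->new($html);

# Visit every <dd> whose class attribute contains "message",
# skipping the first match (the static message that has no id)
my $skip = 0;
for my $dd ($dom->find('dd[class*="message"]')->each) {
    # attr() is the current name of the attrs() method from older Mojolicious releases
    print $dd->attr('id'), "\n" if $skip++;
}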
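A possible variant, assuming a current Mojolicious release installed from CPAN: the question asks for dd elements that have the class "message" and also an id, and the CSS selector can state both conditions directly (dd.message[id]). The static first dd has no id, so it drops out without a skip counter. The helper name message_ids and the sample markup below are illustrative only; the sample is the question's HTML with the msg2 element closed by </dd>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Mojo::DOM;

# Hypothetical helper: return the ids of all <dd> elements that carry the
# class "message" AND have an id attribute, in document order.
sub message_ids {
    my ($html) = @_;
    my $dom = Mojo::DOM->new($html);
    # ".message" matches the whole class word; "[id]" requires an id attribute
    return map { $_->attr('id') } $dom->find('dd.message[id]')->each;
}

# Sample markup modelled on the question
my $html = <<'HTML';
<div id="messages">
  <div class="header"></div>
  <dl>
    <dd class="message unread mc-friend mc-message">This is just a random message, do not parse</dd>
    <dd id="msg2" class="message unread mc-message">Hello</dd>
    <dd id="msg3" class="message unread mc-message">Hello</dd>
  </dl>
</div>
HTML

# Print one id per line
print "$_\n" for message_ids($html);
```

Filtering on the id attribute rather than on position means the script keeps working if the static message is ever moved, as long as it still has no id.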

182

answered Oct 02 '22 12:10

Grrrr
