I am using HTML::TreeBuilder to parse some HTML.
Can you specify multiple classes in the 'look_down' routine?
For instance, when searching through HTML using:
for ( $tree->look_down( 'class' => 'postbody'))
I also want to search for an additional class, 'postprofile', in the same loop.
Is there a way of doing this without writing a second loop such as
for ( $tree->look_down( 'class' => 'postprofile' ))
That brings back two sets of results, whereas I only want one merged set.
I tried using
for ( $tree->look_down( 'class' => 'postbody||postprofile'))
but this did not work.
Thank you in advance.
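A plain string value is compared for exact equality, so 'postbody||postprofile' only matches an element whose class is literally that text. Try using a pattern (a qr// regular expression) instead of a string, e.g.: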
$tree->look_down( 'class' => qr/^(?:postbody|postprofile)$/)
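A self-contained sketch of that approach, assuming HTML::TreeBuilder is installed; the sample markup is made up for illustration. It also shows a code ref criterion, which look_down calls with each element and keeps the ones for which it returns true.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, made up for this example.
my $html = <<'HTML';
<div class="postprofile">Alice</div>
<div class="postbody">First post</div>
<div class="other">Ignored</div>
<div class="postprofile">Bob</div>
<div class="postbody">Second post</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# A qr// value makes look_down match the attribute against a regex
# instead of comparing it with the string for exact equality.
my @matches = $tree->look_down( 'class' => qr/^(?:postbody|postprofile)$/ );

# Alternative: a code ref criterion. This version splits the class
# attribute on whitespace, so it also accepts elements that carry
# several classes, e.g. class="postbody highlight".
my %wanted  = map { $_ => 1 } qw(postbody postprofile);
my @via_sub = $tree->look_down(
    sub {
        my $class = $_[0]->attr('class');
        return 0 unless defined $class;
        return scalar grep { $wanted{$_} } split ' ', $class;
    }
);

# Copy out what we need before freeing the tree.
my @found = map { [ $_->attr('class'), $_->as_text ] } @matches;
printf "%s: %s\n", @$_ for @found;

$tree->delete;    # explicitly free the tree when done
```

The ^...$ anchors only match when the class attribute is exactly one of the two names. Passing two 'class' => ... pairs would not help either, because look_down only returns elements that match all of the given criteria. Either version walks the tree once in document order, so the results come back as one merged list.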
Jambo, I am not trying to be rude, but please read the manual. I added links to your question.
I am going to assume that you did not read the docs because you were unable to find them. Let's address that issue:
How to Find the Docs You Need
Online:
search.cpan.org (now redirected to metacpan.org) is the main website used to search for CPAN modules and their documentation.
perldoc.perl.org has the complete shipping documentation online for several recent versions of Perl.
Command Line:
perldoc perl
shows an overview of the Perl documentation, listing the different sections you can peruse.
perldoc -f function
is a quick way to search perlfunc and see the information on only one function. This is a super handy quick reference.
perldoc Module::Name::Here
will show you a module's documentation.
perldoc perlpod
is an example of reading one section of the docs, in this case the article on POD formatting.
Which thing do I read?
All this is great, but how do you know where to look? I mean, I've got this thing called "look_down" that I am using. Where are the docs?
In this case, you can see that "look_down" is always called like this:
$somevar->look_down(blarg)
Find where $somevar comes from. What kind of object is it? Worst case, you find that it is the result of some other call; now you have to find the docs for THAT call and see what it returns. But the steps are the same: recursively push on through. Eventually you'll get to
my $tree = HTML::TreeBuilder->new_from_content()
or something like that. Now you can read the new_from_content docs in HTML::TreeBuilder. Hey, we get an HTML::TreeBuilder object, which is a subclass of HTML::Element! So we check both classes. Whoa, look_down is in HTML::Element.
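The same detective work can also be done from code. A small sketch, assuming HTML::TreeBuilder is installed: ref reports the object's class, isa confirms the inheritance, and the core B module reports which package actually defines the method that can() resolves to.

```perl
use strict;
use warnings;
use B ();
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content('<p>hello</p>');

# Step 1: what kind of object did new_from_content hand back?
my $class = ref $tree;
print "class: $class\n";

# Step 2: does it inherit from HTML::Element?
printf "isa HTML::Element: %s\n", $tree->isa('HTML::Element') ? 'yes' : 'no';

# Step 3: can() returns the code ref Perl would call for ->look_down;
# B::svref_2object lets us ask which package that sub was compiled in.
my $code = $tree->can('look_down') or die "no look_down method\n";
my $defined_in = B::svref_2object($code)->GV->STASH->NAME;
print "look_down is defined in: $defined_in\n";

$tree->delete;
```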
This is a little trickier if you have routines that are imported from other modules. Hopefully the author of your code was considerate enough to explicitly list where his routines come from:
use Some::Module qw( useful_sub confusing_sub );
This means that useful_sub and confusing_sub come from Some::Module.
If you are unlucky, your author wrote only
use Some::Module;
which means you get all the default exports, so you need to read the docs to find out what was imported.
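If the module is built on the core Exporter module (many are, though a module with its own import() may not follow this convention), its default export list is just the package variable @EXPORT, so you can print it instead of guessing. A sketch with core Carp standing in for Some::Module:

```perl
use strict;
use warnings;

# Carp stands in for Some::Module here. A bare `use` imports the
# module's default exports.
use Carp;

# Exporter-based modules keep their default exports in @EXPORT and
# the ones you must ask for by name in @EXPORT_OK.
my @default    = @Carp::EXPORT;
my @on_request = @Carp::EXPORT_OK;

print "Imported by default: @default\n";
print "Available on request: @on_request\n";

# Double-check: each default export should now exist in our package.
for my $name (@default) {
    printf "%-8s %s\n", $name, main->can($name) ? 'imported' : 'missing';
}
```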
For maintainability's sake, you can reduce this nightmare by always specifying exactly which routines you import from a module. If you want to import NOTHING, you can specify that as:
use Some::Module ();
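To see the difference, here is a sketch with core Carp standing in for Some::Module; the package names Explicit and Nothing are made up for the demonstration.

```perl
use strict;
use warnings;

package Explicit;
use Carp qw(croak);    # import exactly one routine, nothing else

package Nothing;
use Carp ();           # import nothing; Carp::croak still works by full name

package main;

# can() reports whether a sub of that name exists in the given package.
for my $pkg (qw(Explicit Nothing)) {
    for my $sub (qw(croak carp)) {
        printf "%-8s %-5s %s\n", $pkg, $sub, $pkg->can($sub) ? 'present' : 'absent';
    }
}
```

With the explicit list, anyone reading the file can see where croak came from; with the empty list, every call has to be written as Carp::croak, which documents itself.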
When looking for plain sub names, remember that they may be Perl's built-in functions, so don't forget to search perldoc (perldoc -f name).
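One quick way to tell a built-in from a sub that came from somewhere else uses the core prototype function: prototype("CORE::name") succeeds for names perl knows as built-ins and dies for anything else. A minimal sketch; is_builtin is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: true if perl knows $name as a built-in function.
# prototype("CORE::...") dies for names that are not built-ins.
sub is_builtin {
    my ($name) = @_;
    return eval { my $proto = prototype("CORE::$name"); 1 } ? 1 : 0;
}

for my $name (qw(split sprintf look_down croak)) {
    printf "%-10s %s\n", $name,
        is_builtin($name) ? "built-in, try: perldoc -f $name" : 'not a built-in, look for a module';
}
```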
In closing, I hope you find this useful. R-ing TFM is an amazingly powerful technique, and learning how to find relevant docs is the hidden skill that unlocks the power. Perl has a ton of docs to wade through, and it can be intimidating when you don't know where to look.