Can you provide examples of parsing HTML?

People also ask

How do you parse HTML?

If you just want to parse HTML in the browser and the markup is intended for the body of your document, you can do the following: (1) var div = document.createElement("DIV"); (2) div.innerHTML = markup; (3) result = div.childNodes; This gives you a collection of child nodes and should work not just in IE8 but even in IE6-7.

What is HTML parsing?

Parsing means analyzing text and converting it into an internal structure that a program can work with. A browser, for example, parses HTML into a DOM tree. HTML parsing involves two stages: tokenization (splitting the markup into start tags, end tags, and text) and tree construction (nesting those tokens into parent and child nodes).
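The two stages can be sketched in a few lines of Python. This is only an illustration, not how a browser does it: the standard library's html.parser plays the role of the tokenizer, while Node, TreeBuilder, outline, and the VOID_ELEMENTS set are hypothetical names made up for this example, and the error handling is far simpler than what the HTML specification requires.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are not kept open.
# (Abbreviated list, chosen for this sketch.)
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Hypothetical, minimal stand-in for a DOM element."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # child Node objects or text strings


class TreeBuilder(HTMLParser):
    """Receives tokenizer events and assembles them into a tree."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # elements that are currently open

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data)


def outline(node, depth=0):
    """Return the tree as indented lines, one per element or text node."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        if isinstance(child, Node):
            lines.extend(outline(child, depth + 1))
        else:
            lines.append("  " * (depth + 1) + repr(child))
    return lines


builder = TreeBuilder()
builder.feed('<html><body><p>Hello <a href="http://foo.com">foo</a></p></body></html>')
builder.close()
print("\n".join(outline(builder.root)))
```

The stack of open elements is the key design choice: each start tag becomes a child of whatever element is currently on top, and each end tag pops back to its matching opener.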

Why do we use an HTML parser?

In Python, the standard html.parser module is a structured markup processing tool. It defines a class called HTMLParser, which is used to parse HTML files. It comes in handy for web crawling, for example to collect the links on a page.
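The HTMLParser class mentioned above appears to be the one in Python's standard html.parser module. Assuming so, a minimal link collector of the kind a crawler might use could look like the sketch below. LinkCollector, extract_links, and the sample markup are illustrative names and data, not part of the library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


def extract_links(markup):
    """Return the href values found in markup, in document order."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


sample = (
    '<html><body>'
    '<a href="http://foo.com">foo</a>'
    '<a name="top">no href here</a>'
    '<a href="http://bar.com">bar</a>'
    '</body></html>'
)
print(extract_links(sample))
```

Because html.parser is event based, nothing is kept in memory except the links themselves, which suits scanning many pages in a row.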

Language: JavaScript
Library: jQuery

$.each($('a[href]'), function(){
    console.debug(this.href);
});

(Uses Firebug's console.debug for output.)

And to load another HTML page and scan it (subject to the browser's same-origin policy):

$.get('http://stackoverflow.com/', function(page){
     $(page).find('a[href]').each(function(){
        console.debug(this.href);
    });
});

I used the .each() method rather than $.each() for this one; I think it's cleaner when chaining methods.

Language: C#
Library: HtmlAgilityPack

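using System;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        // Download and parse the page.
        var web = new HtmlWeb();
        var doc = web.Load("http://www.stackoverflow.com");

        // Select every <a> element that has an href attribute.
        // SelectNodes returns null when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        if (nodes == null) return;

        foreach (var node in nodes)
        {
            // Print the markup inside each link.
            Console.WriteLine(node.InnerHtml);
        }
    }
}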

Language: Python
Library: BeautifulSoup

from bs4 import BeautifulSoup  # Beautiful Soup 4 (pip install beautifulsoup4)

html = "<html><body>"
for link in ("foo", "bar", "baz"):
    html += '<a href="http://%s.com">%s</a>' % (link, link)
html += "</body></html>"

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all('a', href=True)  # find <a> with a defined href attribute
print(links)

output:

[<a href="http://foo.com">foo</a>, <a href="http://bar.com">bar</a>, <a href="http://baz.com">baz</a>]

also possible:

for link in links:
    print(link['href'])

output:

http://foo.com
http://bar.com
http://baz.com

Language: Perl
Library: pQuery

use strict;
use warnings;
use pQuery;

my $html = join '',
    "<html><body>",
    (map { qq(<a href="http://$_.com">$_</a>) } qw/foo bar baz/),
    "</body></html>";

pQuery( $html )->find( 'a' )->each(
    sub {  
        my $at = $_->getAttribute( 'href' ); 
        print "$at\n" if defined $at;
    }
);

Language: shell
Library: lynx (well, it's not a library, but in the shell every program is kind of a library)

lynx -dump -listonly http://news.google.com/

Language: Ruby
Library: Hpricot

#!/usr/bin/ruby

require 'hpricot'

html = '<html><body>'
['foo', 'bar', 'baz'].each {|link| html += "<a href=\"http://#{link}.com\">#{link}</a>" }
html += '</body></html>'

doc = Hpricot(html)
doc.search('//a').each {|elm| puts elm.attributes['href'] }

