
How can I render HTML as text using Perl as Lynx does? [duplicate]

Tags:

html

perl

render

Possible Duplicate:
Which CPAN module would you recommend for turning HTML into plain text?

Question:

  • Is there a module that renders HTML to text the way Lynx does, honoring font-style tags such as <tt>, <b>, and <i> as well as <br> line breaks?

For example:

# cat test.html

<body>  
<div id="foo" class="blah">  
<tt>test<br>
<b>test</b><br>
whatever<br>
test</tt>
</div>
</body>

# lynx.exe --dump test.html

test
test
whatever
test

Note: the second line should be bold.

Aaron asked Dec 01 '22 06:12


2 Answers

Lynx is a big program, and its HTML rendering is non-trivial to reproduce.

How about this:

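my $lynx = '/path/to/lynx';
my $html = '<b>html here</b>';   # placeholder: the HTML string to render
# Quote the heredoc delimiter so the shell does not expand $ or backticks in the HTML.
# This still breaks if the HTML contains a line consisting only of "EOF".
my $txt = `$lynx --dump --width 9999 -stdin <<'EOF'\n$html\nEOF\n`;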
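If passing the HTML through a shell heredoc feels fragile, one alternative is to write it to a temporary file and run the renderer with a list-form pipe open, which bypasses the shell. This is only a sketch: `html_via_command` is a hypothetical helper name, and the `-dump` and `-width=NUMBER` spellings in the usage note are taken from the Lynx man page, so check them against your build.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $html to a temp file, run @cmd with that file
# as its last argument, and return whatever the command prints.
sub html_via_command {
    my ($html, @cmd) = @_;

    # The .html suffix lets the renderer recognise the file type.
    my ($tmp_fh, $tmp_name) = tempfile(SUFFIX => '.html', UNLINK => 1);
    print {$tmp_fh} $html;
    close $tmp_fh or die "Cannot write $tmp_name: $!";

    # List-form pipe open: the command is executed directly, without a shell.
    open(my $out, '-|', @cmd, $tmp_name)
        or die "Cannot run $cmd[0]: $!";
    my $text = do { local $/; <$out> };   # slurp all output
    close $out or die "$cmd[0] failed (exit status $?)";

    return $text;
}
```

A call might look like `my $txt = html_via_command($html, '/path/to/lynx', '-dump', '-width=9999');`. On Windows, list-form pipe opens need a reasonably recent Perl, so the backtick version may be the simpler route there.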
singingfish answered Dec 04 '22 08:12


Go to search.cpan.org and search for "HTML text", which will give you lots of options to suit your particular needs. HTML::FormatText is a good baseline; from there you can branch out into specific variations of it, for example HTML::FormatText::WithLinks if you want to preserve links as footnotes.
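As a starting point, a minimal sketch of the HTML::FormatText baseline could look like the following. `html_to_text` is a hypothetical helper name, and the margin values are arbitrary choices meant to approximate an unindented, unwrapped dump. Plain-text output cannot carry bold, so the <b> line from the question will most likely come out as ordinary text.

```perl
use strict;
use warnings;
use HTML::FormatText;   # CPAN module (HTML-Format distribution), not core Perl

# Hypothetical helper: convert an HTML string to plain text.
sub html_to_text {
    my ($html) = @_;

    # format_string parses the HTML and formats it in one call;
    # leftmargin/rightmargin control indentation and the wrap column.
    return HTML::FormatText->format_string(
        $html,
        leftmargin  => 0,
        rightmargin => 9999,
    );
}
```

Usage would be along the lines of `print html_to_text($html);`, where `$html` holds the contents of test.html.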

Schwern answered Dec 04 '22 08:12