
Which CPAN module would you recommend for turning HTML into plain text?

Tags:

perl

cpan

lynx

Which CPAN module would you recommend for turning HTML into formatted plain text?

One strict requirement is that the module must handle Unicode characters.

knorv asked Dec 20 '09 01:12




2 Answers

I like HTML::FormatText and, when the link URLs should survive in the output, HTML::FormatText::WithLinks.
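A minimal sketch of how the two modules might be used, assuming both are installed from CPAN along with HTML::TreeBuilder. The sample HTML, the margin values, and the link format are illustrative choices, not anything from the question. For the Unicode requirement, the sketch assumes the HTML is already a decoded Perl character string and encodes only on output.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;    # the HTML literal below contains non-ASCII characters

use HTML::TreeBuilder;
use HTML::FormatText;
use HTML::FormatText::WithLinks;

# Encode characters as UTF-8 on the way out.
binmode STDOUT, ':encoding(UTF-8)';

# Illustrative input. HTML read from a file or socket should be decoded
# to characters first (for example with Encode::decode).
my $html = '<h1>Café menu</h1>'
         . '<p>Crème brûlée, see <a href="http://example.com/">the site</a>.</p>';

# HTML::FormatText works on a parsed tree.
my $tree      = HTML::TreeBuilder->new_from_content($html);
my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 60);
my $text      = $formatter->format($tree);
$tree->delete;    # free the tree explicitly

# HTML::FormatText::WithLinks parses the string itself; here each URL is
# placed inline after its link text instead of in a footnote list.
my $with_links = HTML::FormatText::WithLinks->new(
    before_link => '',
    after_link  => ' [%l]',
    footnote    => '',
);
my $text_with_links = $with_links->parse($html);

print $text, "\n", $text_with_links;
```

If the output shows mangled accents, the usual culprit is input that was never decoded, not the formatter.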

mpeters answered Nov 09 '22 05:11



See the example script htext that comes with HTML::Parser (in the eg/ directory of the distribution).

Sinan Ünür answered Nov 09 '22 06:11
