HTML downloading and text extraction

What would be a good tool, or set of tools, to download a list of URLs and extract only their text content? Spidering is not required, but control over the downloaded file names and threading would be a bonus.

The platform is linux.

asked Mar 01 '23 02:03 by Cammel


1 Answer

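wget -q -O - URL | html2ascii

(-O - makes wget write the downloaded page to standard output instead of a file, so it can be piped into the converter.)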

Note: html2ascii can also be called html2a or html2text (and I wasn't able to find a proper man page on the net for it).
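The pipeline above handles one URL at a time. A minimal sketch that also covers the rest of the question (a list of URLs, predictable output file names, parallel downloads) might look like the following. It assumes wget and html2text (the name mentioned in the note) are on PATH, that the list has one URL per line, and that URLs contain no spaces, quotes or backslashes, since xargs would split or reinterpret those. The function name fetch_text and the page-N.txt naming scheme are hypothetical, and xargs -P is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: fetch_text URLS_FILE OUT_DIR [JOBS]
# Downloads every URL listed in URLS_FILE (one per line) and saves the
# text-only rendering of each page as OUT_DIR/page-N.txt, where N is the
# URL's line number. Assumes wget and html2text are on PATH.
fetch_text() {
  urls_file=$1
  out_dir=$2
  jobs=${3:-4}    # number of parallel downloads (assumed default: 4)

  mkdir -p "$out_dir" || return 1

  # Emit "LINE_NUMBER URL" for each non-blank line, then let xargs run
  # up to $jobs pipelines at once. Inside sh -c, $0 is the output
  # directory, $1 the line number and $2 the URL.
  awk 'NF { print NR, $1 }' "$urls_file" |
    xargs -P "$jobs" -n 2 sh -c '
      # If wget fails, the output file is still created by the redirect.
      wget -q -O - "$2" | html2text > "$0/page-$1.txt"
    ' "$out_dir"
}
```

For example, fetch_text urls.txt out 8 would write out/page-1.txt, out/page-2.txt and so on, numbered by line in urls.txt, with up to eight downloads running at once. Naming files by line number keeps names predictable even when URLs end in "/" or share a final path component.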

See also: lynx, whose -dump option prints the formatted text of a page to standard output.

answered Mar 05 '23 14:03 by dsm