I've been searching for a command-line tool that turns HTML into just the text that would appear on the page, equivalent to selecting everything in a web browser and pasting it into a text editor.
Does anyone know of something in Ubuntu that does this? I'm trying to write a script to parse some web pages, and I'd prefer to parse only the visible text rather than deal with the HTML.
Thanks,
Dan
lynx -dump http://example.com/
If you already have the HTML file:
lynx -dump file.html > file.txt
Otherwise, use @Ignacio's command with the URL.
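Since the goal is to script this, here is a minimal sketch of wrapping the command in a shell function. The name html_to_text is made up for illustration and is not part of lynx. It also passes -nolist, which should drop the numbered "References" list of links that lynx otherwise appends to the dump; leave it out if you want the links.

```shell
#!/bin/sh
# html_to_text is a hypothetical helper name, not something lynx provides.
# Argument: a URL or a path to a local HTML file.
# Output: the rendered page text on stdout.
html_to_text() {
    # -dump   : render the page, write it to stdout, and exit
    # -nolist : omit the list of link targets appended to the dump
    lynx -dump -nolist "$1"
}
```

With that defined, something like html_to_text http://example.com/ | grep -i "domain" lets you search the visible text directly, and html_to_text file.html > file.txt covers the saved-file case.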