I am a beginner with Linux. Could you please help me convert an HTML page to a text file? The text file should not contain any of the images or links from the web page. I want to use only bash commands, not HTML-to-text conversion tools. As an example, I want to convert the first page of Google search results for "computers".
Thank you
In your web browser, click the “Save as” or “Save Page As” option and select “Text Files” from the Save as Type drop-down menu. Type a name for the text file and click “Save.” The text from the web page will be extracted and saved as a text file that can be viewed in text editors and document programs such as Microsoft Word.
To create a new file, run the "cat" command and then use the redirection operator ">" followed by the name of the file. Now you will be prompted to insert data into this newly created file. Type a line and then press "Ctrl+D" to save the file.
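As a rough sketch of the same idea in a script, a here-document can stand in for the lines you would type and for the closing "Ctrl+D". The file name notes.txt and its contents are made up for illustration.

```shell
# Interactively you would run `cat > notes.txt`, type the lines,
# and press Ctrl+D. Here the text between the EOF markers plays
# the role of the typed input.
cat > notes.txt <<'EOF'
first line
second line
EOF

# Print the file to confirm what was written.
cat notes.txt
```

Note that ">" overwrites an existing file of the same name; ">>" would append to it instead.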
The easiest way is to use the dump option of a text browser such as lynx or links. In short, the dump is the text version of the HTML as it would be displayed.
Remote file:
lynx --dump www.google.com > file.txt
links -dump www.google.com
Local file:
lynx --dump ./1.html > file.txt
links -dump ./1.html
With charset conversion to UTF-8:
lynx -dump -display_charset UTF-8 ./1.html
links -dump -codepage UTF-8 ./1.html
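Since the question asks for plain bash commands rather than a converter, here is a rough sketch using only cat and sed. It assumes the page has already been saved to a local file (from the browser, or fetched with a tool such as curl). The file page.html and its contents are invented for the example. This is not a real HTML parser: it will miss tags that span several lines and will keep the contents of script and style blocks.

```shell
# Hypothetical sample page; in practice this would be the saved web page.
cat > page.html <<'EOF'
<html><head><title>Computers</title></head>
<body>
<h1>Computers</h1>
<img src="pc.png" alt="a pc">
<p>Read <a href="http://example.com/">this page</a> about computers.</p>
</body></html>
EOF

# 1. Delete anything that looks like a tag (this drops images and link URLs,
#    but keeps the visible link text).
# 2. Decode a few common entities, with &amp; last to avoid double decoding.
# 3. Remove lines that are left empty.
sed -e 's/<[^>]*>//g' \
    -e 's/&nbsp;/ /g; s/&lt;/</g; s/&gt;/>/g; s/&amp;/\&/g' \
    -e '/^[[:space:]]*$/d' page.html > page.txt

cat page.txt
```

For real pages such as search results, the output is usually much noisier than what lynx or links produce, so treat this only as a starting point.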