 

Bash command to convert an HTML page to a text file

Tags:

bash

I am a beginner to Linux. Could you please help me convert an HTML page to a text file? The text file should not contain any of the web page's images or links. I want to use only bash commands, not dedicated HTML-to-text conversion tools. As an example, I want to convert the first page of Google search results for "computers".

Thank you

The Coder asked Sep 14 '12 10:09





1 Answer

The easiest way is to use a text-mode browser such as lynx or links with its dump option, which prints the rendered text version of the viewable HTML instead of opening the page interactively.

Remote file:

lynx -dump www.google.com > file.txt
links -dump www.google.com > file.txt

Local file:

lynx -dump ./1.html > file.txt
links -dump ./1.html > file.txt
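The question asks for plain command-line tools rather than a dedicated converter. A rough sketch of that, assuming a POSIX awk plus sed and cat, is a tag stripper: it splits the input on "<", skips script and style bodies, starts a new line at common block-level tags, and decodes a few entities. The function name html_to_text and the list of block tags are illustrative choices, not part of any tool, and this is not a real HTML parser, so comments or a ">" inside attribute values can leave debris.

```shell
#!/usr/bin/env bash
# html_to_text (hypothetical helper): crude HTML-to-text using awk, sed and cat.
# Reads the files given as arguments, or standard input if there are none.
# Caveat: this strips tags with simple string handling; it is NOT an HTML
# parser, so comments, CDATA or ">" inside attribute values can leave debris.
html_to_text() {
  awk '
    BEGIN { RS = "<"; skip = 0 }            # every record starts right after a "<"
    NR == 1 { tag = ""; text = $0 }         # text that precedes the first tag
    NR > 1 {
      pos = index($0, ">")
      if (pos == 0) { tag = ""; text = "<" $0 }   # stray "<": keep it as text
      else { tag = tolower(substr($0, 1, pos - 1)); text = substr($0, pos + 1) }
    }
    {
      # script and style bodies are not visible text
      if (tag ~ "^(script|style)([ \t\n/]|$)") skip = 1
      else if (tag ~ "^/(script|style)") skip = 0
      if (!skip) {
        # common block-level tags start a new output line
        if (tag ~ "^/?(p|div|br|h[1-6]|li|tr|title|table|ul|ol)([ \t\n/]|$)") printf "\n"
        gsub(/[ \t\r\n]+/, " ", text)       # collapse whitespace like a browser
        gsub(/&nbsp;/, " ", text); gsub(/&lt;/, "<", text); gsub(/&gt;/, ">", text)
        gsub(/&quot;/, "\"", text)
        gsub(/&amp;/, "\\&", text)          # decoded last to avoid double-decoding
        printf "%s", text
      }
    }
    END { printf "\n" }
  ' "$@" |
    sed -e 's/^ *//' -e 's/ *$//' -e '/./,$!d' |  # trim lines, drop leading blanks
    cat -s                                         # squeeze runs of blank lines
}
```

Usage: html_to_text ./1.html > file.txt, or pipe a downloaded page in, e.g. curl -s https://example.com | html_to_text > file.txt. For anything beyond simple pages, the lynx/links dumps above are more reliable.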

With charset conversion to UTF-8:

lynx -dump -display_charset=UTF-8 ./1.html > file.txt
links -dump -codepage UTF-8 ./1.html > file.txt
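To convert many saved pages at once, the dump commands can be wrapped in a loop. This is a minimal sketch assuming bash, where lynx, links, or neither may be installed; convert_dir and its crude sed fallback are illustrative, not part of either browser. It passes lynx's -nolist option, which leaves out the numbered list of link URLs that lynx otherwise appends to a dump; that is closer to the question's "remove any links" requirement.

```shell
#!/usr/bin/env bash
# convert_dir (hypothetical helper): write NAME.txt next to every NAME.html
# in a directory, using whichever text-mode browser is installed.
convert_dir() {
  local dir=${1:-.} f out
  for f in "$dir"/*.html; do
    [ -e "$f" ] || continue                   # the glob matched nothing
    out=${f%.html}.txt
    if command -v lynx >/dev/null 2>&1; then
      lynx -dump -nolist "$f" > "$out"        # -nolist: omit the list of link URLs
    elif command -v links >/dev/null 2>&1; then
      links -dump "$f" > "$out"
    else
      sed -e 's/<[^>]*>//g' "$f" > "$out"     # crude fallback: delete tags on each line
    fi
  done
}
```

For example, convert_dir ~/saved-pages writes a .txt file next to each .html file in that directory.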
V H answered Sep 19 '22 14:09
