How to get number of words on a web page? [closed]

Tags: c#, asp.net

I need to get the total number of words on a web page. I know about the System.Net.WebClient class, but its DownloadString() method returns the whole HTML markup, whereas what I need is only the text, so that I can count the words.

Any ideas/suggestions welcome.

Manish asked May 23 '11 10:05

2 Answers

Use the HTML Agility Pack to download and parse the HTML document.

You can then query the document object and extract the inner text of all nodes.
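A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is referenced. The class name `PageText` and the URL are made up for illustration. Note that `InnerText` on the document node may also include the contents of `<script>` and `<style>` elements, so the result may need further filtering before counting words.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class PageText
{
    // Parses an HTML string and returns its text with the tags removed.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // InnerText concatenates the text of all descendant nodes;
        // DeEntitize turns entities such as &amp; back into characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }

    public static void Main()
    {
        // Hypothetical URL, used only as an example.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("https://example.com/");
        Console.WriteLine(HtmlEntity.DeEntitize(doc.DocumentNode.InnerText));
    }
}
```

`HtmlWeb.Load` downloads and parses in one step; if the markup has already been fetched with `WebClient.DownloadString()`, it can be passed to `ExtractText` instead.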

Oded answered Sep 28 '22 17:09

Take a look at the HTML Agility Pack. It lets you apply XPath expressions to an HTML document.

You want to find all text nodes and then count the words in them. The XPath expression //text() selects all text nodes.
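The idea above could be sketched roughly as follows, again assuming the HtmlAgilityPack NuGet package. The class `WordCounter` and the rule that a "word" is any run of non-whitespace characters are assumptions made for this example, not part of the original answer. Text inside `<script>` and `<style>` is skipped because it is not visible page text.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class WordCounter
{
    // Counts whitespace-separated words in the text nodes of an HTML string.
    public static int CountWords(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var textNodes = doc.DocumentNode.SelectNodes("//text()");
        if (textNodes == null) return 0;

        return textNodes
            // Skip text that belongs to script or style elements.
            .Where(n => n.ParentNode == null ||
                        (n.ParentNode.Name != "script" && n.ParentNode.Name != "style"))
            // Decode entities such as &amp; and &nbsp; before splitting.
            .Select(n => HtmlEntity.DeEntitize(n.InnerText))
            // A null separator array makes Split break on any whitespace.
            .Sum(t => t.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length);
    }
}
```

Because each text node is counted separately, a word broken up by inline markup (for example `<b>foo</b>bar`) is counted as two words. Joining the node texts before splitting would avoid that, at the cost of merging words across block elements that have no whitespace between them.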

Richard Schneider answered Sep 28 '22 17:09