I need to get the total number of words on a web page. I know about the System.Net.WebClient class, but its DownloadString() method returns the whole HTML markup, whereas I need only the text so that I can count the words.
Any ideas/suggestions welcome.
Use the HTML Agility Pack to download and parse the HTML document.
You can then query the document object and extract the inner text of all nodes.
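As a rough illustration of this suggestion, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is installed. The class name PageWordCounter and its two methods are hypothetical names chosen for this example, and a "word" is assumed to be any run of non-whitespace characters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class PageWordCounter
{
    // Hypothetical helper: counts words in the text of an already parsed document.
    public static int CountWords(HtmlDocument doc)
    {
        // Collect <script>, <style> and comment nodes first (the doctype is
        // parsed as a comment), then remove them so they are not counted.
        var hidden = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style"
                        || n.NodeType == HtmlNodeType.Comment)
            .ToList();
        foreach (var node in hidden)
            node.Remove();

        // InnerText concatenates the text of the remaining nodes;
        // DeEntitize turns entities such as &amp; back into characters.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Assumption: any run of whitespace separates two words.
        return Regex.Split(text, @"\s+").Count(w => w.Length > 0);
    }

    // Downloads the page with HtmlWeb and counts its words.
    public static int CountWordsAtUrl(string url)
    {
        HtmlDocument doc = new HtmlWeb().Load(url);
        return CountWords(doc);
    }
}
```

One caveat with this approach: InnerText inserts no separator between elements, so markup such as `<p>one</p><p>two</p>` with no whitespace between the tags yields "onetwo" and is counted as a single word. Counting each text node separately avoids that.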
Take a look at HTML Agility Pack. It allows you to apply XPath expressions to an HTML document.
You want to find all text nodes and then count the words. //text() is the XPath expression that selects all text nodes.
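The XPath idea could look roughly like the sketch below, again assuming the HtmlAgilityPack NuGet package. TextNodeWordCounter is a hypothetical name, and skipping text inside script and style elements is an addition of this example, since //text() also matches the code inside those elements.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class TextNodeWordCounter
{
    // Assumption: words are separated by spaces, tabs, line breaks
    // or non-breaking spaces (what &nbsp; decodes to).
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n', '\u00A0' };

    public static int CountWords(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var textNodes = doc.DocumentNode.SelectNodes("//text()");
        if (textNodes == null)
            return 0;

        return textNodes
            // Text inside <script> or <style> is code, not page text.
            .Where(n => n.ParentNode.Name != "script" && n.ParentNode.Name != "style")
            // Decode entities such as &amp; and &nbsp; before splitting.
            .Select(n => HtmlEntity.DeEntitize(n.InnerText))
            // Count the words of each text node separately and add them up.
            .Sum(t => t.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries).Length);
    }
}
```

The markup returned by WebClient.DownloadString() can be passed straight in, for example `TextNodeWordCounter.CountWords(new System.Net.WebClient().DownloadString(url))`. Because each text node is counted on its own, a word split by inline markup (such as `<b>Hel</b>lo`) is counted as two words.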