How do you spell check a website?

I know that spellcheckers are not perfect, but they become more useful as the amount of text you have increases in size. How can I spell check a site which has thousands of pages?

Edit: Because of complicated server-side processing, the only way I can get the pages is over HTTP. Also it cannot be outsourced to a third party.

Edit: I have a list of all of the URLs on the site that I need to check.

Liam asked Feb 25 '09 11:02



2 Answers

Lynx seems to be good at getting just the text I need (body content and alt text) and ignoring what I don't need (embedded JavaScript and CSS).

lynx -dump http://www.example.com 

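It also lists all URLs in the page (converted to their absolute form) at the end of the dump. These can be filtered out using grep, at the cost of also dropping any line of body text that happens to contain "http":

lynx -dump http://www.example.com | grep -v "http"

Alternatively, the -nolist option tells lynx to leave the link list out of the dump:

lynx -dump -nolist http://www.example.com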

The URLs could also be local (file://) if I have used wget to mirror the site.

I will write a script that will process a set of URLs using this method and output each page to a separate text file. I can then use an existing spellchecking solution to check the files (or a single large file combining all of the small ones).
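Such a script might look roughly like the sketch below. It is only an illustration: the function names, the URL list file and the output directory are made up, and aspell stands in for "an existing spellchecking solution" (the post does not name one). It assumes lynx and aspell are installed and that the URL list has one URL per line.

```shell
#!/bin/sh
# Sketch: dump each URL in a list to its own text file with lynx, then
# list the words a spellchecker flags across all of the dumps.
# All names below are hypothetical; aspell is an assumed choice of checker.

# Turn a URL into a safe file name by replacing every character that is
# not a letter, digit, dot or hyphen with an underscore.
url_to_filename() {
    printf '%s' "$1" | sed 's/[^A-Za-z0-9.-]/_/g'
}

# Dump every URL listed in file $1 into directory $2, one .txt file per page.
# Lines containing "http" (the link list) are filtered out as in the post.
dump_pages() {
    list=$1
    outdir=$2
    mkdir -p "$outdir"
    while IFS= read -r url; do
        [ -n "$url" ] || continue    # skip blank lines
        lynx -dump "$url" < /dev/null | grep -v "http" \
            > "$outdir/$(url_to_filename "$url").txt"
    done < "$list"
}

# Combine all dumps in directory $1 and print each flagged word with the
# number of times it occurs, most frequent first.
list_misspellings() {
    cat "$1"/*.txt | aspell --lang=en list | sort | uniq -c | sort -rn
}
```

After sourcing the file, running dump_pages urls.txt pages followed by list_misspellings pages would write one text file per URL and then print the flagged words. Sorting by frequency helps separate site-specific vocabulary (product names, jargon) that is flagged on every page from genuine one-off typos.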

This will ignore text in title and meta elements. These can be spellchecked separately.
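One way to get at the title and meta text is to pull it out of the raw HTML. The following is a rough sketch, not a robust solution: regular expressions are not an HTML parser, and it assumes each tag sits on a single line and that attribute names are lowercase with double-quoted values. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: extract text that lynx -dump skips, reading raw HTML on stdin.
# Assumes one tag per line and lowercase, double-quoted attributes.

# Print the contents of the <title> element.
extract_title() {
    grep -io '<title>[^<]*</title>' | sed 's/<[^>]*>//g'
}

# Print the content attribute of <meta name="description" ...>.
extract_meta_description() {
    grep -io '<meta[^>]*name="description"[^>]*>' \
        | sed -n 's/.*content="\([^"]*\)".*/\1/p'
}

# Example (not run here), fetching the page with wget:
#   wget -q -O - http://www.example.com | extract_title
```

The output of either function can be appended to the text file for that page so that it is checked along with the body text.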

Liam answered Nov 13 '22 19:11


Just a few days ago I discovered the Spello website spell checker. It uses my NHunspell library (the OpenOffice spell checker for .NET). You can give it a try.

Thomas Maierhofer answered Nov 13 '22 18:11