I know that spellcheckers are not perfect, but they become more useful as the amount of text increases. How can I spell check a site which has thousands of pages?
Edit: Because of complicated server-side processing, the only way I can get the pages is over HTTP. Also it cannot be outsourced to a third party.
Edit: I have a list of all of the URLs on the site that I need to check.
Typosaurus is the ultimate website spell checker for digging up those embarrassing spelling mistakes you may have missed for millions of years. This tool allows you to check the spelling of a web page. It currently only supports English and French.
Thankfully, Google allows you to use its spell-check feature everywhere in the Chrome web browser.
Lynx seems to be good at getting just the text I need (body content and alt text) and ignoring what I don't need (embedded JavaScript and CSS).
lynx -dump http://www.example.com
It also lists all URLs in the page (converted to their absolute form), which can be filtered out using grep (note that this drops every line containing "http", including body text that happens to mention a URL; lynx's -nolist option is another way to suppress the link list):
lynx -dump http://www.example.com | grep -v "http"
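A minimal demonstration of that filter, using a fabricated fragment standing in for real lynx -dump output (the file name and its contents are just my stand-ins):

```shell
# Fabricated sample of what `lynx -dump` might emit: body text followed
# by the numbered reference list of absolute URLs.
cat > dump.txt <<'EOF'
Welcome to our exampel site.

References

   1. http://www.example.com/about
   2. http://www.example.com/contact
EOF

# Drop every line containing "http" -- this strips the reference list,
# but it would also drop any body sentence that mentions a URL.
grep -v "http" dump.txt
```

The surviving text (including the deliberately misspelled "exampel") is what would be handed to the spellchecker.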
The URLs could also be local (file://) if I have used wget to mirror the site.
I will write a script that will process a set of URLs using this method, and output each page to a separate text file. I can then use an existing spellchecking solution to check the files (or a single large file combining all of the small ones).
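A sketch of that script, assuming a urls.txt file with one URL per line (my assumption about its layout); the actual lynx fetch is shown as a comment and replaced by a stub so the sketch runs without network access:

```shell
#!/bin/sh
# Hypothetical input: urls.txt, one URL per line.
printf 'http://www.example.com/a\nhttp://www.example.com/b\n' > urls.txt

mkdir -p pages
i=0
while IFS= read -r url; do
    i=$((i + 1))
    # The real fetch would be:  lynx -dump "$url" | grep -v "http"
    # A stub stands in here so the sketch is self-contained.
    printf 'Dump of %s\n' "$url" > "pages/page$i.txt"
done < urls.txt

# Combine the per-page files into one large file for the spellchecker:
cat pages/*.txt > all-pages.txt
```

The combined all-pages.txt could then be fed to whatever spellchecker you already use (aspell, hunspell, etc.).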
This will ignore text in title and meta elements. These can be spellchecked separately.
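One crude way to pull the title and meta description out for that separate check is a sed extraction; this is a sketch that assumes simple, single-line markup (the page fragment and its misspellings are fabricated), and a real HTML parser would be more robust:

```shell
# Fabricated page fragment; the real HTML would come from wget or curl.
cat > page.html <<'EOF'
<html><head><title>Exmaple Site</title>
<meta name="description" content="A page abuot examples.">
</head><body>...</body></html>
EOF

# Pull out the title text and the meta description for separate checking.
# (The misspellings "Exmaple" and "abuot" are what the checker would catch.)
sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' page.html
sed -n 's/.*name="description" content="\([^"]*\)".*/\1/p' page.html
```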
Just a few days ago I discovered the Spello web site spell checker. It uses my NHunspell (OpenOffice spell checker for .NET) library. You can give it a try.