I have to spell-check a large number of big HTML and XML documents (more than 30,000). I also need a custom dictionary and fairly sophisticated checking. I am trying to use Bash plus the usual Linux utilities (sed, grep, ...) together with hunspell. Hunspell has the -H option, which forces it to parse the document as HTML (the option also works for XML). But there is one problem: it outputs character offsets rather than line numbers, and I cannot simply feed it line by line, because then it starts checking text inside tags (it cannot find the closing tag).
So what is the right way to do this task?
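A minimal sketch of one way to drive hunspell over many files from Bash, using only its documented switches (-H for HTML/XML input, -d to pick the dictionary, -p for a personal word list, -l to print only the misspelled words); the docs/ path and the site_words.dic file are made-up placeholders:

find docs/ -type f \( -name '*.html' -o -name '*.xml' \) -print0 |
while IFS= read -r -d '' f; do
    # -l prints just the unknown words, one per line, so the
    # offset-vs-line-number problem does not come up in this mode
    bad=$(hunspell -H -d en_US -p site_words.dic -l "$f")
    [ -n "$bad" ] && printf '%s\n%s\n\n' "$f" "$bad"
done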
I just had a similar problem. You should be able to get good output by using the undocumented switches -u or -U. But be careful, as those features seem to be experimental right now, and I only found out about their existence by looking at the hunspell sources.
So essentially:
hunspell -H -u my-file.html
should do it.
Alternatively, there are also the switches -u1, -u2 and -u3 you can play around with.
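A rough sketch of how this could be applied to all the files at once; -u is the experimental switch mentioned above, so its output format may change, and the report file name is only an illustration:

for f in *.html *.xml; do
    echo "== $f"
    hunspell -H -u "$f"    # experimental, undocumented switch
done > spellcheck-report.txt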