How can I parse Word documents (".doc", ".docx") to get all of their text using Go?
You can get some inspiration from these projects:
https://github.com/nguyenthenguyen/docx
https://github.com/opencontrol/doc-template
Basically, a DOCX file is a ZIP archive containing XML files.
The body text is inside word/document.xml (headers, footers, and footnotes are stored in separate XML parts).
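What both projects do is strip all the XML tags, leaving only the text intact. Check whether that approach suits you too. Note that this only applies to .docx: the older .doc format is a binary format rather than a ZIP of XML files, so it needs a different parser.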