I want to learn a text manipulation language and I have zeroed in on Python. Apart from text manipulation, Python is also used for numerical applications, machine learning, AI, etc.
My question is: how should I approach learning Python so that I can quickly write sophisticated text manipulation utilities? In the context of "text manipulation", apart from regular expressions, which language features are more important than others, which modules are useful, and so on?
Beyond regular expressions here are some important features:
For tools, I recommend looking at the following:
Whoosh, a pure Python search engine that will give you some nice real life examples of parsing text using pyparsing and text processing in Python in general.
Ned Batchelder's nice reviews of various Python parsing tools.
mxTextTools
Docutils source code for more advanced text processing in Python, including a sophisticated state machine.
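To give a feel for what a state machine means in text processing, here is a minimal sketch using only the standard library. It is not how Docutils is implemented; the input format ("[section]" headers followed by "key = value" lines) and the name parse_sections are made up for illustration.

```python
import re

# Hypothetical input format: "[section]" headers followed by "key = value" lines.
SECTION = re.compile(r"^\[(?P<name>[^\]]+)\]$")
PAIR = re.compile(r"^(?P<key>\w+)\s*=\s*(?P<value>.*)$")


def parse_sections(text):
    """Parse text into {section: {key: value}} with a two-state machine."""
    result = {}
    state = "outside"  # "outside" until the first header is seen, then "inside"
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped in every state
        header = SECTION.match(line)
        if header:
            # A header always switches to (or stays in) the "inside" state.
            current = result.setdefault(header.group("name"), {})
            state = "inside"
        elif state == "inside":
            pair = PAIR.match(line)
            if pair is None:
                raise ValueError(f"line {lineno}: expected key = value, got {line!r}")
            current[pair.group("key")] = pair.group("value")
        else:
            raise ValueError(f"line {lineno}: content before any section header")
    return result


sample = """
# demo input
[server]
host = example.org
port = 8080

[client]
retries = 3
"""
parsed = parse_sections(sample)
print(parsed)
```

The point of the pattern is that what a line means depends on the current state, so the same "key = value" line is accepted inside a section and rejected before one. Real parsers such as the one in Docutils have many more states and transitions.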
Edit: Some good links specific to text processing in Python:
There's a book, Text Processing in Python. I haven't read it myself yet, but I've read other articles by this author and they're generally good stuff.