
Python and text manipulation

Tags:

python

text

I want to learn a text manipulation language and I have zeroed in on Python. Apart from text manipulation, Python is also used for numerical applications, machine learning, AI, etc.

My question is: how should I approach learning Python so that I can quickly write sophisticated text manipulation utilities? Apart from regular expressions, which language features matter most for text manipulation, which modules are useful, and so on?

asked Mar 24 '09 at 05:03 by ardsrk


2 Answers

Beyond regular expressions, here are some important features:

  • Generators: see Generator Tricks for Systems Programmers by David Beazley for many great examples of pipelining arbitrarily large amounts of text through generators.
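A minimal sketch of the generator-pipeline idea, assuming a plain-text log where lines containing the word `ERROR` are the ones of interest. The stage names (`strip_lines`, `skip_blank`, `grep`, `error_lines`) are hypothetical and not taken from Beazley's material; only the standard library (`re`, `io`) is used.

```python
import io
import re
from typing import Iterable, Iterator


def strip_lines(lines: Iterable[str]) -> Iterator[str]:
    # Remove trailing whitespace (including the newline) from each line.
    for line in lines:
        yield line.rstrip()


def skip_blank(lines: Iterable[str]) -> Iterator[str]:
    # Drop lines that are empty after stripping.
    for line in lines:
        if line:
            yield line


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    # Yield only the lines in which the regular expression matches somewhere.
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def error_lines(source: Iterable[str]) -> Iterator[str]:
    # Chain the stages; nothing is read until the caller starts iterating.
    return grep(r"\bERROR\b", skip_blank(strip_lines(source)))


if __name__ == "__main__":
    sample = io.StringIO("INFO start\n\nERROR disk full\nINFO done\nERROR timeout\n")
    for line in error_lines(sample):
        print(line)
```

Because each stage pulls one line at a time, the same `error_lines` call should work on an open file object (for example `with open("app.log") as f:`, where `app.log` is a hypothetical path) without loading the whole file into memory.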

For tools, I recommend looking at the following:

  • Whoosh, a pure Python search engine that will give you some nice real life examples of parsing text using pyparsing and text processing in Python in general.

  • Ned Batchelder's nice reviews of various Python parsing tools.

  • mxTextTools

  • Docutils source code for more advanced text processing in Python, including a sophisticated state machine.
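To illustrate the general idea of line-oriented, state-machine text processing, here is a small hand-rolled sketch. It does not use or imitate Docutils' own state machine, which is far more elaborate. The input format (`Key: value` header lines, a blank line, then free-form body text) and the names `State` and `parse_message` are hypothetical, chosen only for illustration.

```python
from enum import Enum, auto
from typing import Dict, Iterable, List, Tuple


class State(Enum):
    HEADERS = auto()  # reading "Key: value" lines
    BODY = auto()     # everything after the first blank line


def parse_message(lines: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split 'Key: value' header lines from the body at the first blank line."""
    state = State.HEADERS
    headers: Dict[str, str] = {}
    body: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if state is State.HEADERS:
            if not line.strip():
                # A blank line ends the header block: transition to BODY.
                state = State.BODY
            elif ":" in line:
                # Split on the first colon only; header keys are lowercased.
                key, _, value = line.partition(":")
                headers[key.strip().lower()] = value.strip()
            else:
                raise ValueError(f"malformed header line: {line!r}")
        else:
            # In BODY, colons are ordinary text and lines are kept verbatim.
            body.append(line)
    return headers, body
```

The same line in the body (`second: not a header`) is treated differently from a header line purely because of the current state, which is the point of structuring a parser this way: each state only has to know how to handle its own kind of line and when to hand off to the next.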

Edit: A good link specific to text processing in Python:

  • Text Processing in Python by David Mertz. I think the book is still available, although it's probably a bit dated now.
answered Jan 02 '23 at 07:01 by Van Gale


There's a book, Text Processing in Python. I haven't read it myself yet, but I've read other articles by this author and they're generally good stuff.

answered Jan 02 '23 at 08:01 by Eugene Morozov