
Python parser for a programming beginner (German umlauts needed)

I was hoping someone could give me some feedback on the following package:

Pyparsing

I want to write a Python program that takes a .txt file as input and produces some kind of structured data as output, in .csv or even Excel format. A friend who has since quit the project tried something with ANTLR + Java, but the German umlauts "ä, ö, ü" caused trouble. Now I (as a programming beginner) would like to write a program that works. I know some MATLAB, but that's it. I have started a Coursera module on Python programming (Python for everyone) to learn the basics.

My question is whether the pyparsing package can handle German umlauts, or whether I will run into trouble here.

In other words: if you were to recommend a Python parsing strategy to a noob, what would it be?

safex asked Mar 07 '26 05:03



1 Answer

Page 46 of the documentation has section 7.15, printables: all the printable non-whitespace characters. They are listed as follows:

>>> import pyparsing as pp
>>> len(pp.printables)
94
>>> print(pp.printables)
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

As you can see, there are no German umlauts in there. This is because printables only covers the ASCII character set rather than the full Unicode range, which would include every character you could possibly want. This is most likely a holdover from Python 2, where strings were byte strings by default, as opposed to Python 3, where every string is Unicode.
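That printables lacks umlauts does not by itself prevent a grammar from accepting them, since Word takes any string of allowed characters. Here is a minimal sketch, assuming Python 3 and an installed pyparsing in which the parseString method is available. The names german_letters, word and line are made up for illustration, and the sample sentence is not from your data:

```python
import pyparsing as pp

# ASCII letters plus the German special characters, spelled out explicitly.
# (Assumption: this source file is saved as UTF-8, the Python 3 default.)
german_letters = pp.alphas + "äöüÄÖÜß"

# A word is one or more of those letters; a line is one or more words.
# pyparsing skips whitespace between tokens by default.
word = pp.Word(german_letters)
line = pp.OneOrMore(word)

tokens = line.parseString("Grüße aus Köln über Nürnberg").asList()
print(tokens)
```

Listing the extra characters by hand keeps the grammar explicit about what it accepts, which is probably easier to debug for a beginner than relying on a predefined character set.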

EDIT: I've just found the following on their website:

NOTE - Pyparsing 2.x supports Python versions 2.6, 2.7, and 3.x. If you are using Python 2.5 or older, you must specifically install version 1.5.7. See more info on the News page.

Theoretically, you should be able to use UTF-8 when you install the module for Python 3. Unfortunately, the updated documentation does not mention printables, so I can't be sure.
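Independent of pyparsing, the part that most often breaks umlauts is the file encoding, so it may help to see the .txt to .csv round trip with the encoding stated explicitly. The following is only a sketch using the standard library. The "key: value" line format and the function name txt_to_csv are assumptions for illustration, since the question does not say what the input looks like:

```python
import csv


def txt_to_csv(txt_path, csv_path):
    """Copy 'key: value' lines from a UTF-8 text file into a two-column CSV.

    The 'key: value' layout is a hypothetical input format.
    """
    # Naming the encoding explicitly avoids depending on the platform default.
    with open(txt_path, encoding="utf-8") as src, \
         open(csv_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["key", "value"])
        for raw in src:
            if ":" not in raw:
                continue  # skip lines that do not fit the assumed format
            key, value = raw.split(":", 1)
            writer.writerow([key.strip(), value.strip()])
```

Calling txt_to_csv("input.txt", "output.csv") would then write one row per matching line. If the result is meant to be opened in Excel, writing with encoding="utf-8-sig" instead may help Excel detect the encoding.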

Gnarflord answered Mar 09 '26 19:03
