I need to do the following in a platform-independent way:
1) read the file using codecs.open() (for UTF-8)
2) split the text into entities at two consecutive newlines
3) split each entity into lines at single newlines
example input:
1) FIRST UTF-8 ENTITY ŞŞŞŞ\n
2) SECOND ELEMENT OF FIRST ENTITY\n
\n\n
1) SECOND ENTITY\n
2) SECOND ELEMENT OF SECOND ENTITY\n
After reading the file, string.split('\n\n') works on Mac OS X, but it does not seem to be a platform-independent way of handling this (the file might have been prepared on another OS).
I know that string.splitlines() is platform independent, but how do I split on the two newlines between entities in a platform-independent way?
Edit: the file might be prepared on any platform, and thus might have any kind of line ending.
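Python has a built-in constant for this: os.linesep. So you can use:
string.split(2*os.linesep)
Note that os.linesep is the line separator of the platform running the script, so this only matches files written with that same convention.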
Open the text file in universal newlines mode:
codecs.open(filename, 'U')
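One caveat: the codecs documentation states that codecs.open() opens the underlying file in binary mode, so it performs no newline translation, and the 'U' flag may not have the intended effect. A possible alternative is io.open() (the built-in open() in Python 3), which applies universal newlines in text mode when newline=None. The sketch below is an assumption about how the idea could be applied, not the answerer's code. The helper name read_entities is hypothetical.

```python
import io
import re

def read_entities(path):
    """Read a UTF-8 file and return a list of entities, each a list of lines.
    Hypothetical helper for illustration."""
    # newline=None (the default) enables universal newlines:
    # \r\n and \r are translated to \n while reading.
    with io.open(path, encoding='utf-8', newline=None) as f:
        text = f.read()
    # After translation, entity boundaries are runs of two or more '\n'.
    blocks = re.split(r'\n{2,}', text.strip('\n'))
    return [block.splitlines() for block in blocks if block]
```

Because the translation happens at read time, the splitting logic only ever needs to deal with '\n'.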