Python splitting text file keeping newlines

Question

I am trying to split up a text file into words, with \n being counted as a word.

My input is this text file:

War and Peace

by Leo Tolstoy/Tolstoi

And I want a list output like this:

['War','and','Peace','\n','\n','by','Leo','Tolstoy/Tolstoi']

Using .split(" ") I get this:

['War', 'and', 'Peace\n\nby', 'Leo', 'Tolstoy/Tolstoi']

So I started writing a program to put the \n as a separate entry after the word. The code follows:

for oldword in text:
    counter = 0
    newword = oldword
    while "\n" in newword:
        newword = newword.replace("\n","",1)
        counter += 1

    text[text.index(oldword)] = newword

    while counter > 0:
        text.insert(text.index(newword)+1, "\n")
        counter -= 1

However, the program seems to hang on the line counter -= 1, and I can't for the life of me figure out why.

NOTE: I realise that were this to work, the result would be ['Peaceby',"\n","\n"]; that is a different problem to be solved later.

Mazdak · Accepted Answer

You don't need anything that complicated. You can simply use a regex with re.findall() to find all the words and newlines:

>>> import re
>>> s="""War and Peace
... 
... by Leo Tolstoy/Tolstoi"""
>>> 
>>> re.findall(r'\S+|\n',s)
['War', 'and', 'Peace', '\n', '\n', 'by', 'Leo', 'Tolstoy/Tolstoi']

'\S+|\n' matches either a run of one or more non-whitespace characters (\S+) or a single newline (\n).
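To make the behaviour of the pattern concrete, here is a minimal sketch that wraps it in a helper. The names `TOKEN_PATTERN` and `tokenize_with_newlines` are made up for this illustration and are not part of the `re` module. Note that spaces and tabs match neither alternative, so they are simply skipped.

```python
import re

# \S+ matches a run of non-whitespace characters; \n matches exactly one newline.
# Spaces and tabs match neither alternative, so findall() skips over them.
TOKEN_PATTERN = re.compile(r"\S+|\n")


def tokenize_with_newlines(text):
    """Return the words of `text`, with every newline as its own entry."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize_with_newlines("War and Peace\n\nby Leo Tolstoy/Tolstoi")
print(tokens)
# ['War', 'and', 'Peace', '\n', '\n', 'by', 'Leo', 'Tolstoy/Tolstoi']

mixed = tokenize_with_newlines("tab\tseparated   words\n")
print(mixed)
# ['tab', 'separated', 'words', '\n']
```

Compiling the pattern once is optional here; it only saves repeating the pattern string when the helper is called many times.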

If you want to get the text from a file you can do the following:

import re

with open('file_name') as f:
    words = re.findall(r'\S+|\n', f.read())
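If the file is large, reading it all at once with f.read() may not be ideal. As a rough sketch, the same pattern can be applied line by line, since \S+ never spans a newline and each line keeps its trailing \n. The helper name `tokenize_file`, the UTF-8 encoding, and the temporary sample file are assumptions made for this example.

```python
import re
import tempfile
from pathlib import Path

TOKEN_PATTERN = re.compile(r"\S+|\n")


def tokenize_file(path, encoding="utf-8"):
    """Collect words and newlines from a file, one line at a time."""
    tokens = []
    with open(path, encoding=encoding) as f:
        for line in f:  # each line still ends with its "\n", if it had one
            tokens.extend(TOKEN_PATTERN.findall(line))
    return tokens


# Demo on a throwaway file holding the sample text from the question.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.txt"
    sample_path.write_text("War and Peace\n\nby Leo Tolstoy/Tolstoi", encoding="utf-8")
    file_tokens = tokenize_file(sample_path)

print(file_tokens)
# ['War', 'and', 'Peace', '\n', '\n', 'by', 'Leo', 'Tolstoy/Tolstoi']
```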

Read more about regular expressions at http://www.regular-expressions.info/
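As for why the loop in the question hangs: it inserts "\n" entries into text while iterating over that same list. The for loop later reaches those inserted entries, treats each "\n" as a word containing a newline, and inserts yet another "\n", so the list keeps growing and the loop never finishes. If you would rather avoid a regex, one way around this is to build a new list instead of modifying the one being iterated. The following is only a sketch under that assumption; `split_keep_newlines` is a name invented here, and it splits on spaces only (not tabs).

```python
def split_keep_newlines(text):
    """Split on spaces, emitting each newline as its own list entry."""
    result = []
    for chunk in text.split(" "):
        # A chunk such as "Peace\n\nby" holds words separated by newlines.
        parts = chunk.split("\n")
        for i, part in enumerate(parts):
            if i > 0:
                result.append("\n")  # one entry per newline between two parts
            if part:
                result.append(part)  # skip empty strings left by adjacent separators
    return result


words = split_keep_newlines("War and Peace\n\nby Leo Tolstoy/Tolstoi")
print(words)
# ['War', 'and', 'Peace', '\n', '\n', 'by', 'Leo', 'Tolstoy/Tolstoi']
```

Because the words on either side of a newline are appended separately, this also avoids the 'Peaceby' problem mentioned in the question's note.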

Tags: python, split, newline, counter

Christopher Riches
