Splitting large text file by a delimiter in Python

Question

I imagine this is going to be a simple task, but I can't find exactly what I am looking for in previous StackOverflow questions, so here goes...

I have large text files in a proprietary format that look something like this:

:Entry
- Name
John Doe

- Date
20/12/1979
:Entry

-Name
Jane Doe
- Date
21/12/1979

And so forth.

The text files range in size from 10 KB to 100 MB. I need to split each file by the :Entry delimiter. How could I process each file based on :Entry blocks?

unutbu · Accepted Answer

You could use itertools.groupby to group lines that occur after :Entry into lists:

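import itertools as it

filename = 'test.dat'

with open(filename, 'r') as f:
    # groupby yields runs of consecutive lines that share the same key:
    # key is True for ':Entry' delimiter lines, False for the lines between them.
    for key, group in it.groupby(f, lambda line: line.startswith(':Entry')):
        if not key:
            group = list(group)
            print(group)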

yields

['- Name\n', 'John Doe\n', '\n', '- Date\n', '20/12/1979\n']
['\n', '-Name\n', 'Jane Doe\n', '- Date\n', '21/12/1979\n']

Or, to process the groups, you don't really need to convert group to a list:

with open(filename,'r') as f:
    for key,group in it.groupby(f,lambda line: line.startswith(':Entry')):
        if not key:
            for line in group:
                ...
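
As a minimal sketch of the same idea, the groupby loop can be wrapped in a generator so that calling code receives one block at a time. The name iter_entries and the decision to strip whitespace and drop blank lines are assumptions made for illustration, not part of the original answer; the sample text is the data from the question fed through io.StringIO in place of a real file.

```python
import io
import itertools as it


def iter_entries(lines, delimiter=':Entry'):
    """Yield each block between delimiter lines as a list of stripped, non-empty lines.

    `lines` can be any iterable of strings, such as an open file object,
    so the input is consumed lazily, one line at a time.
    """
    is_delimiter = lambda line: line.startswith(delimiter)
    for key, group in it.groupby(lines, is_delimiter):
        if not key:
            # Assumption: blank lines inside a block carry no meaning.
            block = [line.strip() for line in group if line.strip()]
            if block:
                yield block


# Sample data from the question, standing in for an open file.
sample = io.StringIO(
    ":Entry\n- Name\nJohn Doe\n\n- Date\n20/12/1979\n"
    ":Entry\n\n-Name\nJane Doe\n- Date\n21/12/1979\n"
)
entries = list(iter_entries(sample))
for entry in entries:
    print(entry)
```

Because a file object is itself an iterator over lines, passing one to iter_entries keeps only the current block in memory, which should matter for the 100 MB files. Note that any text before the first :Entry line would also be yielded as a block.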

infrared · Answer

If every entry block starts with a colon, you can just split by that:

with open('entries.txt') as fp:
    contents = fp.read()  # reads the whole file into memory
    for entry in contents.split(':'):
        # do something with entry
        ...
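
One caveat with splitting on a bare colon is that any colon inside the data (a time such as 12:30, for example) would also start a new chunk, and the text before the first delimiter comes back as an empty string. A hedged variant, assuming the delimiter is always the literal text :Entry as in the question, splits on the full delimiter and discards empty chunks. The helper name split_entries is hypothetical.

```python
def split_entries(contents, delimiter=':Entry'):
    """Split text on the full delimiter and drop chunks that are empty or whitespace-only."""
    return [chunk.strip() for chunk in contents.split(delimiter) if chunk.strip()]


# Sample data from the question, standing in for fp.read().
contents = (
    ":Entry\n- Name\nJohn Doe\n\n- Date\n20/12/1979\n"
    ":Entry\n\n-Name\nJane Doe\n- Date\n21/12/1979\n"
)
entries = split_entries(contents)
for entry in entries:
    print(repr(entry))
```

This still loads the whole file at once, which is simple and usually acceptable at 100 MB but uses more memory than iterating over the file line by line.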

Tags:

python

text-parsing

Kevin
