Removing white space from txt with python

Question

I have a .txt file (scraped as pre-formatted text from a website) where the data looks like this:

B, NICKOLAS                       CT144531X       D1026    JUDGE ANNIE WHITE JOHNSON  
ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS

I'd like to remove all the extra spaces between the columns (each gap is a varying number of spaces, not tabs) and replace each gap with a delimiter (a tab or a pipe, since the data itself contains commas), like so:

ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS

I looked around and found that the best options seem to be splitting with regex or shlex. Two similar scenarios:

Python Regular expression must strip whitespace except between quotes,
Remove white spaces from dict : Python.

timgeb · Accepted Answer

You can apply the regex '\s{2,}' (two or more whitespace characters) to each line and substitute the matches with a single '|' character.

>>> import re
>>> line = 'ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS        '
>>> re.sub(r'\s{2,}', '|', line.strip())
'ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS'

Stripping any leading and trailing whitespace from the line before applying re.sub ensures that you won't get '|' characters at the start and end of the line.
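To see why the strip matters, here is a small sketch comparing the substitution with and without it. The sample string mimics a line as it might be read from a file (trailing spaces plus a newline); the variable names are only illustrative.

```python
import re

# Sample line as it might come from the file: trailing spaces and a newline.
raw = 'ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS        \n'

# Without strip(), the trailing run of whitespace is itself replaced by '|'.
without_strip = re.sub(r'\s{2,}', '|', raw)

# With strip(), only the gaps between columns are replaced.
with_strip = re.sub(r'\s{2,}', '|', raw.strip())

print(without_strip)  # ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS|
print(with_strip)     # ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
```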

Your actual code should look similar to this:

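import re

# filename is the path to your .txt file
with open(filename) as f:
    for line in f:
        # collapse each run of 2+ whitespace characters into one '|'
        subbed = re.sub(r'\s{2,}', '|', line.strip())
        # do something here, e.g. write subbed to an output file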
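One possible way to fill in the "do something here" step is to write each converted line to a second file. The following is a minimal sketch, not part of the original answer: the helper names `to_delimited` and `convert_file`, the choice to skip blank lines, and the file names in the usage note are all assumptions made for illustration.

```python
import re

def to_delimited(lines, delimiter='|'):
    """Yield each non-blank line with runs of 2+ whitespace replaced by delimiter.

    Hypothetical helper; skipping blank lines is an assumption.
    """
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield re.sub(r'\s{2,}', delimiter, stripped)

def convert_file(src_path, dst_path, delimiter='|'):
    """Read src_path line by line and write the delimited rows to dst_path."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for row in to_delimited(src, delimiter):
            dst.write(row + '\n')

# Quick check on the two sample rows from the question.
sample = [
    'B, NICKOLAS                       CT144531X       D1026    JUDGE ANNIE WHITE JOHNSON  \n',
    'ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS\n',
]
rows = list(to_delimited(sample))
```

Calling `convert_file('input.txt', 'output.txt')` (both names are placeholders) would then produce one pipe-delimited row per input line. Note that this approach depends on every column gap being at least two whitespace characters wide: columns separated by a single space are not split, and an empty column merges with the gaps around it.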

Tags: python, regex, whitespace, python-2.7, shlex

aysha
