
Split lines with multiple words in Python

I have a (very ugly) text output from an SQL query that is run by an external system I can't change. Here is an example of the output:

FruitName      Owner             OwnerPhone
=============  ================= ============
Red Apple      Sr Lorem Ipsum    123123
Yellow Banana  Ms Dolor sir Amet 456456

As you can see, the FruitName and Owner columns may consist of several words, and there is no fixed pattern to how many words each column contains. If I use line.split() to turn each line into a list, Python splits on every run of whitespace and I get this:

['Red', 'Apple', 'Sr', 'Lorem', 'Ipsum', '123123']
['Yellow', 'Banana', 'Ms', 'Dolor', 'sir', 'Amet', '456456']

The question is, how can I split it properly into output like this:

['Red Apple', 'Sr Lorem Ipsum', '123123']
['Yellow Banana', 'Ms Dolor sir Amet', '456456']

I'm a newbie in Python and I don't know whether such a thing is possible. Any help will be very much appreciated. Thanks!

randms26 asked Oct 16 '22 10:10


2 Answers

The columns have fixed widths, so you can slice each line at those positions:

data = '''FruitName      Owner             OwnerPhone
=============  ================= ============
Red Apple      Sr Lorem Ipsum    123123
Yellow Banana  Ms Dolor sir Amet 456456'''

lines = data.split('\n')

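# Skip the header and the === divider, then slice at the column boundaries.
for line in lines[2:]:
    fruit = line[:13].strip()    # FruitName: 13 characters wide
    owner = line[13:32].strip()  # Owner: up to the end of its === run
    phone = line[32:].strip()    # OwnerPhone: rest of the line
    print([fruit, owner, phone])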

A more general solution would use the second line (the one with ===) to calculate the column widths and slice with those.
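A minimal sketch of that idea, assuming every column is marked by an unbroken run of = in the divider line. It uses re.finditer from the standard library to get each run's start and end; the names spans and rows are only illustrative.

```python
import re

data = '''FruitName      Owner             OwnerPhone
=============  ================= ============
Red Apple      Sr Lorem Ipsum    123123
Yellow Banana  Ms Dolor sir Amet 456456'''

lines = data.split('\n')

# Each run of '=' in the divider line marks one column;
# m.span() returns its (start, end) indices.
spans = [m.span() for m in re.finditer(r'=+', lines[1])]

# Slice every data line at those indices and trim the padding.
rows = [[line[start:end].strip() for start, end in spans]
        for line in lines[2:]]

print(rows)
```

Because the last slice ends where the last = run ends, a value wider than its divider would be cut off; slicing the final column to the end of the line avoids that if it matters for your data.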

furas answered Nov 15 '22 10:11


You can use the ==== dividers to your advantage: find the start and end index of each ==== run, which marks one column, and slice every line at those positions:

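def get_divider_indices(line):
    # Return one (start, end) index pair per run of '=' in the divider line.
    i, j = 0, line.index(' ')
    indices = []
    while i != -1:
        indices.append((i, j))
        i = line.find('=', j)  # start of the next column, or -1 if none is left
        j = line.find(' ', i)  # end of that column
        if j == -1:
            j = len(line)      # the last column runs to the end of the line
    return indices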

with open('data.txt', 'r') as f:
    lines = f.readlines()

# Line 0 is the header, line 1 is the ==== divider, and the data starts at line 2.
dividers = get_divider_indices(lines[1])
rows = []
for line in lines[2:]:
    rows.append([line[s:e].strip() for s, e in dividers])

print(rows)

Output

[['Red Apple', 'Sr Lorem Ipsum', '123123'], ['Yellow Banana', 'Ms Dolor sir Amet', '456456']]

Note that you can use str.find() to get the index of a character in a string (which I use above to get the index of an = or a space in the divider line).
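As a small illustration of why str.find() suits that loop: it returns -1 when nothing is found, whereas str.index() raises ValueError. The divider string below is a shortened, made-up example, not the one from the question.

```python
divider = '===  ==== =='  # hypothetical shortened divider line

first_space = divider.find(' ')            # end of the first column
next_col = divider.find('=', first_space)  # start of the second column
missing = divider.find('x')                # 'x' never occurs, so this is -1

print(first_space, next_col, missing)

try:
    divider.index('x')
except ValueError:
    print('index() raises ValueError instead of returning -1')
```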

slider answered Nov 15 '22 09:11