I have a (very ugly) txt output from an SQL query which is performed by an external system that I can't change. Here is an example of the output:
FruitName     Owner             OwnerPhone
============= ================= ============
Red Apple     Sr Lorem Ipsum    123123
Yellow Banana Ms Dolor sir Amet 456456
As you can see, the FruitName column and the Owner column may consist of several words, and there is no fixed pattern to how many words each column holds. If I use line.split() in Python to turn each line into a list, it splits on every run of whitespace and the lists become:
['Red', 'Apple', 'Sr', 'Lorem', 'Ipsum', '123123']
['Yellow', 'Banana', 'Ms', 'Dolor', 'sir', 'Amet', '456456']
The question is, how can I split it properly into output like this:
['Red Apple', 'Sr Lorem Ipsum', '123123']
['Yellow Banana', 'Ms Dolor sir Amet', '456456']
I'm a newbie in Python and I don't know if such a thing is possible or not. Any help will be very much appreciated. Thanks!
The columns have fixed widths, so you can use that and slice each line:
data = '''FruitName     Owner             OwnerPhone
============= ================= ============
Red Apple     Sr Lorem Ipsum    123123
Yellow Banana Ms Dolor sir Amet 456456'''

lines = data.split('\n')

# Skip the header and the divider line.
for line in lines[2:]:
    fruit = line[:13].strip()    # FruitName: first 13 characters
    owner = line[13:32].strip()  # Owner: the next 19 characters
    phone = line[32:].strip()    # OwnerPhone: the rest of the line
    print([fruit, owner, phone])
A more general solution would use the second line (the one with ===) to calculate the column widths and then use them for slicing.
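That idea could be sketched roughly like this, assuming the divider line is made of runs of = separated by spaces. The helper names column_spans and split_row, and the choice of re.finditer to locate the runs, are my own illustration rather than anything the external system prescribes:

```python
import re

def column_spans(divider):
    """Return one (start, end) index pair per run of '=' in the divider line."""
    return [m.span() for m in re.finditer(r'=+', divider)]

def split_row(line, spans):
    """Slice one data line at the given spans and strip the padding."""
    return [line[start:end].strip() for start, end in spans]

data = '''FruitName     Owner             OwnerPhone
============= ================= ============
Red Apple     Sr Lorem Ipsum    123123
Yellow Banana Ms Dolor sir Amet 456456'''

lines = data.split('\n')
spans = column_spans(lines[1])  # widths come from the divider, not hard-coded
rows = [split_row(line, spans) for line in lines[2:]]
print(rows)
```

Because the spans are read from the divider, the same code should keep working if the external system changes the column widths, as long as it still prints the === line.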
You can use the ==== dividers to your advantage: the start and end indices of each ==== run mark one column, so you can take the same slices from every line:
def get_divider_indices(line):
    # Start of the first column and the index of the first space after it.
    i, j = 0, line.index(' ')
    indices = []
    while i != -1:
        indices.append((i, j))
        i = line.find('=', j)  # start of the next run of '=', or -1 if none
        j = line.find(' ', i)  # end of that run
        if j == -1:
            j = len(line)      # the last column runs to the end of the line
    return indices

with open('data.txt', 'r') as f:
    lines = f.readlines()

dividers = get_divider_indices(lines[1])
rows = []
for line in lines[2:]:
    rows.append([line[s:e].strip() for s, e in dividers])
print(rows)
Output
[['Red Apple', 'Sr Lorem Ipsum', '123123'], ['Yellow Banana', 'Ms Dolor sir Amet', '456456']]
Note that you can use str.find() to get the index of a character in a string (which I use above to get the index of an = or a space in the divider line).
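A small illustration of why find() suits the loop above: str.find() returns -1 when nothing is found, whereas str.index() raises ValueError. The divider string here is only sample data matching the question:

```python
divider = '============= ================= ============'

first_space = divider.find(' ')               # end of the first column
next_equals = divider.find('=', first_space)  # start of the second column
missing = divider.find('#')                   # not present, so find() gives -1

try:
    divider.index('#')                        # index() raises instead
except ValueError:
    raised = True
else:
    raised = False

print(first_space, next_equals, missing, raised)
```

The -1 result is what lets the while loop stop cleanly once there are no more = runs to find.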