 

Storing the first word in each line of a string into a list

I have a string containing multiple lines. The lines are separated by '\n', and the fields in each line are separated by commas. I want to store the first field of each line (which may contain spaces) in a list.

Here is the string:

AIG,10,,,,Yes,,,Jr,,,MS,,
Baylor College of Medicine,19,Yes,Yes,,,,,,,,,,Recent
CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,

My list should be ['AIG', 'Baylor College of Medicine', 'CGG', 'Citi']

I thought about splitting at the first comma and then moving on to the next line, but I do not know how to do this.
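A minimal sketch of that idea, using str.partition (which splits only at the first occurrence of the separator) together with str.splitlines(). The variable name text is a hypothetical stand-in for the string shown above.

```python
# Split each line at its first comma only, then move on to the next line.
text = (
    "AIG,10,,,,Yes,,,Jr,,,MS,,\n"
    "Baylor College of Medicine,19,Yes,Yes,,,,,,,,,,Recent\n"
    "CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent\n"
    "Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,"
)

first_fields = []
for line in text.splitlines():
    # partition returns (before, separator, after); keep only "before"
    name, _sep, _rest = line.partition(",")
    first_fields.append(name)

print(first_fields)
# ['AIG', 'Baylor College of Medicine', 'CGG', 'Citi']
```

If a line contains no comma, partition returns the whole line as the first element, so that line's full text is kept rather than raising an error.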


My solution was to go back in my code, take the list of companies I had made earlier, and split each entry at its first comma:

companies = [
    'AIG,10,,,,Yes,,,Jr,,,MS,,\n',
    'Baylor\xa0College\xa0of\xa0Medicine,19,Yes,Yes,,,,,,,,,,Recent\n',
    'CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent\n',
    'Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,\n',
    'ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,\n',
    'Flow-Cal\xa0Inc.,16,Yes,,,Yes,,,Jr,Sr,,,,All\n',
    'Global\xa0Shop\xa0Solutions,18,Yes,,,Yes,,,,Sr,PB,,,All\n',
]

cmpny_name = []
for line in companies:
    # maxsplit=1: split only at the first comma and keep the part before it
    cmpny_name.append(line.split(',', 1)[0])

# the names contain non-breaking spaces (\xa0); replace them with normal spaces
cmpny_name = [c.replace('\xa0', ' ') for c in cmpny_name]
print(cmpny_name)

Output (note: it contains more companies than the seven listed above):
['AIG', 'Baylor College of Medicine', 'CGG', 'Citi', 'ExxonMobil', 'Flow-Cal Inc.', 'Global Shop Solutions', 'Harris County CTS', 'HCSS', 'Hitachi Consulting', 'HP Inc.', 'INT Inc.']
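A sketch of an alternative, assuming the data is well-formed CSV: the standard csv module parses each line, which also copes with quoted fields that contain commas (something split(',') would break apart), and unicodedata.normalize with the "NFKC" form maps the non-breaking space U+00A0 to an ordinary space. The sample string raw is hypothetical, built from entries in the list above.

```python
import csv
import io
import unicodedata

# Hypothetical sample: entries from the question, including the
# non-breaking spaces (\xa0) seen in the companies list.
raw = (
    "AIG,10,,,,Yes,,,Jr,,,MS,,\n"
    "Baylor\xa0College\xa0of\xa0Medicine,19,Yes,Yes,,,,,,,,,,Recent\n"
    "Flow-Cal\xa0Inc.,16,Yes,,,Yes,,,Jr,Sr,,,,All\n"
)

names = []
# csv.reader accepts any iterable of lines; io.StringIO wraps the string.
for row in csv.reader(io.StringIO(raw)):
    if not row:  # csv.reader yields [] for a blank line; skip it
        continue
    # NFKC normalization turns U+00A0 (no-break space) into a regular space.
    names.append(unicodedata.normalize("NFKC", row[0]))

print(names)
# ['AIG', 'Baylor College of Medicine', 'Flow-Cal Inc.']
```

NFKC also rewrites other compatibility characters (for example ligatures or full-width letters), so the explicit replace('\xa0', ' ') above is the narrower choice if only non-breaking spaces should change.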
asked Dec 23 '22 at 08:12 by CTG713



2 Answers

I would use split() twice:

lines = string.split('\n')
output = [line.split(',')[0] for line in lines]
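One caveat, assuming the string ends with a newline (as every entry in the asker's companies list did): split('\n') then leaves an empty string after the final newline, which shows up as an extra '' in the result. The sketch below compares it with str.splitlines(), which does not produce that trailing entry and also handles '\r\n' line endings.

```python
# Assumed input: two lines from the question, ending with a newline.
string = "AIG,10,,,,Yes,,,Jr,,,MS,,\nCGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent\n"

# str.split('\n') leaves an empty string after the final newline ...
print([line.split(',')[0] for line in string.split('\n')])
# ['AIG', 'CGG', '']

# ... while str.splitlines() does not, and the `if line` guard also drops
# any blank lines in the middle of the text.
output = [line.split(',')[0] for line in string.splitlines() if line]
print(output)
# ['AIG', 'CGG']
```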
answered Mar 06 '23 at 00:03 by Aemyl



I would slightly simplify @Aemyl's answer:

from pprint import pprint
a="this is line 1\nthat is line 2\nthose are line3\nbill was here\nbob was here"
first = [line.split(' ')[0] for line in a.split('\n')]
pprint(first)

This gives you the first word of each line:

['this', 'that', 'those', 'bill', 'bob']
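Splitting on a space returns the first word, which suits this example, but on the question's comma-separated data it would cut "Baylor College of Medicine" down to "Baylor". The sketch below contrasts the two separators and also shows str.split() with no argument, which splits on any run of whitespace so that leading spaces or tabs do not yield an empty first element. The sample strings are hypothetical.

```python
line = "Baylor College of Medicine,19,Yes,Yes,,,,,,,,,,Recent"

# Splitting on a space returns the first *word* ...
print(line.split(' ')[0])   # Baylor
# ... while splitting on a comma returns the first *field*.
print(line.split(',')[0])   # Baylor College of Medicine

# With no argument, str.split() splits on any run of whitespace, so leading
# spaces and tabs do not produce empty strings the way split(' ') does.
padded = "  this\tis line 1"
print(repr(padded.split(' ')[0]))  # ''
print(padded.split()[0])           # this
```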
answered Mar 06 '23 at 00:03 by Tim Seed
