 

Split a string without breaking apart multi-word or hyphenated city names

Tags:

python

regex

I'm trying to split a string into multiple strings. I'm using the re library, but I ran into a problem. Say my string is "Yakima, WA[4660,12051]49826". It works if I do this:

>>> import re
>>> x = "Yakima, WA[4660,12051]49826"
>>> re.split(r'\W+', x)

It returns

['Yakima', 'WA', '4660', '12051', '49826']

That is what I want. The problem is that when the city name contains a hyphen (-) or a space, the city gets split apart. How can I keep the city together as a single string? I'll be dealing with many cities: some have two- or three-word names and some have hyphens. I need to keep three data structures: the city and state combined, the coordinates, and the population.

>>> x = "Winston-Salem, NC[3610,8025]131885"
>>> re.split(r'\W+', x)
['Winston', 'Salem', 'NC', '3610', '8025', '131885']

or

>>> x = "West Palm Beach, FL[2672,8005]63305"
>>> re.split(r'\W+', x)
['West', 'Palm', 'Beach', 'FL', '2672', '8005', '63305']

and I want:

['Winston-Salem', 'NC', '3610', '8025', '131885']
['West Palm Beach', 'FL', '2672', '8005', '63305']
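To see why \W+ breaks these names apart, re.findall can list the separator runs it matches. A hyphen and a space are both non-word characters, so they are treated as separators just like the comma and the brackets. The samples list below is only for illustration:

```python
import re

samples = ["Winston-Salem, NC[3610,8025]131885", "West Palm Beach, FL[2672,8005]63305"]

# \W matches any character that is not a word character, so the hyphen and
# the spaces inside the city names count as separators as well.
for s in samples:
    print(re.findall(r'\W+', s))
# ['-', ', ', '[', ',', ']']
# [' ', ' ', ', ', '[', ',', ']']
```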
asked Jun 19 '26 04:06 by user2958457


1 Answer

You can split by [^\w\s-]+:

>>> x = "Winston-Salem, NC[3610,8025]131885"
>>> re.split(r'[^\w\s-]+', x)
['Winston-Salem', ' NC', '3610', '8025', '131885']

>>> x = "West Palm Beach, FL[2672,8005]63305"
>>> re.split(r'[^\w\s-]+', x)
['West Palm Beach', ' FL', '2672', '8005', '63305']
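The state keeps a leading space (' NC', ' FL') because whitespace is excluded from the character class, so the space after the comma is not consumed as a separator. A minimal sketch that strips each piece afterwards; the helper name split_record is hypothetical and not part of the answer above:

```python
import re

def split_record(s):
    # Split on runs of characters that are not word characters, whitespace,
    # or hyphens, then strip leftover whitespace such as the space before "NC".
    return [part.strip() for part in re.split(r'[^\w\s-]+', s)]

print(split_record("Winston-Salem, NC[3610,8025]131885"))
# ['Winston-Salem', 'NC', '3610', '8025', '131885']
print(split_record("West Palm Beach, FL[2672,8005]63305"))
# ['West Palm Beach', 'FL', '2672', '8005', '63305']
```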

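[^\w\s-]+ matches one or more characters that are not word characters (letters, digits and underscore; in Python 3 str patterns this includes Unicode letters and digits), not whitespace, and not a hyphen. Spaces and hyphens inside a city name are therefore never treated as separators.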
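If every record has exactly the shape shown in the question (city, comma, two-letter state, bracketed coordinates, then population; an assumption based only on these three examples), a different technique also works: instead of re.split, match each line against one pattern with named capturing groups and fill the three structures the question asks for directly. This sketch uses re.fullmatch; the names RECORD and parse_records, and storing city and state together as "City, ST", are illustrative choices:

```python
import re

# Hypothetical record layout, inferred from the question's examples:
# "<city>, <ST>[<x>,<y>]<population>"
RECORD = re.compile(
    r'(?P<city>[^,]+), (?P<state>[A-Z]{2})\[(?P<x>\d+),(?P<y>\d+)\](?P<pop>\d+)'
)

def parse_records(lines):
    """Return (city_states, coords, populations) as three parallel lists."""
    city_states, coords, populations = [], [], []
    for line in lines:
        m = RECORD.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"unrecognised record: {line!r}")
        city_states.append(f"{m['city']}, {m['state']}")
        coords.append((int(m['x']), int(m['y'])))
        populations.append(int(m['pop']))
    return city_states, coords, populations

cities, coords, pops = parse_records([
    "Winston-Salem, NC[3610,8025]131885",
    "West Palm Beach, FL[2672,8005]63305",
    "Yakima, WA[4660,12051]49826",
])
print(cities)  # ['Winston-Salem, NC', 'West Palm Beach, FL', 'Yakima, WA']
print(coords)  # [(3610, 8025), (2672, 8005), (4660, 12051)]
print(pops)    # [131885, 63305, 49826]
```

Unlike splitting, a line that does not fit the expected layout raises ValueError instead of silently producing the wrong number of pieces.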

answered Jun 21 '26 16:06 by alecxe


