I'm trying to split a string into multiple strings. I'm using the re library, but I ran into a problem. Say my string is "Yakima, WA[4660,12051]49826". It works if I do this:
>>> import re
>>> x = "Yakima, WA[4660,12051]49826"
>>> re.split(r'\W+', x)
It returns:
['Yakima', 'WA', '4660', '12051', '49826']
That is what I want. The problem is when the city name contains a dash (-) or a space: how can I keep the city together in its own string? I'll be dealing with multiple cities; some have two- or three-word names and some have dashes. I need to keep 3 data structures: the city and state combined, the coordinates, and the population.
>>> x = "Winston-Salem, NC[3610,8025]131885"
>>> re.split(r'\W+', x)
['Winston', 'Salem', 'NC', '3610', '8025', '131885']
or
>>> x = "West Palm Beach, FL[2672,8005]63305"
>>> re.split(r'\W+', x)
['West', 'Palm', 'Beach', 'FL', '2672', '8005', '63305']
and I want:
['Winston-Salem', 'NC', '3610', '8025', '131885']
['West Palm Beach', 'FL', '2672', '8005', '63305']
You can split on [^\w\s-]+:
>>> x = "Winston-Salem, NC[3610,8025]131885"
>>> re.split(r'[^\w\s-]+', x)
['Winston-Salem', ' NC', '3610', '8025', '131885']
>>> x = "West Palm Beach, FL[2672,8005]63305"
>>> re.split(r'[^\w\s-]+', x)
['West Palm Beach', ' FL', '2672', '8005', '63305']
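[^\w\s-]+ matches one or more characters that are not a word character (a-zA-Z0-9_), not whitespace, and not -. So commas and brackets act as delimiters, while spaces and dashes stay inside the pieces. Because whitespace is not a delimiter, the state keeps its leading space (' NC'); call .strip() on each piece if you need it clean.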
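To get from the split to the three structures mentioned in the question, something like the following might work. This is only a sketch: parse_record is a made-up helper name, and it assumes every line has exactly the shape "City, ST[a,b]population" with integer coordinates and population.

```python
import re

def parse_record(line):
    """Split one 'City, ST[a,b]population' line into three pieces.

    Hypothetical helper: assumes exactly five fields per line and
    integer coordinates and population.
    """
    # Split on runs of punctuation other than '-', then trim the
    # whitespace left over on each piece (e.g. ' NC' -> 'NC').
    city, state, a, b, population = (
        part.strip() for part in re.split(r'[^\w\s-]+', line)
    )
    return f"{city}, {state}", (int(a), int(b)), int(population)

names, coords, populations = [], [], []
for line in ["Winston-Salem, NC[3610,8025]131885",
             "West Palm Beach, FL[2672,8005]63305"]:
    name, coord, population = parse_record(line)
    names.append(name)
    coords.append(coord)
    populations.append(population)
```

A line that does not split into exactly five pieces raises ValueError at the unpacking step, which is probably what you want for malformed input.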