How to split by a character in Python yet keep that character?

Question

Google Maps results are often displayed thus:

'\n113 W 5th St\nEureka, MO, United States\n(636) 938-9310\n'

Another variation:

'Clayton Village Shopping Center, 14856 Clayton Rd\nChesterfield, MO, United States\n(636) 227-2844'

And another:

'Wildwood, MO\nUnited States\n(636) 458-7707'

Notice the variation in the placement of the \n characters.

I'm looking to extract the first X lines as the address and the last line as the phone number. A regex such as (.*\n.*)\n(.*) would suffice for the first example, but falls short for the other two. The only thing I can rely on is that the phone number will be in the form (ddd) ddd-dddd.

I think a regex that will allow for each and every possible variation will be hard to come by. Is it possible to use split(), but maintain the character we have split by? So in this example, split by "(", to split out the address and phone number, but retain this character in the phone number? I could concatenate the "(" back into split("(")[1], but is there a neater way?
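To make the idea concrete, here is a minimal sketch of two ways to split on "(" while keeping it, using the third example string. Both str.partition and re.split with a capture group are standard-library calls; the variable names are only illustrative.

```python
import re

text = 'Wildwood, MO\nUnited States\n(636) 458-7707'

# str.partition splits at the first "(" and returns the separator as well.
address, sep, rest = text.partition('(')
phone = sep + rest          # put the "(" back on the phone number
address = address.strip()   # drop the newline left before the "("

# re.split keeps the delimiter when the pattern is wrapped in a capture group.
parts = re.split(r'(\()', text, maxsplit=1)

print(repr(address))
print(repr(phone))
print(parts)
```

Both calls split at the first "(", so an address that itself contains a parenthesis would be cut too early; str.rpartition splits at the last one instead.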

ArtOfWarfare · Accepted Answer

Don't use regex. Just split the string on the '\n'. The last element is the phone number; the other elements are the address.

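import re

# Assumed pattern for the (ddd) ddd-dddd form given in the question.
REGEX_PHONE_US = r'\(\d{3}\) \d{3}-\d{4}$'

lines   = inputString.strip().split('\n')  # strip() drops leading/trailing newlines
phone   = lines[-1] if re.match(REGEX_PHONE_US, lines[-1]) else None
address = '\n'.join(lines[:-1]) if phone else inputString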

Python has a lot of great built in tools for manipulating strings in a more... human way... than regex allows.
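As a rough, self-contained sketch of this approach, the function below runs it against the three strings from the question. The function name split_address_phone and the PHONE_US pattern are assumptions made for illustration; the pattern simply encodes the (ddd) ddd-dddd form the question describes.

```python
import re

# Assumed pattern for the "(ddd) ddd-dddd" form described in the question.
PHONE_US = re.compile(r'\(\d{3}\) \d{3}-\d{4}$')

def split_address_phone(text):
    """Return (address, phone); phone is None if the last line is not a phone number."""
    lines = text.strip().split('\n')
    if PHONE_US.match(lines[-1]):
        return '\n'.join(lines[:-1]), lines[-1]
    return text.strip(), None

samples = [
    '\n113 W 5th St\nEureka, MO, United States\n(636) 938-9310\n',
    'Clayton Village Shopping Center, 14856 Clayton Rd\n'
    'Chesterfield, MO, United States\n(636) 227-2844',
    'Wildwood, MO\nUnited States\n(636) 458-7707',
]
results = [split_address_phone(s) for s in samples]

for address, phone in results:
    print(repr(address))
    print(repr(phone))
```

Stripping the input first means the leading and trailing newlines in the first example do not produce empty list elements.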

Joe T. Boka · Answer

If I understand you correctly, you want to "extract the first X lines as address". Assuming that all the addresses you need are in the US, this regex should work for you. In any case, it works on the 3 examples you provided:

import re
x = 'Wildwood, MO\nUnited States\n(636) 458-7707'
# Match everything up to and including the line that ends in "States".
print(re.findall(r'.*\n+.*States', x))

The output is:

['Wildwood, MO\nUnited States']

If you want to print it later without the \n, you can do it this way:

x = '\n113 W 5th St\nEureka, MO, United States\n(636) 938-9310\n'
y = re.findall(r'.*\n+.*States', x)
y = y[0].rstrip()

When you print y, the output is:

113 W 5th St
Eureka, MO, United States

And, if you want to extract the phone number separately you can do this:

tel = '\n113 W 5th St\nEureka, MO, United States\n(636) 938-9310\n'
num = re.findall(r'.*\d+\-\d+', tel)
num = num[0].rstrip()

When you print num, the output is:

(636) 938-9310
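If a single pattern is preferred, the two extractions above could be combined. The following is only a sketch under the assumption that the phone number is always the last thing in the string and always has the form (ddd) ddd-dddd; the name ADDRESS_PHONE and the group names are made up for illustration.

```python
import re

# Assumed layout: optional whitespace, any address text, then a phone number
# in the form (ddd) ddd-dddd at the very end. DOTALL lets "." span newlines.
ADDRESS_PHONE = re.compile(
    r'\s*(?P<address>.*?)\s*(?P<phone>\(\d{3}\) \d{3}-\d{4})\s*$',
    re.DOTALL,
)

tel = '\n113 W 5th St\nEureka, MO, United States\n(636) 938-9310\n'
m = ADDRESS_PHONE.match(tel)
address = m.group('address')
phone = m.group('phone')

print(address)
print(phone)
```

Unlike the pattern that looks for "States", this one does not depend on the country name, but match() returns None when no phone number is present, so real input would need that check.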

Tags: python, regex, split, newline, python-2.7

Asked by Pyderman
