I have a document that when converted to text splits the phone number onto multiple lines like this:
(xxx)-xxx-
xxxx
For a variety of reasons related to my project I can't simply join the lines.
If I know the phonenumber="(555)-555-5555" how can I compile a regex so that if I run it over
(555)-555-
5555
it will match?
**EDIT
To help clarify my question here it is in a more abstract form.
test_string = "xxxx xx x xxxx"
text = """xxxx xx
x
xxxx"""
I need the test string to be found in the text. Newlines can be anywhere in the text and characters that need to be escaped should be taken into consideration.
A simple workaround would be to replace all the \n characters in the document text before you search it:
pat = re.compile(r'\(\d{3}\)-\d{3}\d{4}')
numbers = pat.findall(text.replace('\n',''))
# ['(555)-555-5555']
If this cannot be done for any reasons, the obvious answer, though unsightly, would be to handle a newline character between each search character:
pat = re.compile(r'\(\n*5\n*5\n*5\n*\)\n*-\n*5\n*5\n*5\n*-\n*5\n*5\n*5\n*5')
If you needed to handle any format, you can pad the format like so:
phonenumber = '(555)-555-5555'
pat = re.compile('\n*'.join(['\\'+i if not i.isalnum() else i for i in phonenumber]))
# pat
# re.compile(r'\(\n*5\n*5\n*5\n*\)\n*\-\n*5\n*5\n*5\n*\-\n*5\n*5\n*5\n*5', re.UNICODE)
Test case:
import random
def rndinsert(s):
i = random.randrange(len(s)-1)
return s[:i] + '\n' + s[i:]
for i in range(10):
print(pat.findall(rndinsert('abc (555)-555-5555 def')))
# ['(555)-555-5555']
# ['(555)-5\n55-5555']
# ['(555)-5\n55-5555']
# ['(555)-555-5555']
# ['(555\n)-555-5555']
# ['(5\n55)-555-5555']
# ['(555)\n-555-5555']
# ['(555)-\n555-5555']
# ['(\n555)-555-5555']
# ['(555)-555-555\n5']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With