I am writing a python library to parse different working hours string and produce the standard format of hours. I stuck in the following case:
My regex should return groups for Mon - Fri 7am - 5pm Sat 9am - 3pm as ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm'] but if there is a comma between first and second then it should return [].
Also the comma can be in anywhere but should not between the two weekdays & duration. eg: Mon - Fri 7am - 5pm Sat 9am - 3pm and available upon email, phone call should return ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm'].
This is what I have tried,
import re
pattern = """(
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
\s*[-|to]+\s* # Seperator
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)? # End weekday
\s*[from]*\s* # Seperator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Start hour
\s*[-|to]+\s* # Seperator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Close hour
)"""
regEx = re.compile(pattern, re.IGNORECASE|re.VERBOSE)
print re.findall(regEx, "Mon - Fri 7am - 5pm Sat 9am - 3pm")
# output ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm']
print re.findall(regEx, "Mon - Fri 7am - 5pm Sat - Sun 9am - 3pm")
# output ['Mon - Fri 7am - 5pm ', 'Sat - Sun 9am - 3pm']
print re.findall(regEx, "Mon - Fri 7am - 5pm, Sat 9am - 3pm")
# expected output []
# but I get ['Mon - Fri 7am - 5pm,', 'Sat 9am - 3pm']
print re.findall(regEx, "Mon - Fri 7am - 5pm , Sat 9am - 3pm")
# expected output []
# but I get ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm']
Also I tried negative look ahead pattern in my regex
pattern = """(
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs)
\s*[-|to]+\s*
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)?
\s*[from]*\s*
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?)
\s*[-|to]+\s*
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?)
(?![^,])
)"""
But I didnt get expected one. Should I explicitly write code for checking condition? Is there any way to just changing my regex instead of writing explicit condition checking?
Another way I like to implement is infix the comma between two weekday- duration if comma doesn't exist and change my regex to group by/split by comma. "Mon - Fri 7am - 5pm Sat 9am - 3pm" => "Mon - Fri 7am - 5pm, Sat 9am - 3pm"
I think that you can doing it simply by matching the whole expression so that comma (and other characters are not allowed :
pattern = """^(
(
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
\s*[-|to]+\s* # Seperator
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)? # End weekday
\s*[from]*\s* # Seperator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Start hour
\s*[-|to]+\s* # Seperator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Close hour
)
)+$""
This will output :
[('Sat 9am - 3pm', 'Sat 9am - 3pm')]
[('Sat - Sun 9am - 3pm', 'Sat - Sun 9am - 3pm')]
[]
[]
Hope it helps,
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With