I am attempting to use software supplied by Kimonolabs to get a list of doctors from a web site. The problem I am having is that a string I have scraped from the web site contains an address and a ZIP code that are separated by a <br> tag.
Kimono uses this syntax for a regex:
/^()(.*?)()$/
first group => to the left of the required content
second group => this is what should get extracted
third group => to the right of the required content
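To make the three-group convention concrete, here is a minimal JavaScript sketch of what Kimono presumably does with such a pattern: run it against the scraped text and keep only the second capture group. This is an assumption about Kimono's behaviour, not its actual code, and the helper name extractMiddleGroup is hypothetical.

```javascript
// Hypothetical helper mimicking Kimono's three-group convention (assumption):
// group 1 = left context, group 2 = extracted value, group 3 = right context.
function extractMiddleGroup(regex, text) {
  const match = regex.exec(text);
  // No match means nothing is extracted; otherwise keep only group 2.
  return match === null ? null : match[2];
}

// The default pattern /^()(.*?)()$/ has empty outer groups,
// so group 2 spans the whole single-line text.
console.log(extractMiddleGroup(/^()(.*?)()$/, "Altlaufstr. 22")); // Altlaufstr. 22
```

Because the outer groups are empty, the default pattern simply extracts everything; narrowing the extraction means putting real content into group 1 and/or group 3.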
Specifically, here are the regexes I have tried:
/^()(.*?)(\<)$/
/^()(.*?)(\n)$/
/^()(.*?)(\r)$/
This is the site I'm trying to scrape: http://www.jameda.de/
Here's an example line that I am trying to parse via a regex:
<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>
However, none of the regex patterns I have tried captures any data. I am having trouble understanding regexes because the reference materials I have found are pretty complicated.
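A quick JavaScript check shows why these patterns capture nothing, assuming Kimono applies them to the raw HTML line above (it may instead see only the text content, which this sketch does not cover). The $ anchor forces the third group to sit at the very end of the string, but the line ends with </p>, not with <, a newline, or a carriage return.

```javascript
// Sample line from the question, assumed to be what the regex is applied to.
const line = "<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>";

// The three patterns tried in the question.
const attempts = [/^()(.*?)(\<)$/, /^()(.*?)(\n)$/, /^()(.*?)(\r)$/];

for (const re of attempts) {
  // Each prints null: "$" requires group 3 to end the whole string,
  // but the string ends with "</p>", not "<", "\n" or "\r".
  console.log(re.source, "=>", re.exec(line));
}
```

Note also that the raw HTML contains a <br> tag, not a newline character, so the \n and \r variants have nothing to match even without the anchor.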
It looks like you are trying to extract German ZIP codes, which are always exactly 5 digits. This regex will do it:
/(<br\/?>)(\d{5})()/
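Breakdown:
<br\/?> (first group) requires the ZIP code to be preceded by a <br> tag, written either <br> or <br/>
\d{5} (second group, the part that gets extracted) matches exactly 5 digits
() is the empty third group that Kimono's three-group syntax expects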
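Here is a small JavaScript sketch of the pattern in action, again assuming Kimono keeps group 2 of the match. The function name extractZip is hypothetical. The last call shows a limitation: <br\/?> does not accept a space before the slash.

```javascript
// Pattern from the answer: group 1 = <br> or <br/>, group 2 = five digits, group 3 empty.
const zipPattern = /(<br\/?>)(\d{5})()/;

// Hypothetical helper: returns the ZIP code (group 2) or null if there is no match.
function extractZip(html) {
  const match = zipPattern.exec(html);
  return match === null ? null : match[2];
}

console.log(extractZip("<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>"));  // 85635
console.log(extractZip("<p>Altlaufstr. 22<br/>85635 Höhenkirchen-Siegertbrn</p>")); // 85635
console.log(extractZip("<p>Altlaufstr. 22<br />85635 Höhenkirchen-Siegertbrn</p>")); // null
```

If the site ever writes the tag as <br />, a variant such as /(<br\s*\/?>)(\d{5})()/ would also tolerate whitespace before the slash.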
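Note: leave out the ^ and $ anchors that were in the default Kimono regex, because this regex is not trying to match the entire text, just the ZIP code. With the anchors, ^ would require the <br> to be the very first thing in the text and $ would require the ZIP code to be the very last.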
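The effect of the anchors can be checked directly in JavaScript on the sample line from the question:

```javascript
// Sample line from the question.
const sample = "<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>";

// Anchored: ^ forces the <br> to start the text and $ forces the ZIP to end it,
// so this cannot match a line that starts with "<p>" and ends with "</p>".
const anchored = /^(<br\/?>)(\d{5})()$/;

// Unanchored: the regex engine searches for <br> followed by five digits anywhere.
const unanchored = /(<br\/?>)(\d{5})()/;

console.log(anchored.exec(sample));      // null
console.log(unanchored.exec(sample)[2]); // 85635
```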