Regex Processing using software supplied by Kimonolabs

Question

I am attempting to use software supplied by Kimonolabs to get a list of doctors from a website. The problem I am having is that a string I have scraped from the site contains an address and a ZIP code separated by a <br> tag.

Kimono uses this syntax for a regex:

/^()(.*?)()$/

first group => to the left of the required content

second group => this is what should get extracted

third group => to the right of the required content
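Here is a minimal JavaScript sketch of that three-group convention. It assumes, as described above, that Kimono runs the regex and keeps only the second capture group, and that its regex engine behaves like JavaScript's. The helper extractKimonoStyle is hypothetical and only imitates that behavior; it is not Kimono's own code.

```javascript
// Hypothetical helper imitating Kimono's three-group convention as described above:
// group 1 = left context, group 2 = extracted text, group 3 = right context.
function extractKimonoStyle(text, regex) {
  const match = regex.exec(text);
  // Return only the second capture group, or null when the regex does not match.
  return match ? match[2] : null;
}

const sampleLine = "<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>";

// Kimono's default pattern: empty outer groups plus ^ and $ anchors force the
// lazy (.*?) to expand until it covers the whole string.
console.log(extractKimonoStyle(sampleLine, /^()(.*?)()$/));

// Non-empty outer groups act as left/right markers: "<p>" on the left and
// "<br>" on the right, so group 2 is the street part of the address.
console.log(extractKimonoStyle(sampleLine, /^(<p>)(.*?)(<br>)/)); // Altlaufstr. 22
```

With the default pattern the whole line comes back unchanged, which is why the outer groups (or the anchors) have to be changed before anything useful is extracted.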

Specifically, here are the regular expressions I have tried:

/^()(.*?)(\<)$/ 
/^()(.*?)(<br>)$/
/^()(.*?)(<br/>)$/

And, this is the site I'm trying to scrape: http://www.jameda.de/

Here's an example line that I am trying to parse via a regex:

<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>

However, none of the regex patterns I have tried captures any data. I am having trouble understanding regexes because the reference material I have found is pretty complicated.

Brian Stephens · Accepted Answer

It seems like you are trying to match German ZIP codes, which are always 5 digits. This will do it:

/(<br\/?>)(\d{5})()/

Breakdown:

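<br\/?> requires the ZIP to be preceded by a <br> tag, with or without the slash (<br> or <br/>)

\d{5} is exactly 5 digits: the ZIP code itself, which, as the second group, is what gets extracted

() is the empty third group, since nothing needs to follow the ZIP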
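A small JavaScript sketch that applies the answer's pattern and keeps only the second group, the way Kimono is described as doing. Only the regex comes from the answer; the helper extractZip, the extra sample strings, and the looser variant are illustrative assumptions.

```javascript
// The accepted answer's pattern, read as Kimono's three groups:
//   group 1: (<br\/?>)  a <br> or <br/> tag directly before the ZIP
//   group 2: (\d{5})    five digits, the ZIP code that gets extracted
//   group 3: ()         empty, nothing is required after the ZIP
const zipRegex = /(<br\/?>)(\d{5})()/;

// Hypothetical helper (not a Kimono API): return group 2, or null if no match.
function extractZip(html, regex) {
  const match = regex.exec(html);
  return match ? match[2] : null;
}

const samples = [
  "<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>",   // line from the question
  "<p>Altlaufstr. 22<br/>85635 Höhenkirchen-Siegertbrn</p>",  // self-closing tag
  "<p>Altlaufstr. 22<br />85635 Höhenkirchen-Siegertbrn</p>", // space before the slash
];

for (const html of samples) {
  console.log(extractZip(html, zipRegex)); // logs 85635, 85635, then null
}

// Looser variant (an addition, not from the answer): \s* also allows "<br />".
const looseZipRegex = /(<br\s*\/?>)(\d{5})()/;
console.log(extractZip(samples[2], looseZipRegex)); // 85635
```

The third sample shows the limit of \/?: it covers <br> and <br/> but not <br /> with a space. The looser variant is only worth using if the scraped HTML actually contains that spelling.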

Note: leave out the ^ and $ anchors from the default Kimono regex, because this regex is not trying to match the entire text, only the ZIP.
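To see why the anchors matter, here is a short JavaScript comparison on the example line from the question, assuming Kimono's engine treats ^ and $ the way JavaScript does. The anchored ZIP pattern is a hypothetical counterexample, not something proposed in either post; the variable names are illustrative.

```javascript
const addressLine = "<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>";

// With ^ and $ the pattern must span the ENTIRE string, but the line starts
// with "<p>Altlaufstr..." rather than "<br>", so there is no match.
const anchoredZip = /^(<br\/?>)(\d{5})()$/;

// Without anchors the engine may start matching anywhere, so it finds "<br>85635".
const unanchoredZip = /(<br\/?>)(\d{5})()/;

console.log(anchoredZip.test(addressLine));      // false
console.log(unanchoredZip.exec(addressLine)[2]); // 85635

// The question's first attempt is anchored too: it only matches text that
// ends with "<", and this line ends with "</p>".
console.log(/^()(.*?)(\<)$/.test(addressLine));  // false
```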

Tags: regex, web-scraping

Andi Giga
