
Regex Processing using software supplied by Kimonolabs

I am attempting to use software supplied by Kimonolabs to get a list of doctors from a web site. The problem I am having is that a string I have scraped from the web site contains an address and a zip code that are separated by a <br> tag.

Kimono uses this syntax for a regex:

/^()(.*?)()$/ 

first group => to the left of the required content

second group => this is what should get extracted

third group => to the right of the required content
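The three-group convention can be tried outside Kimono. This is only a sketch in Python, not Kimono's actual implementation: the helper name extract_middle is hypothetical, and it assumes the scraped string reaches the regex with its HTML tags intact.

```python
import re

def extract_middle(pattern, text):
    """Return the second capture group, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group(2) if match else None

# Default pattern: both outer groups are empty, so the middle group is the whole string.
print(extract_middle(r"^()(.*?)()$", "Altlaufstr. 22"))  # Altlaufstr. 22

# Assumed input: the example line with its tags. Group 1 consumes "<p>",
# group 3 consumes everything from "<br>" onward, and group 2 keeps the street.
line = "<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>"
print(extract_middle(r"^(<p>)(.*?)(<br>.*)$", line))  # Altlaufstr. 22
```

Because the pattern is anchored with ^ and $, the three groups together have to account for every character of the string.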

Specifically, here are the regular expressions that I have tried:

/^()(.*?)(\<)$/ 
/^()(.*?)(\n)$/
/^()(.*?)(\r)$/

And, this is the site I'm trying to scrape: http://www.jameda.de/

Here's an example line that I am trying to parse via a regex:

<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>

However, none of the regex patterns that I have tried captures any data. I am having trouble understanding regexes because the reference materials I have found are pretty complicated.

Andi Giga asked May 08 '26 09:05



1 Answer

It seems like you are trying to match German zipcodes, which are always 5 digits. This will do it:

/(<br\/?>)(\d{5})()/

Breakdown:

<br\/?> requires the ZIP code to be immediately preceded by a <br> tag (with or without a slash)

\d{5} matches exactly 5 digits

Note: leave out the ^ and $ anchors that were in the default Kimono regex, because this regex is not trying to match the entire text, just the ZIP.
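To check the pattern outside Kimono, here is a minimal sketch in Python. The function name extract_zip is hypothetical, and it assumes the string Kimono passes to the regex still contains the literal <br> tag. It also shows why the anchors have to go: with ^ and $ the same groups would have to cover the whole line, so nothing matches.

```python
import re

# The pattern from above: group 1 is the <br> tag, group 2 the ZIP, group 3 is empty.
ZIP_AFTER_BR = re.compile(r"(<br\/?>)(\d{5})()")

def extract_zip(html):
    """Return the five digits that directly follow a <br> tag, or None."""
    match = ZIP_AFTER_BR.search(html)
    return match.group(2) if match else None

line = "<p>Altlaufstr. 22<br>85635 Höhenkirchen-Siegertbrn</p>"
print(extract_zip(line))  # 85635

# Anchored variant: the line does not start with <br> or end right after
# the digits, so the search finds nothing.
print(re.search(r"^(<br\/?>)(\d{5})()$", line))  # None
```

The pattern does not allow whitespace inside the tag, so a tag written as "<br />" would not match as it stands.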

Brian Stephens answered May 11 '26 00:05
