Extract IBAN from text with Python

Tags: python, regex, pattern-matching, iban

I want to extract IBAN numbers from text with Python. The challenge is that an IBAN can be written in many ways, with spaces between the digits in varying places, so I find it difficult to translate this into a useful regex pattern.

I have written a demo version which tries to match all German and Austrian IBAN numbers from text.

^DE([0-9a-zA-Z]\s?){20}$

I have seen similar questions on Stack Overflow. However, the combination of the different ways to write IBAN numbers and the need to extract them from surrounding text makes my problem difficult to solve.

Hope you can help me with that!


asked Jan 15 '21 11:01

PParker

2 Answers

Country   ISO country code   Verification#   Bank#   Account#
Germany   2a                 2n              8n      10n
Austria   2a                 2n              5n      11n

Note: a - letters only, n - digits only

So the main difference is really the length in digits. That means you could try:

\b(?:DE(?:\s*\d){20}|AT(?:\s*\d){18})\b(?!\s*\d)

See the online demo.

\b - Word boundary.
(?: - Open 1st non-capturing group.
- DE - Match uppercase "DE" literally.
- (?: - Open 2nd non-capturing group.
  - \s*\d - Zero or more whitespace characters followed by a single digit.
  - ){20} - Close 2nd non-capturing group and match it 20 times.
- | - Or:
- AT - Match uppercase "AT" literally.
- (?: - Open 3rd non-capturing group.
  - \s*\d - Zero or more whitespace characters followed by a single digit.
  - ){18} - Close 3rd non-capturing group and match it 18 times.
- ) - Close 1st non-capturing group.
\b - Word boundary.
(?!\s*\d) - Negative lookahead to prevent any trailing digits.

The demo does show that your Austrian IBAN numbers are invalid. If you wish to extract them up to the point where they would still be valid, I guess you can remove the trailing \b(?!\s*\d).
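To make the effect of the trailing \b(?!\s*\d) concrete, here is a minimal sketch comparing the pattern with and without it. The sample text and the names STRICT, RELAXED and compact are made up for illustration; the Austrian number in the sample deliberately carries two digits too many.

```python
import re

# Pattern from the answer: DE + 20 digits or AT + 18 digits,
# each digit optionally preceded by whitespace.
STRICT = re.compile(r"\b(?:DE(?:\s*\d){20}|AT(?:\s*\d){18})\b(?!\s*\d)")
# Same pattern with the trailing \b(?!\s*\d) removed.
RELAXED = re.compile(r"\b(?:DE(?:\s*\d){20}|AT(?:\s*\d){18})")

# Hypothetical sample text; the AT number is followed by two extra digits.
sample = (
    "Refund to DE89 3704 0044 0532 0130 00, "
    "or to AT61 1904 3002 3457 3201 12 instead."
)


def compact(match_text):
    """Remove all whitespace from a matched IBAN."""
    return re.sub(r"\s+", "", match_text)


strict_hits = [compact(m) for m in STRICT.findall(sample)]
relaxed_hits = [compact(m) for m in RELAXED.findall(sample)]

print(strict_hits)   # only the German number
print(relaxed_hits)  # German number plus the first 18 digits of the Austrian one
```

With the lookahead in place the over-long Austrian number is rejected entirely; without it, the match simply stops after 18 digits, so whether the truncated result is the IBAN that was meant still has to be checked separately.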


answered Oct 04 '22 17:10

JvdV

In general, to match German and Austrian IBAN codes, you can use

import re
codes = re.findall(r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])', text)

Details:

\b - word boundary
(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18}) - Group 1: DE followed by 20 digits, or AT followed by 18 digits, where each digit may be preceded by any amount of whitespace
\b(?!\s*[0-9]) - word boundary that is NOT immediately followed by zero or more whitespace characters and an ASCII digit.
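As a rough sketch of how this could be used end to end, the pattern can be wrapped in a small helper that also strips the whitespace from each hit. The function name extract_ibans and the sample string are assumptions for illustration only. Because the pattern contains one capturing group, re.findall returns the contents of Group 1 for every match.

```python
import re

# Pattern from the answer; Group 1 holds the whole IBAN candidate.
IBAN_RE = re.compile(
    r"\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])"
)


def extract_ibans(text):
    """Return DE/AT IBAN candidates found in text, with whitespace removed."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs and newlines alike.
    return ["".join(raw.split()) for raw in IBAN_RE.findall(text)]


# Hypothetical input: one number without spaces, one with double
# spaces and a line break inside it.
sample = "IBAN: DE89370400440532013000 and AT61  1904 3002\n3457 3201."
print(extract_ibans(sample))
```

Since \s* allows any amount of whitespace between digits, the same helper handles numbers written without spaces, in groups of four, or broken across lines.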

See this regex demo.

For the data you showed in the question that includes non-proper IBAN codes, you can use

\b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b

See the regex demo. Details:

\b - word boundary
(?:DE|AT) - DE or AT
(?:\s?[0-9a-zA-Z]){18} - eighteen occurrences of an optional whitespace and then an alphanumeric char
(?:(?:\s?[0-9a-zA-Z]){2})? - an optional occurrence of two sequences of an optional whitespace and an alphanumeric char
\b - word boundary.
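A small sketch of the lenient pattern in use, assuming made-up sample data (the labels A, B, C and the name LENIENT are illustrative). It accepts 18 or 20 alphanumeric characters after either country code, so an Austrian number with 20 characters after AT is matched as well.

```python
import re

# Lenient pattern from the answer: DE or AT, then 18 alphanumerics,
# optionally 2 more, each optionally preceded by one whitespace char.
LENIENT = re.compile(
    r"\b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b"
)

# Hypothetical sample text with three differently written numbers.
sample = (
    "A: DE89 3704 0044 0532 0130 00; "
    "B: AT611904300234573201; "
    "C: AT61 1904 3002 3457 3201 12."
)

# Strip whitespace from each match so the results are comparable.
found = [re.sub(r"\s", "", m.group()) for m in LENIENT.finditer(sample)]
for iban in found:
    print(len(iban), iban)
```

Because this pattern accepts letters as well as digits, it can also match ordinary text that happens to start with an uppercase DE or AT, so it is probably best paired with a length or checksum check on the results.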

answered Oct 04 '22 16:10

Wiktor Stribiżew
