Regex optional non-capturing groups

Tags:

javascript

regex

I am a total regex noob and have spent hours trying to solve this puzzle. I think I have to use some kind of optional non-capturing group or alternation.

I want to match the following strings:

1. Neuer Film a von 1000
2. Neuer Film a von 1000 mit b
3. Neuer Film a von 1000 mit b und c
4. Neuer Film a von 1000 mit b und c und d
5. Neuer Film a mit b
6. Neuer Film a mit b und c
7. Neuer Film a mit b und c und d

My regex looks like this:

var regex = /(?:[nN]euer [Ff]ilm\s?)(.*)(?:[vV]on).(\d{4}).(?:[Mm]it)(.*)(?:[uU]nd)(.*)/g;

The problem is that it matches only strings 3 and 4. Also, it does not match the last two "und" as intended, but packs them into group 3 instead of group 4.

Can someone please help with my regex (which is not very user-friendly at all ;)
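A quick way to confirm both symptoms is to run the question's regex over the seven strings. This is a minimal JavaScript sketch; the `/g` flag is dropped on purpose, because a global regex reused with `exec()` keeps its `lastIndex` from one call to the next, which would distort a loop over separate strings.

```javascript
// The regex from the question, without the /g flag so that exec() does not
// carry lastIndex over from one string to the next.
const questionRx = /(?:[nN]euer [Ff]ilm\s?)(.*)(?:[vV]on).(\d{4}).(?:[Mm]it)(.*)(?:[uU]nd)(.*)/;

const strs = [
  'Neuer Film a von 1000',
  'Neuer Film a von 1000 mit b',
  'Neuer Film a von 1000 mit b und c',
  'Neuer Film a von 1000 mit b und c und d',
  'Neuer Film a mit b',
  'Neuer Film a mit b und c',
  'Neuer Film a mit b und c und d',
];

// "von", "mit" and "und" are all mandatory, so only strings 3 and 4 can match.
const matched = [];
strs.forEach((s, i) => {
  const m = questionRx.exec(s);
  if (m) {
    matched.push(i + 1);
    // The greedy (.*) in group 3 runs up to the LAST "und", so for string 4
    // group 3 is " b und c " and group 4 is only " d".
    console.log(`String ${i + 1}:`, JSON.stringify(m.slice(1)));
  }
});
console.log('Matched strings:', matched);
```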


asked Apr 11 '17 19:04

TrantSteel

1 Answer

You do need optional non-capturing groups (like (?:...)?), but you also need anchors (^ to match the start of the string and $ to match its end) and lazy dot patterns (.*?, which match as few characters as possible).
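The difference between greedy and lazy dot patterns, and why the `$` anchor matters for a trailing lazy group, can be seen on a couple of fragments taken from the sample strings. A small sketch; the variable names are arbitrary:

```javascript
const tail = 'b und c und d';

// Greedy (.*) grabs as much as it can, so group 1 stops at the LAST "und".
const greedy = tail.match(/^(.*)\s*und\s*(.*)$/);
console.log(JSON.stringify(greedy.slice(1))); // ["b und c ","d"]

// Lazy (.*?) grabs as little as it can, so group 1 stops at the FIRST "und".
const lazy = tail.match(/^(.*?)\s*und\s*(.*)$/);
console.log(JSON.stringify(lazy.slice(1))); // ["b","c und d"]

// Without $, nothing forces a trailing lazy group to expand, so it matches
// the empty string; with $, it has to reach the end of the input.
const unanchored = 'Neuer Film a mit b'.match(/mit\s*(.*?)/);
const anchored = 'Neuer Film a mit b'.match(/mit\s*(.*?)$/);
console.log(JSON.stringify([unanchored[1], anchored[1]])); // ["","b"]
```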

You may use

/^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/

See the regex demo. In the demo, the /gm modifiers are needed because the input is a multiline string (g finds every match, m makes ^ and $ match at the start and end of each line).
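For a demo-style multiline input, the same pattern with `/gm` can be run through `String.prototype.matchAll` (ES2020; it throws a `TypeError` unless the regex has the `g` flag). A sketch under that assumption. Note that `\s` also matches `\n`, so with other input a match could in principle continue onto a following line that starts with "mit" or "und"; the seven sample lines do not trigger this.

```javascript
// Hypothetical multiline input: the seven sample strings, one per line
const text = [
  'Neuer Film a von 1000',
  'Neuer Film a von 1000 mit b',
  'Neuer Film a von 1000 mit b und c',
  'Neuer Film a von 1000 mit b und c und d',
  'Neuer Film a mit b',
  'Neuer Film a mit b und c',
  'Neuer Film a mit b und c und d',
].join('\n');

// g: return every match; m: ^ and $ match at each line's start and end
const rxGM = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/gm;

// Keep groups 1-4 of each match (undefined for groups that did not take part,
// which JSON.stringify prints as null inside an array)
const rows = [...text.matchAll(rxGM)].map((m) => m.slice(1, 5));
for (const row of rows) console.log(JSON.stringify(row));
```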

Pattern details:

^ - start of a string anchor
[nN]euer [Ff]ilm - Neuer Film, Neuer film, neuer Film or neuer film
\s* - zero or more whitespaces
(.*?) - Group 1: any 0+ chars other than line break chars, as few as possible (that is, up to the leftmost occurrence of the subsequent subpatterns)
(?:\s*[vV]on\s+(\d{4}))? - 1 or 0 occurrences of:
- \s* - 0+ whitespaces
- [vV]on - von or Von
- \s+ - 1+ whitespaces
- (\d{4}) - Group 2: 4 digits
(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)? - an optional non-capturing group matching 1 or 0 occurrences of:
- \s+ - 1+ whitespaces
- [Mm]it - Mit or mit
- \s* - 0+ whitespaces
- (.*?) - Group 3 matching any 0+ chars other than line break chars, as few as possible
- (?:\s*[uU]nd\s*(.*))? - an optional non-capturing group matching 1 or 0 occurrences of:
  - \s*[uU]nd\s* - und or Und surrounded by 0+ whitespaces
  - (.*) - Group 4 matching any 0+ chars other than line break chars, as many as possible
$ - end of string.
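Since the question complains that the regex is not very user-friendly, one option is to assemble the same pattern from commented pieces that mirror the list above. This is a sketch, not the answer's own method; the `parts` array is purely illustrative, and joining the pieces reproduces exactly the same `source` as the regex literal:

```javascript
// Each piece mirrors one item of the pattern details list.
// String.raw keeps the backslashes, so \s and \d reach the RegExp unchanged.
const parts = [
  String.raw`^`,                        // start of string
  String.raw`[nN]euer [Ff]ilm`,         // "Neuer Film" with either initial in upper or lower case
  String.raw`\s*`,                      // 0+ whitespaces
  String.raw`(.*?)`,                    // Group 1: as few chars as possible
  String.raw`(?:\s*[vV]on\s+(\d{4}))?`, // optional "von" + Group 2: 4 digits
  String.raw`(?:\s+[Mm]it\s*`,          // optional "mit" block opens
  String.raw`(.*?)`,                    //   Group 3: as few chars as possible
  String.raw`(?:\s*[uU]nd\s*(.*))?`,    //   optional "und" + Group 4: the rest
  String.raw`)?`,                       // optional "mit" block closes
  String.raw`$`,                        // end of string
];
const rxFromParts = new RegExp(parts.join(''));

console.log(rxFromParts.exec('Neuer Film a von 1000 mit b und c und d').slice(1));
```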

// The seven sample strings from the question
const strs = [
  'Neuer Film a von 1000',
  'Neuer Film a von 1000 mit b',
  'Neuer Film a von 1000 mit b und c',
  'Neuer Film a von 1000 mit b und c und d',
  'Neuer Film a mit b',
  'Neuer Film a mit b und c',
  'Neuer Film a mit b und c und d'
];
// No g flag: each string is tested on its own, so lastIndex must not carry over
const rx = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;
for (const s of strs) {
  const m = rx.exec(s);
  if (m) {
    console.log('--- ' + s + ' ---');
    console.log('Group 1: ' + m[1]);
    // Groups 2-4 are optional and are undefined when their part is absent
    if (m[2]) console.log('Group 2: ' + m[2]);
    if (m[3]) console.log('Group 3: ' + m[3]);
    if (m[4]) console.log('Group 4: ' + m[4]);
  }
}
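If your runtime supports ES2018 named capture groups, the numbered groups can be given names, which makes the results easier to read. The group names `title`, `digits`, `afterMit` and `afterUnd` are hypothetical choices for this sketch; the pattern is otherwise unchanged:

```javascript
// Same pattern as above, with hypothetical names instead of group numbers
const rxNamed = /^[nN]euer [Ff]ilm\s*(?<title>.*?)(?:\s*[vV]on\s+(?<digits>\d{4}))?(?:\s+[Mm]it\s*(?<afterMit>.*?)(?:\s*[uU]nd\s*(?<afterUnd>.*))?)?$/;

for (const s of ['Neuer Film a von 1000 mit b und c und d', 'Neuer Film a mit b']) {
  const m = rxNamed.exec(s);
  if (m) {
    // Named groups that did not take part in the match are undefined
    // (and JSON.stringify leaves undefined properties out)
    const { title, digits, afterMit, afterUnd } = m.groups;
    console.log(JSON.stringify({ title, digits, afterMit, afterUnd }));
  }
}
```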


answered Oct 19 '22 00:10

Wiktor Stribiżew
