
Regular Expressions matching difficulty

Tags: regex, excel, vba

My current regex:

([\d]*)([^\d]*[\d][a-z]*-[\d]*)([\d][a-z?])(.?)

I am trying to write a regex that matches strings of the following form: a count (any number from 0 to 1 million), followed by a single digit, then sometimes a letter, then a dash, then any number of digits, followed by the same digit (and letter, if there was one) that appeared before the dash, and finally sometimes one more letter. Examples of strings it should match:

1921-1220104081741b
192123212a-1220234104081742ab

Examples of what it should return for the strings above (these are two separate examples; it should not read both lines as one input):

(192) (1-122010408174) (1) (b)
(19212321) (2a-122023410408174) (2a) (b)

My current regex works for the second string, but for the first one it returns (1b) where I would like (1) (b). It should still return (2a) for the second string, and likewise in this case:

1926h-1220104081746h  Should Return: (192) (6h-122010408174) (6h)

I'm not 100% sure this is possible, since I'm fairly new to regex. For reference, I'm doing this in Excel VBA, in case there is an easier way to do it.
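The behaviour described above can be reproduced with a short sketch. JavaScript is used here only for illustration (the question itself targets Excel VBA), and `groupsOf` is a hypothetical helper name, not part of the original post. It suggests where the (1b) comes from: `[a-z?]` is a character class that matches exactly one character (a lowercase letter or a literal `?`), so the third group always consumes two characters.

```javascript
// The pattern from the question, unchanged.
const original = /([\d]*)([^\d]*[\d][a-z]*-[\d]*)([\d][a-z?])(.?)/;

// Hypothetical helper: returns capture groups 1..4 as an array,
// or null when the string does not match.
function groupsOf(str) {
  const m = str.match(original);
  return m ? m.slice(1, 5) : null;
}

// Group 3 swallows the trailing letter: 192 | 1-122010408174 | 1b | (empty)
console.log(groupsOf('1921-1220104081741b').join(' | '));

// Here the two-character group 3 happens to be what is wanted:
// 19212321 | 2a-122023410408174 | 2a | b
console.log(groupsOf('192123212a-1220234104081742ab').join(' | '));
```

Because the second character of group 3 is mandatory, the pattern has no way to tell "digit plus repeated letter" apart from "digit plus trailing letter" without referring back to what appeared before the dash.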

Persiden asked Dec 15 '15 17:12


2 Answers

You could capture the character(s) before the dash character, and then back reference that match.

In the expression below, \3 would match what was matched by the 3rd capturing group:

(\d*)((\d[a-z]*)-\d*)(\3)([a-z])?


Output after merging the capture groups:

1921-1220104081741b
(192) (1-122010408174) (1) (b)
192123212a-1220234104081742ab
(19212321) (2a-122023410408174) (2a) (b)
1926h-1220104081746h
(192) (6h-122010408174) (6h)

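Example:

The JavaScript below is only a demonstration harness; the pattern itself is what matters. It prints each string followed by its capture groups (group 3 is skipped because it is already contained in group 2):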

var strings = ['1921-1220104081741b', '192123212a-1220234104081742ab', '1926h-1220104081746h'],
    exp = /(\d*)((\d[a-z]*)-\d*)(\3)([a-z])?/;

strings.forEach(function(str) {
  var m = str.match(exp);

  // m[3] is nested inside m[2], so it is not printed separately.
  // m[5] is undefined when there is no trailing letter.
  console.log(str);
  console.log('(' + m[1] + ') (' + m[2] + ') (' + m[4] + ') (' + (m[5] || '') + ')');
  console.log('---');
});
Josh Crozier answered Oct 13 '22 01:10


I think what you are saying with "followed by the same number" is that the piece right before the dash is repeated as your third capture group. I would suggest implementing this by splitting up your second capture group and then using a backreference:

([\d]*)([\d][a-z]*)-([\d]*)(\2)(.?)

For your three examples:

1921-1220104081741b
192123212a-1220234104081742ab
1926h-1220104081746h

This results in:

(192)      (1)  - (122010408174)    (1)  (b)
(19212321) (2a) - (122023410408174) (2a) (b)
(192)      (6h) - (122010408174)    (6h) ()

...and you can join the two middle groups back together to get the hyphenated term you wanted.
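A minimal sketch of that joining step, written in JavaScript purely for illustration (the question targets Excel VBA). The helper name `splitCode` and the field names `count`, `middle`, `repeat`, and `suffix` are hypothetical and chosen only for this example:

```javascript
// Pattern from this answer: \2 must repeat whatever group 2 captured
// just before the dash.
const pattern = /([\d]*)([\d][a-z]*)-([\d]*)(\2)(.?)/;

// Hypothetical helper: rebuilds the four fields the question asks for
// by joining groups 2 and 3 around the dash.
function splitCode(str) {
  const m = str.match(pattern);
  if (!m) return null; // string does not have the expected shape
  return {
    count: m[1],
    middle: m[2] + '-' + m[3],
    repeat: m[4],
    suffix: m[5], // empty string when there is no trailing character
  };
}

const r = splitCode('1921-1220104081741b');
// (192) (1-122010408174) (1) (b)
console.log('(' + r.count + ') (' + r.middle + ') (' + r.repeat + ') (' + r.suffix + ')');
```

Returning null for non-matching input is a design choice for this sketch; calling code could just as well raise an error or skip the row.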

Blackhawk answered Oct 12 '22 23:10