Is there a way to capture tokens inside a non-captured group in Matlab regular expressions? Here is the specific problem:
InputString = 'Identifiers: 10 12 1 3 8 6 4 2'
Expression = 'Identifiers:\s(?:(\d*)\t?)+'
regexp(InputString, Expression, 'tokens')
I need to find the numbers after 'Identifiers:'. The string InputString is part of a large character array with lines before and after this line, separated by \r\n characters. The character after the colon is a whitespace character, and the numbers are separated by tabs. The last number has no trailing tab. The number of numbers can vary, but there is always at least one, and they are always integers with one or more digits.
I had the following idea for my Expression: identify the line by Identifiers:\s, find each number as a captured token with an optional trailing tab by (\d*)\t?, and repeat this one or more times by +. To repeat the digit part of the expression, I need to put it in a group. But I don't want to capture the token of the outer group (?:(\d*)\t?), only the token of the inner group (\d*). That's why I used ?:. When I remove ?:, only one token containing 1012138642 is returned.
Isn't it possible to capture tokens inside a non-capturing group? Do you have any solution to return the numbers in a single statement?
In my current solution I find the line by
Expression = 'Identifiers:.+?\r\n'
Line = regexp(InputString, Expression, 'match')
and get the digits with
regexp(Line, '(\d+)\t+', 'tokens')
I have spent so much time looking for a single-statement solution that I now really need to know whether it's possible or not! I am not sure if I am thinking about this wrong, my head is not working as intended, or it's just impossible.
MATLAB doesn't support nested tokens, even if you mark the outer group as non-capturing.
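One common workaround, sketched here as an illustration rather than as the only way to do it, is to capture the whole list as a single token and split it afterwards. The sample input is hypothetical and assumes tab-separated numbers and \r\n line endings, as described in the question.

```matlab
% Hypothetical input: tab-separated numbers, CRLF line endings
InputString = sprintf('Header\r\nIdentifiers: 10\t12\t1\t3\r\nBlah 99\r\n');

% Capture everything after 'Identifiers:' up to the end of the line as ONE token
List = regexp(InputString, 'Identifiers:\s([^\r\n]*)', 'tokens', 'once');

% List is a 1x1 cell; split its content on whitespace (tabs included)
Numbers = strsplit(strtrim(List{1}));
```

This still takes two calls, but the regular expression itself stays simple and does not depend on nested groups.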
Starting in R2016b, there are some new text manipulation functions that make this easier:
str = "Identifiers: 10 12 1 3 8 6 4 2" + newline + "Blah";
str = str.extractBetween("Identifiers: ",newline).split
str =
8×1 string array
"10"
"12"
"1"
"3"
"8"
"6"
"4"
"2"
If your goal is one statement with regexp, using split might get you closer.
str = regexp(str,'(?<=Identifiers[^\n]*)\s*(?=[^\n]*)','split')
str =
1×10 string array
"Identifiers:" "10" "12" "1" "3" "8" "6" "4" "2" "Blah"