How to replace many matches in the text with different values using regexprep in Matlab

Tags: regex, matlab

I'm using the function regexprep in Matlab to replace several instances of a pattern with a list of values from a cell array. The idea is to replace the first match with the first value, the second match with the second value, and so on, so that each match is replaced with a different value from the cell array.

From the documentation I read that:

If replace is a cell array of N character vectors and expression is a single character vector, then regexprep attempts N matches and replacements.

So here is an example of the task I have (for this example let's assume that I know there are only 4 matches):

% some text:
str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc';
pattern = '([a][b][c])'; % the pattern to match
values = {'111','222','333','444'}; % the cell array of replacement values
new_str = regexprep(str,pattern,values) % the actual replacement

The result:

new_str =
    '111 s;dlf kudnbv. soergi; 111va/.lge roins.br oian111a/ sergosr toibns111'

This result is not correct, of course, because all the matches were replaced by the first value in the cell array.

So I googled this problem and found this explanation. Apparently regexprep performs the replacements one by one, so after the first replacement, the first match it finds is what was originally the second match, and because it is treated as the first match, it is replaced by the first value in the cell array (111).
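
To see why every match ends up as 111, here is a minimal sketch. It assumes regexprep applies each cell entry in turn to the output of the previous pass, which is how the documentation sentence quoted above reads. The variable names step and same are illustrative only.

```matlab
str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc';
pattern = 'abc';
values = {'111','222','333','444'};

% Assumed equivalent of regexprep(str, pattern, values):
% one full pass per value, each pass working on the previous result.
step = str;
for k = 1:numel(values)
    % Without 'once', the pass for k = 1 already replaces every 'abc',
    % so the later values find nothing left to match.
    step = regexprep(step, pattern, values{k});
end

% Compare the manual passes with the single cell-array call.
same = isequal(step, regexprep(str, pattern, values));
```

On the example string, both routes give the all-111 result shown above.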

I can work around this with a loop that performs the replacement once per value:

new_str = str;
for k = 1:numel(values)
    new_str = regexprep(new_str,pattern,values(k),'once'); % replace one match each time
end

The result:

new_str =
    '111 s;dlf kudnbv. soergi; 222va/.lge roins.br oian333a/ sergosr toibns444'

which is exactly what I want.

My question is: how can I write the pattern, or call regexprep, so that I get the same result without a loop?
It seems to me that I am missing something about how to use this function. I'll also add that my real problem has over 100 matches within the text, so using a pattern like ([a][b][c])(.*)([a][b][c])(.*)([a][b][c])(.*)([a][b][c]) and a replacement like 111$2222$4333$6444 (which gives the correct result here) is not really an option.

Any help will be appreciated!

asked Jan 01 '23 23:01 by EBH


1 Answer

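You could write a basic helper class that generates the strings one at a time, and call it through the dynamic expression replacement token ${cmd}, which runs a command for each match and uses its output as the replacement.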

For example:

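% Save as strgenerator.m. A handle class, so the index persists between calls.
classdef strgenerator < handle
    properties
        strs      % cell array of replacement strings
        ii = 1    % index of the next string to return
    end

    methods
        function self = strgenerator(strs)
            self.strs = strs;
        end

        function outstr = nextstr(self)
            % Return the current string, then advance the index.
            outstr = self.strs{self.ii};

            self.ii = self.ii + 1;
            if self.ii > numel(self.strs)
                self.ii = 1; % wrap around after the last string
            end
        end
    end
end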
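
A quick check of the generator on its own, assuming the class above is saved as strgenerator.m on the MATLAB path (the variable names gen, a, b and c are illustrative). It shows that the object keeps its position between calls and wraps around after the last entry:

```matlab
gen = strgenerator({'111','222'});

a = gen.nextstr();   % '111'
b = gen.nextstr();   % '222'
c = gen.nextstr();   % '111' again: the index wrapped back to 1
```

Because of the wrap-around, a text with more matches than values would silently reuse the values from the start.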

Then call it from regexprep:

str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc';
pattern = '([a][b][c])'; % the pattern to match
values = strgenerator({'111','222','333','444'}); % generator wrapping the cell array
new_str = regexprep(str,pattern,'${values.nextstr()}') % nextstr() is called once per match

Provides us with:

>> SOcode

new_str =

    '111 s;dlf kudnbv. soergi; 222va/.lge roins.br oian333a/ sergosr toibns444'
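
If a helper class feels heavy, a loop-free alternative is to split the text at the matches and interleave the pieces with the values. This is only a sketch under the assumption that there are at least as many values as matches. The names parts, fill and pieces are illustrative.

```matlab
str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc';
pattern = 'abc';
values = {'111','222','333','444'};

% Text between matches: N matches give N+1 pieces (some may be empty).
parts = regexp(str, pattern, 'split');
n = numel(parts) - 1;                     % number of matches found
assert(n <= numel(values), 'Not enough replacement values.');

% Pad the values with one empty entry so both rows have N+1 columns.
fill = [values(1:n), {''}];

% Stack as a 2-by-(N+1) cell; {:} reads column by column, which
% yields piece, value, piece, value, ... in the original order.
pieces = [parts; fill];
new_str = [pieces{:}];
```

Unlike the generator, this version stops with an error instead of wrapping around when the values run out.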

answered Jan 04 '23 23:01 by sco1