
Multiple regex matches in Google Sheets formula

I'm trying to extract every digit that precedes a hyphen in a given string (say, in cell A1), using a Google Sheets regex formula:

=REGEXEXTRACT(A1, "\d-")

My problem is that it only returns the first match... how can I get all matches?

Example text:

"A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"

My formula returns 1-, whereas I want to get 1-2-2-2-2-2-2-2-2-2-3-3- (either as an array or concatenated text).

I know I could use a script or another function (like SPLIT) to achieve the desired result, but what I really want to know is how to get an RE2 regular expression to return multiple matches in a "REGEX.*" Google Sheets formula, something like the "global - Don't return after first match" option on regex101.com.

I've also tried removing the undesired text with REGEXREPLACE, with no success either (I couldn't get rid of other digits not preceding a hyphen).

Any help appreciated! Thanks :)

flo5783 asked Apr 16 '17 00:04




2 Answers

You can actually do this in a single formula: use REGEXREPLACE to wrap every match in a capture group (instead of removing text), then pass the result to REGEXEXTRACT as the pattern:

=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))

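Basically, REGEXREPLACE surrounds every instance of \d- with a "capture group", so the rewritten string becomes a pattern that matches the original text, and REGEXEXTRACT then neatly returns all the captures as an array. If you want them in a single cell, JOIN packs them back into one string, as in the formula above. Note that the contents of A1 are themselves used as the regex here, so any regex metacharacters in the text (parentheses, brackets, +, ? and so on) will be interpreted as pattern syntax.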
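To make the two steps easier to follow, here is a rough plain-JavaScript imitation of the same trick. It is only a sketch: JavaScript's regex engine is not RE2, the function name extractViaGeneratedPattern is made up for this illustration, and the sample string is a shortened version of the one in the question.

```javascript
// Plain-JavaScript imitation of the REGEXREPLACE + REGEXEXTRACT trick.
// extractViaGeneratedPattern is a hypothetical name used only for this sketch.
function extractViaGeneratedPattern(text) {
  // Step 1 (the REGEXREPLACE part): wrap every "digit + hyphen" in parentheses,
  // so "A1-Nutrition;A2-ActPhysiq" becomes "A(1-)Nutrition;A(2-)ActPhysiq".
  const generatedPattern = text.replace(/(\d-)/g, "($1)");

  // Step 2 (the REGEXEXTRACT part): use that string as a regex against the
  // original text; each parenthesized piece is now a capture group.
  const match = text.match(new RegExp(generatedPattern));

  // match[0] is the whole match; the captures start at index 1.
  return match === null ? [] : match.slice(1);
}

const sample = "A1-Nutrition;A2-ActPhysiq;H3-Endocrino";
const captures = extractViaGeneratedPattern(sample);
console.log(captures);          // [ '1-', '2-', '3-' ]
console.log(captures.join("")); // 1-2-3-
```

The sketch shares the formula's weak spot: because the text doubles as the pattern, a stray metacharacter in the input (an unbalanced parenthesis, for instance) makes the generated pattern invalid.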

Aurielle Perlmann answered Oct 18 '22 19:10


You may create your own custom function in the Script Editor:

// Returns every match of `pattern` in `input` as a single row (one match per column).
function ExtractAllRegex(input, pattern, groupId) {
  return [Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId])];
}

Or, if you need to return all matches in a single cell joined with some separator:

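// Returns every match of `pattern` in `input` as one string joined with `separator`.
// It shares its name with the version above, so keep only one of the two in a project.
function ExtractAllRegex(input, pattern, groupId, separator) {
  return Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId]).join(separator);
}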

Then, just call it like =ExtractAllRegex(A1, "\d-", 0, ", ").

Description:

  • input - the text to search (e.g. the value of cell A1)
  • pattern - the regex pattern
  • groupId - the ID of the capturing group you want to extract (0 for the whole match)
  • separator - the text used to join the matched results
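As an illustration of the groupId parameter, the sketch below runs the array-returning version outside of Sheets, assuming a runtime that supports String.prototype.matchAll. The pattern (\d)- and the shortened sample text are my own choices for the demo; note that a JavaScript string literal needs "\\d", whereas in a sheet formula you would type "\d".

```javascript
// Same logic as the array-returning function above, repeated so the sketch runs on its own.
function ExtractAllRegex(input, pattern, groupId) {
  return [Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId])];
}

// Group 0 is the whole match ("1-"); group 1 is only what (\d) captured ("1").
const text = "A1-Nutrition;A2-ActPhysiq;H3-Endocrino";
const wholeMatches = ExtractAllRegex(text, "(\\d)-", 0);
const digitsOnly = ExtractAllRegex(text, "(\\d)-", 1);

console.log(wholeMatches); // [ [ '1-', '2-', '3-' ] ]
console.log(digitsOnly);   // [ [ '1', '2', '3' ] ]
```

In a sheet, the equivalent call would be something like =ExtractAllRegex(A1, "(\d)-", 1). The outer array is a single row, so the results should spill across adjacent columns.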
Wiktor Stribiżew answered Oct 18 '22 18:10