How can I use the regexextract function in Google Docs spreadsheets to get "all" occurrences of a string?

Tags:

regex

google-sheets

My text string is in cell D2:

Decision, ERC Case No. 2009-094 MC, In the Matter of the Application for Authority to Secure Loan from the National Electrification Administration (NEA), with Prayer for Issuance of Provisional Authority, Dinagat Island Electric Cooperative, Inc. (DIELCO) applicant(12/29/2011)

This function:

=regexextract(D2,"\([A-Z]*\)")

will grab (NEA) but not (DIELCO).

I would like it to extract both (NEA) and (DIELCO).

asked Jan 06 '12 05:01

2 Answers

Here are two solutions: one uses the specific terms from the author's example, and the other expands on the author's sample regex pattern, which appears to match all ALLCAPS terms. I'm not sure which is wanted, so I gave both.

(Put the block of text in A1)

Generic solution for all words in ALLCAPS

=regexreplace(regexreplace(REGEXREPLACE(A1,"\b\w[^A-Z]*\b","|"),"\W+","|"),"^\||\|$","")

Result:

ERC|MC|NEA|DIELCO

NB: The bulk of the work is done by the CAPITALIZED function; the lowercase functions are just for cleanup.

If you want space separation, the formula is a little simpler:

=trim(regexreplace(REGEXREPLACE(A1,"\b\w[^A-Z]*\b"," "),"\W+"," "))

Result:

ERC MC NEA DIELCO
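To see why the nested replacements leave only the ALLCAPS terms, the passes can be traced outside the spreadsheet. The following is a minimal sketch in Python, not a Sheets formula: it assumes Python's re module treats these simple patterns the same way as the regex engine in Google Sheets, and the function name allcaps_terms is made up for illustration.

```python
import re

# Sample text from the question.
TEXT = ("Decision, ERC Case No. 2009-094 MC, In the Matter of the Application "
        "for Authority to Secure Loan from the National Electrification "
        "Administration (NEA), with Prayer for Issuance of Provisional "
        "Authority, Dinagat Island Electric Cooperative, Inc. (DIELCO) "
        "applicant(12/29/2011)")


def allcaps_terms(text, sep="|"):
    """Hypothetical helper mirroring the nested regexreplace calls."""
    # Pass 1: a word character followed by any run of non-uppercase
    # characters, ending at a word boundary, is replaced by the separator.
    # Words such as "Decision, " or "In the " are consumed; "ERC" is not,
    # because there is no word boundary between "E" and "R".
    step1 = re.sub(r"\b\w[^A-Z]*\b", sep, text)
    # Pass 2: collapse every run of non-word characters (leftover
    # punctuation, spaces, and the separators themselves) into one separator.
    step2 = re.sub(r"\W+", sep, step1)
    # Pass 3: drop the separator left at the start and end.
    return step2.strip(sep)


print(allcaps_terms(TEXT))       # ERC|MC|NEA|DIELCO
print(allcaps_terms(TEXT, " "))  # ERC MC NEA DIELCO
```

Passing a space as the separator corresponds to the trim() variant, where the final cleanup only has to remove leading and trailing spaces.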

(One way I like to play with regex in Google spreadsheets is to read the regex pattern from another cell, so I can change it without having to edit or re-paste it into all the cells using that pattern. It looks like this:

Cell A1:

Block of text

Cell B1 (no quote marks):

\b\w[^A-Z]*\b

Formula, in any cell:

=trim(regexreplace(REGEXREPLACE(A1,B$1," "),"\W+"," "))

By anchoring it to B$1, I can fill all my rows at once and the reference won't increment.)

Previous answer:

Specific solution for selected terms (ERC, DIELCO)

=regexreplace(join("|",IF(REGEXMATCH(A1,"ERC"),"ERC",""),IF(REGEXMATCH(A1,"DIELCO"),"DIELCO","")),"(^\||\|$)","")

Result:

ERC|DIELCO

As before, the bulk of the work is done by the CAPITALIZED functions; the lowercase functions are just for cleanup.

This formula will find ERC, DIELCO, or both in the block of text. Their order in the text doesn't matter, but the output will always be ERC followed by DIELCO (the order of appearance is lost). This fixes the shortcoming of the earlier answer that uses "(bra).*(bra)": here an isolated ERC or DIELCO can still be matched.

This also has a simpler form with space separation:

=trim(join(" ",IF(REGEXMATCH(A1,"ERC"),"ERC",""),IF(REGEXMATCH(A1,"DIELCO"),"DIELCO","")))

Result:

ERC DIELCO
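The test-each-term-then-join logic can be checked with a short Python sketch. It is only an analog of the formula, under the assumption that re.search plays the role of REGEXMATCH; the function name selected_terms and the sample strings are made up for illustration.

```python
import re


def selected_terms(text, terms=("ERC", "DIELCO"), sep="|"):
    """Hypothetical helper: keep each listed term that occurs in text."""
    # Each term is tested on its own, like one IF(REGEXMATCH(...)) per term.
    # The output follows the order of `terms`, not the order in the text.
    found = [term for term in terms if re.search(term, text)]
    # Joining only the terms that were found means no stray separators are
    # left over, so the cleanup step of the formula is not needed here.
    return sep.join(found)


print(selected_terms("DIELCO applied before ERC ruled"))  # ERC|DIELCO
print(selected_terms("only DIELCO appears here"))         # DIELCO
```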

answered Sep 29 '22 10:09

Dannid

You can use capture groups, which will cause regexextract() to return an array. You can use this as the cell result, in which case you will get a range of results, or you can feed the array to another function to reformat it to your purpose. For example:

regexextract( "abracadabra" ; "(bra).*(bra)" )

will return the array:

{bra,bra}
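The capture-group behaviour can be reproduced in Python as a rough illustration. This is a sketch, assuming Python's re module handles this pattern the same way as the spreadsheet function. It also shows the limitation of the approach: the pattern needs one group per expected occurrence, so it fails when fewer occurrences are present.

```python
import re

PATTERN = r"(bra).*(bra)"

# Two occurrences: each capture group returns one of them.
match = re.search(PATTERN, "abracadabra")
groups = match.groups()
print(groups)  # ('bra', 'bra')

# One occurrence: the second group cannot match, so there is no match at all.
single = re.search(PATTERN, "abra")
print(single)  # None
```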

Another approach would be to use regexreplace(). This has the advantage that the replace is global (like s/pattern/replacement/g), so you do not need to know the number of results in advance. For example:

regexreplace( "aBRAcadaBRA" ; "[a-z]+" ; "..." )

will return the string:

...BRA...BRA
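For comparison, here is the same global replacement in Python, followed by a find-all call using the pattern from the question. It is a sketch only: re.findall has no direct counterpart among the functions discussed here, and the sample string is a shortened excerpt of the question's text.

```python
import re

# Global replace: every run of lowercase letters is replaced, not just the first.
replaced = re.sub(r"[a-z]+", "...", "aBRAcadaBRA")
print(replaced)  # ...BRA...BRA

# Shortened excerpt of the text from the question.
excerpt = ("from the National Electrification Administration (NEA), "
           "Inc. (DIELCO) applicant(12/29/2011)")

# The question's pattern, applied to every occurrence instead of the first.
# "(12/29/2011)" is skipped because [A-Z]* cannot match the digits and slashes.
acronyms = re.findall(r"\([A-Z]*\)", excerpt)
print(acronyms)  # ['(NEA)', '(DIELCO)']
```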

answered Sep 29 '22 08:09

MetaEd

nicknich3
