Regular expression to extract valid cell references from a spreadsheet formula

I am trying to extract valid cell references and range references from a spreadsheet formula using Google Apps Script (JavaScript).

A valid cell reference is one or two letters followed by a run of digits that does not start with a zero. The letter part and the number part may each optionally be preceded by a $ character. The entire reference can't be preceded or followed by a letter, digit or underscore (in which case it may be part of a spreadsheet function or the name of a named range), or by a colon (in which case it may be part of a range reference).
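For illustration, the single-token part of that definition can be written as an anchored regex. The boundary rules (no letter, digit, underscore or colon on either side) only matter when searching inside a whole formula, so this sketch leaves them out; `isCellRef` is a hypothetical helper name, not an Apps Script API.

```javascript
// Sketch: checks one already-isolated token against the cell-reference definition.
// isCellRef is a hypothetical helper, not part of Apps Script.
// The ^ and $ anchors force the whole token to match, so no boundary rules are needed.
function isCellRef(token) {
  // optional $, one or two letters, optional $, then a number with no leading zero
  return /^\$?[A-Za-z]{1,2}\$?[1-9][0-9]*$/.test(token);
}

console.log(isCellRef('A100')); // true
console.log(isCellRef('$C$3')); // true
console.log(isCellRef('B01'));  // false: the number starts with 0
console.log(isCellRef('ABC1')); // false: three letters
```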

The range reference regex (rangeRefRe) seems to work well, but my cell reference regex (cellRefRe) never finds a match. It would be great if someone could point out what I'm doing wrong.

function myFunction()
{
  var formula = '=A100+B$2:2+INDIRECT("A2:B")+$C3-SUM($D$1:$E5)';
  var fSegments = formula.split('"'); // I want to exclude references within double quotation marks
  var rangeRefRe = /[^0-9a-zA-Z_$]([0-9a-zA-Z$]+?:[0-9a-zA-Z$]+)(?![0-9a-zA-Z_])/g;
  var cellRefRe = /[^0-9a-zA-Z_$:](\${,1}[a-zA-Z]{1,2}\${,1}[1-9][0-9]*)(?![0-9a-zA-Z_:])/g;
  var refResult;
  var references = [];
  for (var i = 0; i < fSegments.length; i += 2)
  {
    while (refResult = rangeRefRe.exec(fSegments[i]))
    {
      references.push(refResult[1]);
    }
    while (refResult = cellRefRe.exec(fSegments[i]))
    {
      references.push(refResult[1]);
    }
  }
  Logger.log(references);
}
asked Dec 26 '22 14:12 by AdamL


2 Answers

JavaScript doesn't support the {,1} part of your regex: it is not a valid quantifier, so it is treated as the literal characters {,1}. To allow 0 or 1 occurrences it would need to be {0,1}, or you can replace it with just ?:

/[^0-9a-zA-Z_$:](\$?[a-zA-Z]{1,2}\$?[1-9][0-9]*)(?![0-9a-zA-Z_:])/g;
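To see the difference, the sketch below runs the question's cellRefRe and the corrected regex over the question's formula, skipping quoted segments the same way the question does. It uses console.log instead of Logger.log so it also runs outside Apps Script; `collectCellRefs` is a hypothetical helper name.

```javascript
// Sketch: compare the question's cellRefRe with the corrected version.
var formula = '=A100+B$2:2+INDIRECT("A2:B")+$C3-SUM($D$1:$E5)';

// Original: "{,1}" is not a quantifier, so "\${,1}" only matches the literal text "${,1}".
var brokenRe = /[^0-9a-zA-Z_$:](\${,1}[a-zA-Z]{1,2}\${,1}[1-9][0-9]*)(?![0-9a-zA-Z_:])/g;
// Corrected: "?" means zero or one occurrence.
var fixedRe = /[^0-9a-zA-Z_$:](\$?[a-zA-Z]{1,2}\$?[1-9][0-9]*)(?![0-9a-zA-Z_:])/g;

// Hypothetical helper: collects capture group 1 from every segment outside double quotes.
function collectCellRefs(re, text) {
  var segments = text.split('"');
  var found = [];
  for (var i = 0; i < segments.length; i += 2) { // even indices are outside quotes
    var m;
    // exec() on a /g regex resumes from lastIndex and resets it to 0 when it returns null
    while ((m = re.exec(segments[i])) !== null) {
      found.push(m[1]);
    }
  }
  return found;
}

console.log(collectCellRefs(brokenRe, formula)); // []
console.log(collectCellRefs(fixedRe, formula));  // A100 and $C3
```

The corrected regex finds A100 and $C3, while B$2 and $D$1 are rejected because a colon follows them (they belong to ranges).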
answered Dec 28 '22 07:12 by nnnnnn


The question and answers were incredibly helpful, but I ran into a few problems, so here are some notes for future readers:

  1. It might be good to add "(" to the characters that can't follow a match (the negative lookahead at the end of the regex). The formula could contain a call to a custom function named "a1" or something similar, and excluding the left parenthesis prevents matching a call to such a badly named custom function.

  2. While "A2:A" and "A1:2" are valid ranges, ranges like "A:2" are not.

  3. I needed the references ordered in the way they appeared in the formula. A single regex for both ranges and cells would solve that problem.
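Point 1 can be checked with the corrected cell regex from the first answer. The sketch runs it with and without "(" in the negative lookahead on a formula that calls a hypothetical custom function named a1; `allMatches` is likewise a hypothetical helper.

```javascript
// Sketch: effect of adding "(" to the negative lookahead.
// a1 is a hypothetical, badly named custom function.
var formula = '=a1(B2)+C3';
var withoutParen = /[^0-9a-zA-Z_$:](\$?[a-zA-Z]{1,2}\$?[1-9][0-9]*)(?![0-9a-zA-Z_:])/g;
var withParen = /[^0-9a-zA-Z_$:](\$?[a-zA-Z]{1,2}\$?[1-9][0-9]*)(?![0-9a-zA-Z_:(])/g;

// Hypothetical helper: returns capture group 1 of every match.
function allMatches(re, text) {
  var out = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    out.push(m[1]);
  }
  return out;
}

console.log(allMatches(withoutParen, formula)); // a1, B2, C3
console.log(allMatches(withParen, formula));    // B2, C3
```

With "(" excluded, the function name a1 is no longer reported, while the real references B2 and C3 still are.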

Here's the regex I came up with:

/[^0-9a-zA-Z_$:]\$?([a-zA-Z]+(\$?[1-9]\d*(:(\$?[a-zA-Z]+)?\$?([1-9]\d*)?)?|((:\$?[a-zA-Z]+\$?([1-9]\d*)?))))(?![0-9a-zA-Z_(])/g;
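A sketch of how the combined regex might be used, reusing the question's approach of splitting on double quotes. One detail worth noting: the leading \$? sits outside capture group 1, so group 1 drops a leading $ (for $C3 it yields C3). This sketch instead takes the whole match and removes its first character, which is always the single delimiter matched by [^0-9a-zA-Z_$:]. `extractReferences` is a hypothetical name, and the approach assumes, like the question, that each reference has some preceding character (true for formulas, which start with =).

```javascript
// Sketch: one regex for cells and ranges, so results keep formula order.
// extractReferences is a hypothetical helper name.
function extractReferences(formula) {
  var re = /[^0-9a-zA-Z_$:]\$?([a-zA-Z]+(\$?[1-9]\d*(:(\$?[a-zA-Z]+)?\$?([1-9]\d*)?)?|((:\$?[a-zA-Z]+\$?([1-9]\d*)?))))(?![0-9a-zA-Z_(])/g;
  var segments = formula.split('"');
  var refs = [];
  for (var i = 0; i < segments.length; i += 2) { // skip text inside double quotes
    var m;
    while ((m = re.exec(segments[i])) !== null) {
      // m[0] starts with the one delimiter character; dropping it keeps any leading "$"
      refs.push(m[0].slice(1));
    }
  }
  return refs;
}

// returns ['A100', 'B$2:2', '$C3', '$D$1:$E5']
console.log(extractReferences('=A100+B$2:2+INDIRECT("A2:B")+$C3-SUM($D$1:$E5)'));
```

Because ranges and single cells come from the same regex, they are returned in the order they appear in the formula, and the quoted "A2:B" inside INDIRECT is skipped.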
answered Dec 28 '22 07:12 by Joshua Dawson