How to define regexp variables in TM language?

In a sublime-syntax file you can define variables to use in regular expressions (e.g. - match: "{{SOME_VARIABLE}}"). It looks like you can't do this in tmLanguage (https://macromates.com). Since expanding variables is such a common need when writing highlighters, is there a utility that adds this kind of variable support to the TM language descriptor, so that it can be used with VSCode? I couldn't find anything with a search engine.


1 Answer

I too was looking for this functionality, as the regular expressions get long and complex very quickly, especially when writing the tmLanguage file in JSON, which forces you to escape some characters with \\.
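To illustrate the escaping overhead, here is a small sketch (the pattern is borrowed from the enum example further down; the names yamlPattern and jsonText are only for illustration). It shows how a regular expression written with single backslashes ends up with doubled backslashes once it is serialized as JSON:

```typescript
// The regular expression as you would write it by hand (one backslash per escape).
const yamlPattern: string = String.raw`\b(enum)\s+`;

// Serializing to JSON doubles every backslash, which is what makes
// hand-written JSON grammars hard to read.
const jsonText: string = JSON.stringify({ match: yamlPattern });

console.log(yamlPattern); // \b(enum)\s+
console.log(jsonText);    // {"match":"\\b(enum)\\s+"}
```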

It does not seem to be supported out of the box by TextMate. However, you can get variable support if you don't mind some pre-processing.

I found this kind of solution while browsing Microsoft's TypeScript-TmLanguage GitHub repository. They define the TypeScript grammar in YAML, which is more readable and requires only one backslash to escape characters. In this YAML file, they define "variables" for frequently used patterns, e.g.:

variables:
  startOfIdentifier: (?<![_$[:alnum:]])(?:(?<=\.\.\.)|(?<!\.))
  endOfIdentifier: (?![_$[:alnum:]])(?:(?=\.\.\.)|(?!\.))
  propertyAccess: (?:(\.)|(\?\.(?!\s*[[:digit:]])))
  propertyAccessPreIdentifier: \??\.\s*
  identifier: '[_$[:alpha:]][_$[:alnum:]]*'
  constantIdentifier: '[[:upper:]][_$[:digit:][:upper:]]*'
  propertyIdentifier: '\#?{{identifier}}'
  constantPropertyIdentifier: '\#?{{constantIdentifier}}'
  label: ({{identifier}})\s*(:)

Then they reuse those "variables" in the pattern definitions, or even in other variables (above, the label variable uses the identifier variable), e.g.:

enum-declaration:
    name: meta.enum.declaration.ts
    begin: '{{startOfDeclaration}}(?:\b(const)\s+)?\b(enum)\s+({{identifier}})'
    beginCaptures:
      '1': { name: keyword.control.export.ts }
      '2': { name: storage.modifier.ts}
      '3': { name: storage.modifier.ts}
      '4': { name: storage.type.enum.ts }
      '5': { name: entity.name.type.enum.ts } 

Finally, they use a build script to transform this YAML grammar into a plist or JSON grammar. The script removes the "variables" property from the grammar, since it is not part of the tmLanguage spec, and loops over the variable definitions to replace their occurrences ({{variable}}) in other variables and in the begin, end, and match patterns.

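// Note: TmGrammar, MapLike and transformGrammarRepository are defined
// elsewhere in the build script and are not shown here.

// A compiled {{name}} matcher paired with its already expanded value.
type VariableReplacer = [RegExp, string];

// Apply every known replacer to a single pattern string.
function replacePatternVariables(pattern: string, variableReplacers: VariableReplacer[]) {
    let result = pattern;
    // variableName is the RegExp matching {{name}}, value is its expansion
    for (const [variableName, value] of variableReplacers) {
        result = result.replace(variableName, value);
    }
    return result;
}

function updateGrammarVariables(grammar: TmGrammar, variables: MapLike<string>) {
    // "variables" is not part of the tmLanguage spec, so drop it from the output
    delete grammar.variables;
    const variableReplacers: VariableReplacer[] = [];
    for (const variableName in variables) {
        // Replace the pattern with earlier variables
        const pattern = replacePatternVariables(variables[variableName], variableReplacers);
        variableReplacers.push([new RegExp(`{{${variableName}}}`, "gim"), pattern]);
    }
    // Expand the variables in every begin, end and match pattern of the repository
    transformGrammarRepository(
        grammar,
        ["begin", "end", "match"],
        pattern => replacePatternVariables(pattern, variableReplacers)
    );
    return grammar;
}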
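The snippet above relies on helpers that are not shown, so it cannot be run on its own. Below is a self-contained sketch of the same idea, not the repository's actual code. The names Rule, Grammar, expand, resolveVariables, expandRule and buildGrammar are all hypothetical, and the grammar shape is simplified: it only descends into nested "patterns" arrays and ignores rules nested inside captures. As in the original, a variable can only refer to variables declared before it.

```typescript
// Hypothetical, simplified grammar shape (an assumption for this sketch).
type Rule = {
    match?: string;
    begin?: string;
    end?: string;
    patterns?: Rule[];
    [key: string]: unknown;
};

type Grammar = {
    variables?: Record<string, string>;
    repository: Record<string, Rule>;
    [key: string]: unknown;
};

// Replace every {{name}} with its value; unknown names are left untouched.
// A replacer function is used so "$" in a value is never treated specially.
function expand(pattern: string, variables: Map<string, string>): string {
    return pattern.replace(/\{\{(\w+)\}\}/g, (whole: string, name: string) =>
        variables.get(name) ?? whole
    );
}

// Resolve variables in declaration order, so later ones can use earlier ones.
function resolveVariables(variables: Record<string, string>): Map<string, string> {
    const resolved = new Map<string, string>();
    for (const [name, value] of Object.entries(variables)) {
        resolved.set(name, expand(value, resolved));
    }
    return resolved;
}

// Expand begin/end/match on a rule and, recursively, on its nested patterns.
function expandRule(rule: Rule, variables: Map<string, string>): Rule {
    const out: Rule = { ...rule };
    for (const key of ["begin", "end", "match"] as const) {
        const value = out[key];
        if (typeof value === "string") {
            out[key] = expand(value, variables);
        }
    }
    if (rule.patterns) {
        out.patterns = rule.patterns.map(p => expandRule(p, variables));
    }
    return out;
}

// Return a copy of the grammar without "variables" and with patterns expanded.
function buildGrammar(source: Grammar): Grammar {
    const { variables = {}, ...rest } = source;
    const resolved = resolveVariables(variables);
    const repository: Record<string, Rule> = {};
    for (const [name, rule] of Object.entries(source.repository)) {
        repository[name] = expandRule(rule, resolved);
    }
    return { ...rest, repository };
}

// Usage: a tiny made-up grammar reusing two of the variables shown earlier.
const grammar = buildGrammar({
    scopeName: "source.example",
    variables: {
        identifier: "[_$[:alpha:]][_$[:alnum:]]*",
        label: "({{identifier}})\\s*(:)",
    },
    repository: {
        label: { match: "{{label}}", name: "entity.name.label.example" },
    },
});

// The result can be written out as a JSON grammar.
console.log(JSON.stringify(grammar, null, 2));
```

One design difference from the original: the replacement goes through a callback rather than a replacement string, because String.prototype.replace gives sequences such as $& special meaning in replacement strings, and regular expression fragments often contain $.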

It's not exactly what you (and I) were looking for, but it helps if your grammar is big enough. For a small grammar, I would not bother with this pre-processing.

ghis answered Sep 14 '25 11:09