
What is the regex that properly splits SVG 'd' attributes into tokens?

I am trying to split the d attribute on a path tag in an svg file into tokens.

This one is relatively easy:

d = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7"
tokens = d.split(/[\s,]/)

But this is also a valid d attribute:

d = "M2-12C5,15,21,19,27-2C17,12-3,40,5,7"

The tricky parts are that letters and numbers are no longer separated, and negative numbers use only the minus sign as the separator. How can I create a regex that handles this?
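To make the failure concrete, here is a small sketch of what the simple split returns for the compact form. The variable names are illustrative; only the regex and the input string come from the question.

```javascript
// Illustrative names; the input and the regex are taken from the question.
const compact = "M2-12C5,15,21,19,27-2C17,12-3,40,5,7";

// Splitting on whitespace or commas only breaks at the commas here,
// so command letters and signed numbers stay glued together.
const naiveTokens = compact.split(/[\s,]/);

console.log(naiveTokens);
// [ 'M2-12C5', '15', '21', '19', '27-2C17', '12-3', '40', '5', '7' ]
```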

The rules seem to be:

  • split wherever there is white space or a comma
  • split numerics from letters (and keep "-" with the numeric)

I know I can use lookaround, for example:

tokens = pathdef.split(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/)

I'm having trouble forming a single regex that also splits on the minus signs and keeps the minus sign with the numbers.
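As a sketch of where the lookaround split falls short, the snippet below runs it on a shortened input. It assumes a runtime that supports lookbehind (for JavaScript, an engine implementing ES2018); the shortened string and the variable names are illustrative.

```javascript
// Shortened, illustrative input; the regex is the one shown above.
const pathdef = "M2-12C5,15";

// Split at every digit/non-digit boundary.
const parts = pathdef.split(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/);

console.log(parts);
// [ 'M', '2', '-', '12', 'C', '5', ',', '15' ]
// The minus sign and the comma come out as separate tokens,
// which is the problem described above.
```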

The above code should tokenize as follows:

[ 'M', '2', '-12', 'C', '5', '15', '21', '19', '27', '-2', 'C', '17', '12', '-3', '40', '5', '7' ]
Octopus asked Dec 13 '17 20:12


1 Answer

Brief

Unfortunately, JavaScript did not support lookbehinds when this answer was written (they were added to the language in ES2018), so your options may be limited: the regex in the Other Regex Engines section below will not work in JavaScript engines that lack lookbehind, although it works in regex engines that support it.

Other Regex Engines

Note: The regex in this section (Other Regex Engines) relies on lookbehind and will not work in JavaScript engines that lack it. See the JavaScript solution in the Code section instead.

I think with your original regex you were trying to get to:

[, ]|(?<![, ])(?=-|(?<=[a-z])\d|(?<=\d)[a-z])

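This regex lets you split on any of the following: a comma, a space, a position followed by -, a position where a letter precedes a digit, or a position where a digit precedes a letter. The zero-width positions are skipped when they directly follow a comma or space, so no empty tokens are produced there. Note that [a-z] needs a case-insensitive flag to match uppercase commands such as M and C.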
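For readers whose engine does support lookbehind, the split-based approach can be sketched as follows. This assumes an ES2018-capable JavaScript runtime. The i flag, the splitPath helper, and the filter step are additions for illustration and are not part of the regex as posted.

```javascript
// Assumption: the runtime supports lookbehind (ES2018 or later).
// The "i" flag is added so [a-z] also matches uppercase commands.
const splitter = /[, ]|(?<![, ])(?=-|(?<=[a-z])\d|(?<=\d)[a-z])/i;

// Hypothetical helper: split a path definition into tokens and drop
// empty strings, which can appear when separators are doubled (", ").
function splitPath(d) {
  return d.split(splitter).filter(function (token) {
    return token !== "";
  });
}

console.log(splitPath("M2-12C5,15,21,19,27-2C17,12-3,40,5,7"));
// [ 'M', '2', '-12', 'C', '5', '15', '21', '19', '27', '-2',
//   'C', '17', '12', '-3', '40', '5', '7' ]
```

The spaced form "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7" yields the same tokens.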


Code

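var a = [
  "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7",
  "M2-12C5,15,21,19,27-2C17,12-3,40,5,7"
];

// Match tokens instead of splitting: an optionally signed number
// (with an optional decimal part) or a single command letter.
var r = /-?(?:\d*\.)?\d+|[a-z]/gi;

a.forEach(function(s){
  // Both inputs log the same 17 tokens.
  console.log(s.match(r));
});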

Explanation

  • -?(?:\d*\.)?\d+|[a-z] Match either of the following
    • -?(?:\d*\.)?\d+
      • -? Match - literally zero or one time
      • (?:\d*\.)? Match the following zero or one time
        • \d* Match any number of digits
        • \. Match a literal dot
      • \d+ Match one or more digits
    • [a-z] Match any character in the range a-z (lowercase letters; because the i modifier is used, this also matches the uppercase variants of those letters)

I added (?:\d*\.)? because (to the best of my knowledge) you can have decimal number values in SVG d attributes.

Note: Changed the original regex portion of \d+(?:\.\d+)? to (?:\d*\.)?\d+ in order to catch numbers that don't have the whole number part such as .5 as per @Thomas (see comments below question).
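A short sketch of how the final pattern behaves on decimal values follows. The tokenize helper and the sample path string are illustrative. It assumes the input does not use exponent notation or a leading + sign, which SVG permits but this pattern does not treat as part of a single number.

```javascript
// Same pattern as in the Code section.
const tokenPattern = /-?(?:\d*\.)?\d+|[a-z]/gi;

// Hypothetical helper: return all tokens, or an empty array when
// nothing matches (match() returns null in that case).
function tokenize(d) {
  return d.match(tokenPattern) || [];
}

// Illustrative input with a bare fraction (.5) and a signed decimal.
console.log(tokenize("M.5-1.25L10,20"));
// [ 'M', '.5', '-1.25', 'L', '10', '20' ]
```

Matching tokens rather than splitting avoids lookbehind entirely, which is why this version works in JavaScript engines without lookbehind support.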

ctwheels answered Oct 13 '22 02:10