Capturing first number in text with regex in JavaScript

Context

I have some CSS to do transitions:

div.mad {
    -webkit-transition: top .4s, left .5s linear, opacity .75s, padding-top 1s;
    transition: top .4s, left .5s linear, opacity 3s, padding-top 1s;
}

I'm trying to find the maximum value in this list, which is easy enough to do with a regular expression.

/(\d*\.){0,1}\d+/g

My issue is that when I get the CSS value

$("div.mad").css("transition")

it comes back as

top 0.4s ease 0s, left 0.5s linear 0s, opacity 3s ease 0s, padding-top 1s ease 0s

Now my regex gets the delay values ("0") as well. Considering I'm trying to find the maximum, that's fine as well, but I'm a purist at heart, and I would like to limit the matches to just the transition times.
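A minimal sketch of what the simple regex returns on the computed value, assuming the string shown above. The variable names are only illustrative. It shows the delay values mixed in with the durations, and that the maximum still comes out right as long as no delay exceeds the longest duration.

```javascript
var t = "top 0.4s ease 0s, left 0.5s linear 0s, opacity 3s ease 0s, padding-top 1s ease 0s";

// Every number in the string matches, so durations and delays alternate.
var numbers = t.match(/(\d*\.)?\d+/g);
console.log(numbers); // durations and "0" delays interleaved

// Convert the matched strings to numbers and take the largest.
var max = Math.max.apply(null, numbers.map(Number));
console.log(max);
```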

My Broken Solution

The regex I concocted is

/(?:[^\d\.]*)((\d*\.){0,1}\d+)(?:s[^,]*)/g

The reasoning breakdown:

(?:[^\d\.]*)       -- non-capturing group that looks for anything that is not a digit or a decimal point
                   -- should match "top ", "left ", etc.

(                  -- begin capture group
    (\d*\.){0,1}   -- capture ones, tens, etc + decimal point, if it exists
    \d+            -- capture tenths, hundredths, etc if decimal exists, else capture ones, tens, etc
)                  -- close capture group

(?:s[^,]*)         -- non-capturing group for the remainder of the transition element

When I run

var t = "top 0.4s ease 0s, left 0.5s linear 0s, opacity 3s ease 0s, padding-top 1s ease 0s";
var r = /[^\d\.]*((\d*\.){0,1}\d*)s[^,]*/g
var m = t.match(r);

the results for each m are:

m[0] = "top 0.4s ease 0s"
m[1] = ", left 0.5s linear 0s"
m[2] = ", opacity 3s ease 0s"
m[3] = ", padding-top 1s ease 0s"

I thought the idea of a non-capturing group was that it would match the characters, but ignore them when you tried to access the groups.

I have a hunch that I'm looking at matches rather than groups, but I haven't figured out how to get the groups instead.
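The difference between matches and groups can be sketched as follows. This is an illustration using a simplified pattern with a single capture group, not the exact regex from the question, and matchAll() assumes an ES2020-capable engine.

```javascript
var t = "top 0.4s ease 0s, left 0.5s linear 0s, opacity 3s ease 0s, padding-top 1s ease 0s";

// With the g flag, match() returns only the full matched substrings.
var fullMatches = t.match(/[^\d.]*(\d*\.?\d+)s[^,]*/g);

// Without the g flag, match() returns the first match followed by its capture groups.
var firstOnly = t.match(/[^\d.]*(\d*\.?\d+)s[^,]*/);

// matchAll() requires the g flag and yields every match together with its groups.
var durations = Array.from(t.matchAll(/[^\d.]*(\d*\.?\d+)s[^,]*/g), function (m) {
    return m[1]; // first capture group of each match
});

console.log(fullMatches.length); // number of transition entries matched
console.log(firstOnly[1]);       // first duration only
console.log(durations);          // one duration per entry
```

Non-capturing groups only affect which groups get numbered; they do not shorten the full match stored at index 0.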

Help?

UPDATE 1

Per the comment, I updated

r = /(?:(?:[\,]*,)*[^\d\.]*)(\d*\.?\d+)s[^,]*/

and tried using RegExp.exec() (which I had already tried before, though it didn't work right until I updated r). The result is

m[0] = "top 0.4s ease 0s"
m[1] = "0.4"

m[1] does capture the first number, but it ignores the following ones.

I also discovered that the issue I was having with t.match(r) was the /g flag. Removing it gives the same result as r.exec(t).

Aside from splitting t on ',' and running the regex on each term, is there a way to do this in a single regex?

UPDATE 2

@Esteban Felix's alternative answer is clearly the best option.

$('div.mad').css('transition-duration').replace(/[^\d\.,]/g,'').split(',');

TL;DR: Use the above for this case, but here's an explanation of why maybe you shouldn't in other cases.

The only modification I would consider is appending + to the end of [^\d\.,], which decreases the number of replacements and improves performance by an imperceptible amount in more common strings (not the .css('transition-duration') case, as I'll explain in a second).

The reason it might improve performance is that in JavaScript, strings are immutable, so creating a new string for every character being removed takes up time. In my case, that's only the spaces and the s's. With a string of

0.4s, 0.5s, 0.75s, 1s

the spaces and the s's are never next to each other, so the result would actually be a worsening of performance since now the regex engine has to check the following character every time it finds a character to remove. However, in more common strings where you remove a lot of consecutive characters, adding the + could improve performance. The only reason it might not is if the implementation of String.replace() is smart, and uses a character array behind the scenes, only allocating space for a new string at the end of the function. This aspect is browser-dependent, but I would guess it's the common case for modern browsers.

It's also worth noting that it's important to use a + and not a *, as the latter would also match the empty string at every position between characters, replacing each matched empty string with the specified empty string. I don't know whether the JavaScript engine would create a ton of new, identical strings or not, but it certainly can't improve performance.
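A small sketch comparing the three quantifier choices on the duration string above. It only checks that the results agree and counts matches; it makes no claim about actual timing, which would need benchmarking in the target browser.

```javascript
var s = "0.4s, 0.5s, 0.75s, 1s";

var single = s.replace(/[^\d.,]/g, "");  // one replacement per removed character
var plus = s.replace(/[^\d.,]+/g, "");   // one replacement per run of removed characters
var star = s.replace(/[^\d.,]*/g, "");   // also "replaces" empty matches between kept characters

console.log(single === plus && plus === star); // the output string is the same in all three cases
console.log(single.split(","));

// Counting matches shows the extra work done by *: it includes empty matches.
var singleCount = s.match(/[^\d.,]/g).length;
var plusCount = s.match(/[^\d.,]+/g).length;
var starCount = s.match(/[^\d.,]*/g).length;
console.log(singleCount, plusCount, starCount);
```

On this input the + version finds exactly as many matches as the single-character version, because no two removed characters are adjacent.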

If you really care about this commonly-negligible performance bump, do some (read: a lot of) benchmarking. The only possible way you will see any difference at all is if

  1. you are running the code on a Compaq Presario 286 MMX with 64MB of RAM (i.e. my first computer from 1997) or
  2. you run this regex replacement many thousands of times in an inner loop on strings where most of the characters to be removed are in long, unbroken runs in
  3. Internet Explorer 1.5

So, the modification to the selected answer might in fact reduce performance depending on your browser and the type of strings you run it against, but, as I said before, I'm a purist, and love generalization and optimization, and thus my explanation.

dx_over_dt asked Oct 27 '14 19:10

1 Answer

You will want to use RegExp.exec instead of String.match.

How to use RegExp.exec (from MDN):

If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property).

An example with your code:

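var t = "top 0.4s ease 0s, left 0.5s linear 0s, opacity 3s ease 0s, padding-top 1s ease 0s";
var r = /[^\d\.]*((\d*\.){0,1}\d*)s[^,]*/g;
var m;
// With the g flag, each exec() call resumes at r.lastIndex and returns the next match.
while ((m = r.exec(t)) !== null) {
    console.log(m[1]); // <- number you want to extract; m[2] is only the inner group, e.g. "0."
}

Also, there is a slight issue with your regex: the nested group (\d*\.){0,1} captures only the digits before the decimal point plus the point itself, so reading m[2] gives "0." rather than "0.4" (and undefined for "3"), and the trailing \d* allows the number to be empty. A simpler regex with a single capture group, read as m[1]:

/(?:[^\d\.]*)(\d*\.?\d+)(?:s[^,]*)/g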

Alternative

Instead of looking at transition, you can also look at the property you actually care about, which is transition-duration.

$('div.mad').css('transition-duration').replace(/[^\d\.,]/g,'').split(',');

Then you can loop over this array directly.
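Looping over that array to find the maximum could look like the sketch below. maxDuration is a hypothetical helper name, and it takes the string that .css('transition-duration') returned so that it can be tried without a page. It assumes the browser reports every duration in seconds; values given in ms would be misread.

```javascript
// Hypothetical helper: durationList is the computed transition-duration string.
function maxDuration(durationList) {
    // Strip everything except digits, dots and commas, then split into entries.
    var parts = durationList.replace(/[^\d.,]/g, "").split(",");
    var max = 0;
    for (var i = 0; i < parts.length; i++) {
        var seconds = parseFloat(parts[i]);
        // NaN (from an empty entry) never compares greater, so it is skipped.
        if (seconds > max) {
            max = seconds;
        }
    }
    return max;
}

console.log(maxDuration("0.4s, 0.5s, 3s, 1s"));
```

In the page itself the call would be maxDuration($('div.mad').css('transition-duration')).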

Esteban Felix answered Oct 10 '22 03:10
