I'm trying to build a JavaScript function capable of parsing a sentence and returning a number.
Here is a jsFiddle I've set up for the test cases below.
Test cases 16 and 17 indicate that it should find the first number. I understand that some of the test cases may be tough, but I welcome anything that gets me reasonable coverage.
Here is the format I'm using for my function:
function parseSentenceForNumber(sentence){
  return number; // The number from the string
}
I think I could get 60-80% of the way myself, but I expect a regular expression might be the best solution here and I've never been great at them. Hopefully I have enough test cases, but feel free to add any I might have missed.
Your help is much appreciated.
**UPDATE**
Loads of working answers, and I need to spend some time looking at them in more detail. Mike Samuel mentioned commas and .5, which leads me to add another couple of test cases:
18. 'I have 1,000 pound' -> 1000
19. '.5' -> 0.5
And jsalonen mentioned adding a test case for no numbers:
20. 'This sentence contains no numbers' -> null
Here is the updated fiddle using jsalonen's solution. Without my changes in spec I'd be 100% there; with the changes I'm at 95%. Can anyone offer a solution to number 18 with commas?
**UPDATE**
I added a statement to strip out commas to jsalonen's function and I'm at 100%.
Here is the final function
function parseSentenceForNumber(sentence){
  var matches = sentence.replace(/,/g, '').match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
  return matches && matches[0] || null;
}
And the final Fiddle
Really appreciate the help and I have improved my regular expression knowledge along the way. Thanks
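One caveat with stripping every comma: digits separated by a list comma with no space (for example '1,2,3') get fused into a single number. Below is a minimal sketch of a stricter variant, assuming a comma only counts as a thousands separator when it sits between a digit and a group of exactly three digits. The name parseSentenceForNumberGrouped is hypothetical and not part of the fiddle.

```javascript
// Hypothetical variant: drop only commas that look like thousands separators,
// i.e. a comma between a digit and a group of exactly three digits.
function parseSentenceForNumberGrouped(sentence) {
  var cleaned = sentence.replace(/(\d),(?=\d{3}(?!\d))/g, '$1');
  var matches = cleaned.match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
  return matches ? matches[0] : null; // still a string, like the function above
}

console.log(parseSentenceForNumberGrouped('I have 1,000 pound')); // "1000"
console.log(parseSentenceForNumberGrouped('Pick 1,2,3'));         // "1"
```

This still assumes English-style grouping; locales that use the comma as the decimal separator would need different handling.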
Answer that matches all negative and positive numbers with any number of digits:
function parseSentenceForNumber(sentence){
  var matches = sentence.match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
  return matches && matches[0] || null;
}
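Note that match() yields strings, so the function above returns a string such as '-3.5' rather than the number -3.5; a loose comparison with == hides the difference. If an actual number is wanted, a minimal sketch could convert the match with parseFloat. The wrapper name parseSentenceForNumberValue is made up for illustration.

```javascript
// Hypothetical wrapper; the regex is the one from the function above.
function parseSentenceForNumberValue(sentence) {
  var matches = sentence.match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
  // parseFloat accepts a leading sign and a leading '.', so every match converts.
  return matches ? parseFloat(matches[0]) : null;
}

console.log(parseSentenceForNumberValue('It is -3.5 degrees')); // -3.5
console.log(parseSentenceForNumberValue('No digits here'));     // null
```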
Consider adding negative test cases too, like testing what happens when a string does not have numbers:
test("Test parseSentenceForNumber('This sentence contains no numbers')", function() {
  equal( parseSentenceForNumber('This sentence contains no numbers'), null );
});
Full fiddle: http://jsfiddle.net/cvw8g/6/
The regular expression:
\d+(?:\.\d+)?
should do it.
\d+ matches a sequence of digits.
(?:...)? makes that group optional.
This doesn't deal with the special case where the fraction is all zeroes and you don't want the fraction included in the result. That's difficult with a regexp (I'm not sure if it can even be done, although I'm willing to be proven wrong). It should be easier to handle that after matching the number with the decimal in it.
Once you've matched the number in the string, use parseFloat() to convert it to a number, and toFixed(2) to get 2 decimal places.
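A minimal sketch of that post-processing step, assuming the goal is two decimal places unless the fraction is all zeroes, in which case the fraction is dropped. The helper name formatFirstNumber is made up for illustration.

```javascript
// Hypothetical helper: match first, then handle the all-zero fraction in code
// instead of in the regular expression.
function formatFirstNumber(sentence) {
  var match = sentence.match(/\d+(?:\.\d+)?/);
  if (!match) return null;
  var value = parseFloat(match[0]);
  // Whole values (including '3.00') are returned without a fraction.
  return value % 1 === 0 ? String(value) : value.toFixed(2);
}

console.log(formatFirstNumber('It costs 3.00 dollars'));  // "3"
console.log(formatFirstNumber('It costs 3.5 dollars'));   // "3.50"
console.log(formatFirstNumber('It costs 3.456 dollars')); // "3.46"
```

Note that toFixed() returns a string, so the result is a formatted string in both branches.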
The general form of a number in computer readable form is:
/[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/
based on the grammar
number := optional_sign (integer optional_fraction | fraction) optional_exponent;
optional_sign := '+' | '-' | ε;
integer := decimal_digit optional_integer;
optional_integer := integer | ε;
optional_fraction := '.' optional_integer | ε;
fraction := '.' integer;
optional_exponent := ('e' | 'E') optional_sign integer | ε;
so you can do
function parseSentenceForNumber(sentence){
  var match = sentence.match(
    /[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/);
  return match ? +match[0] : null; // The number from the string
}
but this doesn't account for
To handle those cases you might search for "entity extraction" since that's the overarching field that tries to find phrases that specify structured data within unstructured text.
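For the inputs this regex does cover, here is a quick sketch of what the function returns. The function body is copied from the answer above; the sample sentences are made up for illustration.

```javascript
// Same function as above, repeated so the snippet runs on its own.
function parseSentenceForNumber(sentence) {
  var match = sentence.match(
    /[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/);
  return match ? +match[0] : null;
}

console.log(parseSentenceForNumber('about 1.5e3 units'));  // 1500
console.log(parseSentenceForNumber('a drop of -.25'));     // -0.25
console.log(parseSentenceForNumber('I have 1,000 pound')); // 1 (stops at the comma)
console.log(parseSentenceForNumber('Agent 007'));          // 0 (no leading zeros allowed)
```

The last two lines show the limits: thousands separators end the match early, and the integer part may not start with extra zeros.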
One more possible regex:
/\d+\.?\d{0,2}/
This means:
\d+ : one or more digits
\.? : zero or one period
\d{0,2} : up to 2 digits
http://jsfiddle.net/cvw8g/7/
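To see how this pattern behaves, here is a small sketch; the helper firstMatch and the sample sentences are hypothetical. Two things stand out: extra decimals are truncated rather than rounded, and a sentence-ending period right after the number is captured as part of the match.

```javascript
var re = /\d+\.?\d{0,2}/;

// Hypothetical helper: return the first match as a string, or null.
function firstMatch(sentence) {
  var m = sentence.match(re);
  return m ? m[0] : null;
}

console.log(firstMatch('Price: 12.3456'));    // "12.34" (truncated, not rounded)
console.log(firstMatch('I have 5. Do you?')); // "5." (trailing period included)
console.log(firstMatch('Call 12345'));        // "12345"
```

Passing the result through parseFloat() would drop the stray trailing period.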