
How to parse a string containing text for a number/float in javascript?

I'm trying to build a javascript function capable of parsing a sentence and returning a number.

Here is a jsFiddle I've set up for the test cases below:

  1. 'I have 1 pound' -> 1
  2. 'I have £3.50 to spend' -> 3.50
  3. 'I have 23.00 pounds' -> 23
  4. '£27.33' -> 27.33
  5. '$4345.85' -> 4345.85
  6. '3.00' -> 3
  7. '7.0' -> 7
  8. 'Should have 2.0.' -> 2
  9. 'Should have 15.20.' -> 15.20
  10. '3.15' -> 3.15
  11. 'I only have 5, not great.' -> 5
  12. ' 34.23' -> 34.23
  13. 'sdfg545.14sdfg' -> 545.14
  14. 'Yesterday I spent £235468.13. Today I want to spend less.' -> 235468.13
  15. 'Yesterday I spent 340pounds.' -> 340
  16. 'I spent £14.52 today, £17.30 tomorrow' -> 14.52
  17. 'I have 0 trees, £11.33 tomorrow' -> 0

Cases 16 and 17 indicate that it should find the first number. I understand that some of the test cases may be tough, but I welcome anything that gets me reasonable coverage.

Here is the format I'm using for my function

function parseSentenceForNumber(sentence){

    return number; //The number from the string
}

I think I could get 60-80% of the way myself, but I expect a regular expression might be the best solution here and I've never been great at them. Hopefully I have enough test cases, but feel free to add any I might have missed.

Your help is much appreciated.

**UPDATE**

There are loads of working answers, and I need to spend some time looking at them in more detail. Mike Samuel mentioned commas and .5, which leads me to add another couple of test cases:

  18. 'I have 1,000 pound' -> 1000
  19. '.5' -> 0.5

And jsalonen suggested adding a test case for no numbers:

  20. 'This sentence contains no numbers' -> null

Here is the updated fiddle using jsalonen's solution. Without my changes to the spec I'd be 100% there; with the changes I'm at 95%. Can anyone offer a solution to number 18, the one with commas?

**UPDATE**

I added a statement to strip out commas to jsalonen's function and I'm at 100%.

Here is the final function

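function parseSentenceForNumber(sentence){
    // Strip every comma first so that '1,000' is read as '1000', then take the first number.
    var matches = sentence.replace(/,/g, '').match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
    // Returns the matched text (a string such as '3.50'), or null when there is no number.
    return matches && matches[0] || null;
}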

And the final Fiddle

Really appreciate the help and I have improved my regular expression knowledge along the way. Thanks

Ben asked Jul 26 '13 15:07


4 Answers

A regex that matches positive and negative numbers with any number of digits:

function parseSentenceForNumber(sentence){
    var matches = sentence.match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
    return matches && matches[0] || null;
}

Consider adding negative test cases too, like testing what happens when a string does not have numbers:

test("Test parseSentenceForNumber('This sentence contains no numbers')", function() {
  equal( parseSentenceForNumber('This sentence contains no numbers'), null );
});

Full fiddle: http://jsfiddle.net/cvw8g/6/
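As a rough sketch of how the question's cases could be checked without the fiddle, the table-driven loop below runs a subset of them against the function above. The `cases` array and the `failingCases` helper are assumptions made for illustration, not part of the original fiddle. Because the function returns the matched text, the comparison converts it with `Number()` first.

```javascript
// The function from this answer, unchanged.
function parseSentenceForNumber(sentence) {
  var matches = sentence.match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
  return matches && matches[0] || null;
}

// A subset of the question's test cases: [input, expected number].
var cases = [
  ['I have 1 pound', 1],
  ['I have £3.50 to spend', 3.5],
  ['Should have 2.0.', 2],
  ['sdfg545.14sdfg', 545.14],
  ['I spent £14.52 today, £17.30 tomorrow', 14.52],
  ['I have 0 trees, £11.33 tomorrow', 0],
  ['.5', 0.5]
];

// Hypothetical helper: returns the cases for which `parse` gives the wrong number.
function failingCases(parse) {
  return cases.filter(function (c) {
    var result = parse(c[0]);
    return result === null || Number(result) !== c[1];
  });
}

console.log(failingCases(parseSentenceForNumber).length);                 // 0
console.log(parseSentenceForNumber('It is -4.5 degrees'));                // "-4.5"
console.log(parseSentenceForNumber('This sentence contains no numbers')); // null
```

Note that the result is a string (for example "3.50"), so convert it with `Number()` or `parseFloat()` if a numeric value is needed.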

jsalonen answered Oct 20 '22 12:10


The regular expression:

\d+(?:\.\d+)?

should do it.

  • \d+ matches a sequence of digits.
  • \.\d+ matches a decimal point followed by digits.
  • (?:...)? makes that group optional.

This doesn't handle the special case where the fraction is all zeroes and you don't want it included in the result. That's difficult with a regexp (I'm not sure it can even be done, although I'm willing to be proven wrong). It should be easier to handle after matching the number with the decimal part in it.

Once you've matched the number in the string, use parseFloat() to convert it to a number. If you need a display value with 2 decimal places, call toFixed(2) on the result; note that toFixed() returns a string.
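A minimal sketch of the approach described above, assuming a wrapper function named `firstNumber` (a hypothetical name chosen for this example): match with the pattern, convert with parseFloat(), and format separately only when needed.

```javascript
// Hypothetical wrapper around the pattern from this answer.
function firstNumber(sentence) {
  var match = sentence.match(/\d+(?:\.\d+)?/);
  // parseFloat drops an all-zero fraction: "23.00" becomes 23.
  return match ? parseFloat(match[0]) : null;
}

console.log(firstNumber('I have 23.00 pounds'));            // 23
console.log(firstNumber('Should have 15.20.'));             // 15.2
console.log(firstNumber('No digits here'));                 // null
// toFixed(2) is for display only and returns a string.
console.log(firstNumber('I have 23.00 pounds').toFixed(2)); // "23.00"
```

One limitation of this pattern: it requires a digit before the decimal point, so an input such as '.5' is read as 5 rather than 0.5.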

Barmar answered Oct 20 '22 12:10


The general form of a number in computer readable form is:

/[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/

based on the grammar

number            := optional_sign (integer optional_fraction | fraction) optional_exponent;
optional_sign     := '+' | '-' | ε;
integer           := decimal_digit optional_integer;
optional_integer  := integer | ε;
optional_fraction := '.' optional_integer | ε;
fraction          := '.' integer;
optional_exponent := ('e' | 'E') optional_sign integer | ε;

so you can do

function parseSentenceForNumber(sentence){
  var match = sentence.match(
      /[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/);
  return match ? +match[0] : null; //The number from the string
}

but this doesn't account for

  1. Locales that use fraction separators other than '.' as in "π is 3,14159..."
  2. Commas to separate groups of digits as in 1,000,000
  3. Fractions
  4. Percentages
  5. Natural language descriptions like "a dozen" or "15 million pounds"

To handle those cases you might search for "entity extraction" since that's the overarching field that tries to find phrases that specify structured data within unstructured text.
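For item 2 alone, one possible sketch is to remove grouping commas before matching. The helper names `stripGroupSeparators` and `parseWithGrouping` are hypothetical, and the rule used (a comma counts as a separator only when it sits between a digit and a group of exactly three digits) is an assumption about English-style formatting, not a general solution.

```javascript
// Hypothetical helper: drop commas that sit between a digit and a group of
// exactly three digits, so "1,000,000" becomes "1000000" but "5, not" is kept.
function stripGroupSeparators(text) {
  return text.replace(/(\d),(?=\d{3}(?!\d))/g, '$1');
}

// Hypothetical variant of the function above that strips separators first.
function parseWithGrouping(sentence) {
  var match = stripGroupSeparators(sentence).match(
      /[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/);
  return match ? +match[0] : null;
}

console.log(parseWithGrouping('I have 1,000 pound'));        // 1000
console.log(parseWithGrouping('1,000,000'));                 // 1000000
console.log(parseWithGrouping('I only have 5, not great.')); // 5
console.log(parseWithGrouping('No numbers'));                // null
```

Unlike stripping every comma from the sentence, this leaves ordinary punctuation such as "1, 2" alone, so the first number there is still 1.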

Mike Samuel answered Oct 20 '22 12:10


One more possible regex:

/\d+\.?\d{0,2}/

This means:

  • \d+: one or more digits
  • \.?: zero or one period
  • \d{0,2}: up to 2 digits

http://jsfiddle.net/cvw8g/7/
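To see what this pattern actually captures, here is a small sketch; `firstMatch` is a hypothetical helper name. Two behaviours are worth knowing: a sentence-ending period directly after the digits is included in the match, and anything past two decimal places is cut off rather than rounded.

```javascript
var pattern = /\d+\.?\d{0,2}/;

// Hypothetical helper: returns the matched text, or null if nothing matches.
function firstMatch(sentence) {
  var m = sentence.match(pattern);
  return m ? m[0] : null;
}

console.log(firstMatch('£27.33'));                    // "27.33"
console.log(firstMatch('I only have 5. Not great.')); // "5."
console.log(firstMatch('It costs 1.23456'));          // "1.23"
// parseFloat ignores the trailing period when converting.
console.log(parseFloat(firstMatch('I only have 5. Not great.'))); // 5
```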

bfavaretto answered Oct 20 '22 10:10