Parse substring to double directly

Tags: c#, .net, f#

If I have a string such as "1 2 3" and I have identified the position of a substring containing a double, how can I parse it directly from the substring without creating a temporary string?

For example, I could do System.Double.Parse(str.Substring(0, 1)), but that would create a temporary string, which is slow and needless. Is it possible to parse a double directly from part of the original string?

EDIT

Eric Lippert has questioned my motives here, stating that "Small strings are cheap". My motivation comes from doing the same thing when parsing ints and seeing a massive performance improvement because, apparently, small strings are not so cheap.

Here is a function that lexes a sequence of ints via temporary strings:

let lex f (s: string) =
  let rec inside i0 (s: string, i) =
    if i = s.Length then
      f (s.Substring(i0, i-i0) |> System.Int32.Parse)
    else
      let c = s.[i]
      if '0'<=c && c<='9' then
        inside i0 (s, i+1)
      else
        f (s.Substring(i0, i-i0) |> System.Int32.Parse)
        outside (s, i)
  and outside (s: string, i) =
    if i < s.Length then
      let c = s.[i]
      if '0'<=c && c<='9' then
        inside i (s, i)
      else
        outside (s, i+1)
  outside (s, 0)

This takes 2.4s to lex 15,625,000 ints from a string.

Here is a version that avoids temporary strings:

let lex f (s: string) =
  let rec inside n (s: string, i) =
    if i = s.Length then f n else
      let c = s.[i]
      if '0'<=c && c<='9' then
        inside (10*n + int c - int '0') (s, i+1)
      else
        f n
        outside (s, i)
  and outside (s: string, i) =
    if i < s.Length then
      let c = s.[i]
      if '0'<=c && c<='9' then
        inside 0 (s, i)
      else
        outside (s, i+1)
  outside (s, 0)

This takes 0.255s, over 9x faster than the solution that uses temporary strings!

I see no reason why lexing floats should be any different. Therefore, by not providing a way to parse a float from a substring, .NET is leaving an order of magnitude in performance on the table. I do a lot of scientific computing and often have to lex large amounts of data, especially at startup, so I really don't want to throw performance to the wind like this.

J D asked Jan 07 '16 02:01

2 Answers

Yes, I think it's totally doable. You can write your own parsing function, and you can even base it on the actual source code of Double.Parse(). That code doesn't look big or scary, and I think you can optimize it even further for your needs.
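
As an aside to hand-rolling a parser: on runtimes that ship the span-based overloads (.NET Core 2.1 and later, as far as I know; they did not exist when this question was asked), a slice of the original string can be passed straight to double.Parse without calling Substring. The sketch below assumes such a runtime; the class and method names SpanParsing and ParseSlice are made up for illustration.

```csharp
using System;
using System.Globalization;

static class SpanParsing
{
    // Hypothetical wrapper: only AsSpan and the span overload of double.Parse
    // are real framework APIs.
    public static double ParseSlice(string text, int start, int length)
    {
        // AsSpan creates a view over the original characters; no new string is allocated.
        ReadOnlySpan<char> slice = text.AsSpan(start, length);
        // InvariantCulture is an assumption here so that '.' is the decimal separator.
        return double.Parse(slice, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

For example, SpanParsing.ParseSlice("1 2.5 3", 2, 3) parses the characters "2.5" in place. Whether this beats a custom digit loop would need measuring on your own data.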

Alex Butenko answered Oct 05 '22 03:10

You could parse the string digit by digit, something like this:

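static double CustomConvertToDouble(string input, int startIndex, int length)
{
    // Handles unsigned decimal digits only: no sign, decimal point or exponent.
    double result = 0d;
    int end = startIndex + length;
    for (int i = startIndex; i < end; i++)
    {
        int digit = input[i] - '0';
        if (digit < 0 || digit > 9)
            throw new FormatException("Expected a digit at index " + i);
        // Accumulate left to right; this avoids a Math.Pow call per digit.
        result = result * 10d + digit;
    }
    return result;
}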

Usage:

string tmp = "1 2 3";
double result = CustomConvertToDouble(tmp, 0, 1);
Console.WriteLine(result); // 1

You could expand on this to take decimal points etc. into account.
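
One way the digit loop might be extended is sketched below. It is only an illustration under stated assumptions: the input has the form [+|-]digits[.digits], there is no exponent or whitespace, and '.' is always the decimal separator. The names SubstringParser and ParseSimpleDouble are hypothetical, and for very long digit sequences the result may differ from Double.Parse in the last bit, because the final division is not guaranteed to be correctly rounded in that case.

```csharp
using System;

static class SubstringParser
{
    // Hypothetical helper (not a .NET API): parses [+|-]digits[.digits] from
    // input[startIndex .. startIndex + length) without allocating a substring.
    public static double ParseSimpleDouble(string input, int startIndex, int length)
    {
        int i = startIndex;
        int end = startIndex + length;

        // Optional leading sign.
        bool negative = false;
        if (i < end && (input[i] == '-' || input[i] == '+'))
        {
            negative = input[i] == '-';
            i++;
        }

        double mantissa = 0d;   // all digits accumulated as one integer value
        int fractionDigits = 0; // how many of those digits came after '.'
        bool seenPoint = false;
        bool seenDigit = false;

        for (; i < end; i++)
        {
            char c = input[i];
            if (c >= '0' && c <= '9')
            {
                mantissa = mantissa * 10d + (c - '0');
                if (seenPoint) fractionDigits++;
                seenDigit = true;
            }
            else if (c == '.' && !seenPoint)
            {
                seenPoint = true;
            }
            else
            {
                throw new FormatException("Unexpected character at index " + i);
            }
        }

        if (!seenDigit) throw new FormatException("No digits found");

        // Scale once at the end instead of once per digit.
        double value = mantissa / Math.Pow(10d, fractionDigits);
        return negative ? -value : value;
    }
}
```

For instance, SubstringParser.ParseSimpleDouble("x=-12.5;", 2, 5) reads the characters "-12.5" in place and returns -12.5.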

But I really doubt that the normal way can be a performance bottleneck, and I'm interested to know why you want to go to the trouble. If that piece of code really is that performance-critical, maybe the best route is to write it in another language?

Saeb Amini answered Oct 05 '22 01:10