Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to solve chain-calculations

Tags:

performance

c#

I have a input like

string input = "14 + 2 * 32 / 60 + 43 - 7 + 3 - 1 + 0 * 7 + 87 - 32 / 34"; 
// up to 10MB string size

int result = Calc(input); // 11
  • the calculation is from left to right, number by number
  • the numbers are 0 to 99
  • multiplication before addition is ignored so 14 + 2 * 32 is 512
  • the possible calculations are +-*/
  • division by 0 is not possible so after a / can't be a 0

My Approach

public static int Calc(string sInput)
{
    int iCurrent = sInput.IndexOf(' ');
    int iResult = int.Parse(sInput.Substring(0, iCurrent));
    int iNext = 0;
    while ((iNext = sInput.IndexOf(' ', iCurrent + 4)) != -1)
    {
        iResult = Operate(iResult, sInput[iCurrent + 1], int.Parse(sInput.Substring((iCurrent + 3), iNext - (iCurrent + 2))));
        iCurrent = iNext;
    }
    return Operate(iResult, sInput[iCurrent + 1], int.Parse(sInput.Substring((iCurrent + 3))));
}

public static int Operate(int iReturn, char cOperator, int iOperant)
{
    switch (cOperator)
    {
        case '+':
            return (iReturn + iOperant);
        case '-':
            return (iReturn - iOperant);
        case '*':
            return (iReturn * iOperant);
        case '/':
            return (iReturn / iOperant);
        default:
            throw new Exception("Error");
    }
}

I need the fastest way to get a result.

Question: is there a way to make this calculation faster? I have multiple threads but I use only one.

Update:

Test-Case: (I've removed the division by 0 bug and removed the StringBuilder.ToString() from the StopWatch measurement)

Random rand = new Random();
System.Text.StringBuilder input = new System.Text.StringBuilder();
string operators = "+-*/";
input.Append(rand.Next(0, 100));
for (int i = 0; i < 1000000; i++)
{
    int number = rand.Next(0, 100);
    char coperator = operators[rand.Next(0, number > 0 ? 4 : 3)];
    input.Append(" " + coperator + " " + number);
}
string calc = input.ToString();
System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch();
watch.Start();
int result = Calc(calc);
watch.Stop();
like image 658
Impostor Avatar asked Mar 22 '18 08:03

Impostor


Video Answer


3 Answers

Edit edit: updated with latest versions by The General and Mirai Mann:

If you want to know which horse is fastest: race the horses. Here are BenchmarkDotNet results comparing various answers from this question (I have not merged their code into my full example, because that feels wrong - only the numbers are presented) with repeatable but large random input, via:

static MyTests()
{
    Random rand = new Random(12345);
    StringBuilder input = new StringBuilder();
    string operators = "+-*/";
    var lastOperator = '+';
    for (int i = 0; i < 1000000; i++)
    {
        var @operator = operators[rand.Next(0, 4)];
        input.Append(rand.Next(lastOperator == '/' ? 1 : 0, 100) + " " + @operator + " ");
        lastOperator = @operator;
    }
    input.Append(rand.Next(0, 100));
    expression = input.ToString();
}
private static readonly string expression;

with sanity checks (to check they all do the right thing):

Original: -1426
NoSubStrings: -1426
NoSubStringsUnsafe: -1426
TheGeneral4: -1426
MiraiMann1: -1426

we get timings (note: Original is OP's version in the question; NoSubStrings[Unsafe] is my versions from below, and two other versions from other answers by user-name):

(lower "Mean" is better)

(note; there is a newer version of Mirai Mann's implementation, but I no longer have things setup to run a new test; but: fair to assume it should be better!)

Runtime: .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2633.0

             Method |      Mean |     Error |    StdDev |
------------------- |----------:|----------:|----------:|
           Original | 104.11 ms | 1.4920 ms | 1.3226 ms |
       NoSubStrings |  21.99 ms | 0.4335 ms | 0.7122 ms |
 NoSubStringsUnsafe |  20.53 ms | 0.4103 ms | 0.6967 ms |
        TheGeneral4 |  15.50 ms | 0.3020 ms | 0.5369 ms |
         MiraiMann1 |  15.54 ms | 0.3096 ms | 0.4133 ms |

Runtime: .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2633.0

             Method |      Mean |     Error |    StdDev |    Median |
------------------- |----------:|----------:|----------:|----------:|
           Original | 114.15 ms | 1.3142 ms | 1.0974 ms | 114.13 ms |
       NoSubStrings |  21.33 ms | 0.4161 ms | 0.6354 ms |  20.93 ms |
 NoSubStringsUnsafe |  19.24 ms | 0.3832 ms | 0.5245 ms |  19.43 ms |
        TheGeneral4 |  13.97 ms | 0.2795 ms | 0.2745 ms |  13.86 ms |
         MiraiMann1 |  15.60 ms | 0.3090 ms | 0.4125 ms |  15.53 ms |

Runtime: .NET Core 2.1.0-preview1-26116-04 (CoreCLR 4.6.26116.03, CoreFX 4.6.26116.01), 64bit RyuJIT

             Method |      Mean |     Error |    StdDev |    Median |
------------------- |----------:|----------:|----------:|----------:|
           Original | 101.51 ms | 1.7807 ms | 1.5786 ms | 101.44 ms |
       NoSubStrings |  21.36 ms | 0.4281 ms | 0.5414 ms |  21.07 ms |
 NoSubStringsUnsafe |  19.85 ms | 0.4172 ms | 0.6737 ms |  19.80 ms |
        TheGeneral4 |  14.06 ms | 0.2788 ms | 0.3723 ms |  13.82 ms |
         MiraiMann1 |  15.88 ms | 0.3153 ms | 0.5922 ms |  15.45 ms |

Original answer from before I added BenchmarkDotNet:

If I was trying this, I'd be tempted to have a look at the Span<T> work in 2.1 previews - the point being that a Span<T> can be sliced without allocating (and a string can be converted to a Span<char> without allocating); this would allow the string carving and parsing to be performed without any allocations. However, reducing allocations is not always quite the same thing as raw performance (although they are related), so to know if it was faster: you'd need to race your horses (i.e. compare them).

If Span<T> isn't an option: you can still do the same thing by tracking an int offset manually and just *never using SubString)

In either case (string or Span<char>): if your operation only needs to cope with a certain subset of representations of integers, I might be tempted to hand role a custom int.Parse equivalent that doesn't apply as many rules (cultures, alternative layouts, etc), and which works without needing a Substring - for example it could take a string and ref int offset, where the offset is updated to be where the parse stopped (either because it hit an operator or a space), and which worked.

Something like:

static class P
{
    static void Main()
    {
        string input = "14 + 2 * 32 / 60 + 43 - 7 + 3 - 1 + 0 * 7 + 87 - 32 / 34";

        var val = Evaluate(input);
        System.Console.WriteLine(val);
    }
    static int Evaluate(string expression)
    {
        int offset = 0;
        SkipSpaces(expression, ref offset);
        int value = ReadInt32(expression, ref offset);
        while(ReadNext(expression, ref offset, out char @operator, out int operand))
        {
            switch(@operator)
            {
                case '+': value = value + operand; break;
                case '-': value = value - operand; break;
                case '*': value = value * operand; break;
                case '/': value = value / operand; break;
            }
        }
        return value;
    }
    static bool ReadNext(string value, ref int offset,
        out char @operator, out int operand)
    {
        SkipSpaces(value, ref offset);

        if(offset >= value.Length)
        {
            @operator = (char)0;
            operand = 0;
            return false;
        }

        @operator = value[offset++];
        SkipSpaces(value, ref offset);

        if (offset >= value.Length)
        {
            operand = 0;
            return false;
        }
        operand = ReadInt32(value, ref offset);
        return true;
    }

    static void SkipSpaces(string value, ref int offset)
    {
        while (offset < value.Length && value[offset] == ' ') offset++;
    }
    static int ReadInt32(string value, ref int offset)
    {
        bool isNeg = false;
        char c = value[offset++];
        int i = (c - '0');
        if(c == '-')
        {
            isNeg = true;
            i = 0;
            // todo: what to do here if `-` is not followed by [0-9]?
        }

        while (offset < value.Length && (c = value[offset++]) >= '0' && c <= '9')
            i = (i * 10) + (c - '0');
        return isNeg ? -i : i;
    }
}

Next, I might consider whether it is worthwhile switching to unsafe to remove the double length checks. So I'd implement it both ways, and test it with something like BenchmarkDotNet to see whether it is worth it.


Edit: here is is with unsafe added and BenchmarkDotNet usage:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;

static class P
{
    static void Main()
    {
        var summary = BenchmarkRunner.Run<MyTests>();
        System.Console.WriteLine(summary);
    }

}
public class MyTests
{
    const string expression = "14 + 2 * 32 / 60 + 43 - 7 + 3 - 1 + 0 * 7 + 87 - 32 / 34";
    [Benchmark]
    public int Original() => EvalOriginal.Calc(expression);
    [Benchmark]
    public int NoSubStrings() => EvalNoSubStrings.Evaluate(expression);
    [Benchmark]
    public int NoSubStringsUnsafe() => EvalNoSubStringsUnsafe.Evaluate(expression);
}
static class EvalOriginal
{
    public static int Calc(string sInput)
    {
        int iCurrent = sInput.IndexOf(' ');
        int iResult = int.Parse(sInput.Substring(0, iCurrent));
        int iNext = 0;
        while ((iNext = sInput.IndexOf(' ', iCurrent + 4)) != -1)
        {
            iResult = Operate(iResult, sInput[iCurrent + 1], int.Parse(sInput.Substring((iCurrent + 3), iNext - (iCurrent + 2))));
            iCurrent = iNext;
        }
        return Operate(iResult, sInput[iCurrent + 1], int.Parse(sInput.Substring((iCurrent + 3))));
    }

    public static int Operate(int iReturn, char cOperator, int iOperant)
    {
        switch (cOperator)
        {
            case '+':
                return (iReturn + iOperant);
            case '-':
                return (iReturn - iOperant);
            case '*':
                return (iReturn * iOperant);
            case '/':
                return (iReturn / iOperant);
            default:
                throw new Exception("Error");
        }
    }
}
static class EvalNoSubStrings {
    public static int Evaluate(string expression)
    {
        int offset = 0;
        SkipSpaces(expression, ref offset);
        int value = ReadInt32(expression, ref offset);
        while (ReadNext(expression, ref offset, out char @operator, out int operand))
        {
            switch (@operator)
            {
                case '+': value = value + operand; break;
                case '-': value = value - operand; break;
                case '*': value = value * operand; break;
                case '/': value = value / operand; break;
                default: throw new Exception("Error");
            }
        }
        return value;
    }
    static bool ReadNext(string value, ref int offset,
        out char @operator, out int operand)
    {
        SkipSpaces(value, ref offset);

        if (offset >= value.Length)
        {
            @operator = (char)0;
            operand = 0;
            return false;
        }

        @operator = value[offset++];
        SkipSpaces(value, ref offset);

        if (offset >= value.Length)
        {
            operand = 0;
            return false;
        }
        operand = ReadInt32(value, ref offset);
        return true;
    }

    static void SkipSpaces(string value, ref int offset)
    {
        while (offset < value.Length && value[offset] == ' ') offset++;
    }
    static int ReadInt32(string value, ref int offset)
    {
        bool isNeg = false;
        char c = value[offset++];
        int i = (c - '0');
        if (c == '-')
        {
            isNeg = true;
            i = 0;
        }

        while (offset < value.Length && (c = value[offset++]) >= '0' && c <= '9')
            i = (i * 10) + (c - '0');
        return isNeg ? -i : i;
    }
}

static unsafe class EvalNoSubStringsUnsafe
{
    public static int Evaluate(string expression)
    {

        fixed (char* ptr = expression)
        {
            int len = expression.Length;
            var c = ptr;
            SkipSpaces(ref c, ref len);
            int value = ReadInt32(ref c, ref len);
            while (len > 0 && ReadNext(ref c, ref len, out char @operator, out int operand))
            {
                switch (@operator)
                {
                    case '+': value = value + operand; break;
                    case '-': value = value - operand; break;
                    case '*': value = value * operand; break;
                    case '/': value = value / operand; break;
                    default: throw new Exception("Error");
                }
            }
            return value;
        }
    }
    static bool ReadNext(ref char* c, ref int len,
        out char @operator, out int operand)
    {
        SkipSpaces(ref c, ref len);

        if (len-- == 0)
        {
            @operator = (char)0;
            operand = 0;
            return false;
        }
        @operator = *c++;
        SkipSpaces(ref c, ref len);

        if (len == 0)
        {
            operand = 0;
            return false;
        }
        operand = ReadInt32(ref c, ref len);
        return true;
    }

    static void SkipSpaces(ref char* c, ref int len)
    {
        while (len != 0 && *c == ' ') { c++;len--; }
    }
    static int ReadInt32(ref char* c, ref int len)
    {
        bool isNeg = false;
        char ch = *c++;
        len--;
        int i = (ch - '0');
        if (ch == '-')
        {
            isNeg = true;
            i = 0;
        }

        while (len-- != 0 && (ch = *c++) >= '0' && ch <= '9')
            i = (i * 10) + (ch - '0');
        return isNeg ? -i : i;
    }
}
like image 123
Marc Gravell Avatar answered Sep 20 '22 15:09

Marc Gravell


The following solution is a finite automaton. Calc(input) = O(n). For better performance, this solution does not use IndexOf, Substring, Parse, string concatenation, or repeated reading of value (fetching input[i] more than once)... just a character processor.

    static int Calculate1(string input)
    {
        int acc = 0;
        char last = ' ', operation = '+';

        for (int i = 0; i < input.Length; i++)
        {
            var current = input[i];
            switch (current)
            {
                case ' ':
                    if (last != ' ')
                    {
                        switch (operation)
                        {
                            case '+': acc += last - '0'; break;
                            case '-': acc -= last - '0'; break;
                            case '*': acc *= last - '0'; break;
                            case '/': acc /= last - '0'; break;
                        }

                        last = ' ';
                    }

                    break;

                case '0': case '1': case '2': case '3': case '4':
                case '5': case '6': case '7': case '8': case '9':
                    if (last == ' ') last = current;
                    else
                    {
                        var num = (last - '0') * 10 + (current - '0');
                        switch (operation)
                        {
                            case '+': acc += num; break;
                            case '-': acc -= num; break;
                            case '*': acc *= num; break;
                            case '/': acc /= num; break;
                        }
                        last = ' ';
                    }
                    break;

                case '+': case '-': case '*': case '/':
                    operation = current;
                    break;
            }
        }

        if (last != ' ')
            switch (operation)
            {
                case '+': acc += last - '0'; break;
                case '-': acc -= last - '0'; break;
                case '*': acc *= last - '0'; break;
                case '/': acc /= last - '0'; break;
            }

        return acc;
    }

And another implementation. It reads 25% less from the input. I expect that it has 25% better performance.

    static int Calculate2(string input)
    {
        int acc = 0, i = 0;
        char last = ' ', operation = '+';

        while (i < input.Length)
        {
            var current = input[i];
            switch (current)
            {
                case ' ':
                    if (last != ' ')
                    {
                        switch (operation)
                        {
                            case '+': acc += last - '0'; break;
                            case '-': acc -= last - '0'; break;
                            case '*': acc *= last - '0'; break;
                            case '/': acc /= last - '0'; break;
                        }

                        last = ' ';
                        i++;
                    }

                    break;

                case '0': case '1': case '2': case '3': case '4':
                case '5': case '6': case '7': case '8': case '9':
                    if (last == ' ')
                    {
                        last = current;
                        i++;
                    }
                    else
                    {
                        var num = (last - '0') * 10 + (current - '0');
                        switch (operation)
                        {
                            case '+': acc += num; break;
                            case '-': acc -= num; break;
                            case '*': acc *= num; break;
                            case '/': acc /= num; break;
                        }

                        last = ' ';
                        i += 2;
                    }
                    break;

                case '+': case '-': case '*': case '/':
                    operation = current;
                    i += 2;
                    break;
            }
        }

        if (last != ' ')
            switch (operation)
            {
                case '+': acc += last - '0'; break;
                case '-': acc -= last - '0'; break;
                case '*': acc *= last - '0'; break;
                case '/': acc /= last - '0'; break;
            }

        return acc;
    }

I implemented one more variant:

    static int Calculate3(string input)
    {
        int acc = 0, i = 0;
        var operation = '+';

        while (true)
        {
            var a = input[i++] - '0';
            if (i == input.Length)
            {
                switch (operation)
                {
                    case '+': acc += a; break;
                    case '-': acc -= a; break;
                    case '*': acc *= a; break;
                    case '/': acc /= a; break;
                }

                break;
            }

            var b = input[i];
            if (b == ' ') i++;
            else
            {
                a = a * 10 + (b - '0');
                i += 2;
            }

            switch (operation)
            {
                case '+': acc += a; break;
                case '-': acc -= a; break;
                case '*': acc *= a; break;
                case '/': acc /= a; break;
            }

            if (i >= input.Length) break;
            operation = input[i];
            i += 2;
        }

        return acc;
    }

Results in abstract points:

  • Calculate1 230
  • Calculate2 192
  • Calculate3 111
like image 40
Valentine Zakharenko Avatar answered Sep 22 '22 15:09

Valentine Zakharenko


NOTE

Per comments, this answer does not give a performant solution. I'll leave it here as there are points to be considered / which may be of interest to others finding this thread in future; but as people have said below, this is far from the most performant solution.


Original Answer

The .net framework already supplies a way to handle formulas given as strings:

var formula = "14 + 2 * 32 / 60 + 43 - 7 + 3 - 1 + 0 * 7 + 87 - 32 / 34";
var result = new DataTable().Compute(formula, null);
Console.WriteLine(result); //returns 139.125490196078

Initial feedback based on comments

Per the comments thread I need to point out some things:

Does this work the way I've described?

No; this follows the normal rules of maths.

I assume that your amended rules are to simplify writing code to handle them, rather than because you want to support a new branch of mathematics? If that's the case, I'd argue against that. People will expect things to behave in a certain way; so you'd have to ensure that anyone sending equations to your code was primed with the knowledge to expect the rules of this new-maths rather than being able to use their existing expectations.

There isn't an option to change the rules here; so if your requirement is to change the rules of maths, this won't work for you.

Is this the Fastest Solution

No. However it should perform well given MS spend a lot of time optimising their code, and so will likely perform faster than any hand-rolled code to do the same (though admittedly this code does a lot more than just support the four main operators; so it's not doing exactly the same).

Per MatthewWatson's specific comment (i.e. calling the DataTable constructor incurs a significant overhead) you'd want to create and then re-use one instance of this object. Depending on what your solution looks like there are various ways to do that; here's one:

interface ICalculator //if we use an interface we can easily switch from datatable to some other calulator; useful for testing, or if we wanted to compare different calculators without much recoding
{
    T Calculate<T>(string expression) where T: struct;
}
class DataTableCalculator: ICalculator 
{
    readonly DataTable dataTable = new DataTable();
    public DataTableCalculator(){}
    public T Calculate<T>(string expression) where T: struct =>
        (T)dataTable.Compute(expression, null);
}

class Calculator: ICalculator
{
    static ICalculator internalInstance;
    public Calculator(){}
    public void InitialiseCalculator (ICalculator calculator)
    {
        if (internalInstance != null)
        {
            throw new InvalidOperationException("Calculator has already been initialised");
        }
        internalInstance = calculator;
    }
    public T Calculate<T>(string expression) where T: struct =>
        internalInstance.Calculate<T>(expression);
}

//then we use it on our code
void Main()
{
    var calculator1 = new Calculator();
    calculator1.InitialiseCalculator(new DataTableCalculator());
    var equation = "14 + 2 * 32 / 60 + 43 - 7 + 3 - 1 + 0 * 7 + 87 - 32 / 34"; 
    Console.WriteLine(calculator1.Calculate<double>(equation)); //139.125490196078
    equation = "1 + 2 - 3 + 4";
    Console.WriteLine(calculator1.Calculate<int>(equation)); //4
    calculator1 = null;
    System.GC.Collect(); //in reality we'd pretty much never do this, but just to illustrate that our static variable continues fro the life of the app domain rather than the life of the instance
    var calculator2 = new Calculator();
    //calculator2.InitialiseCalculator(new DataTableCalculator()); //uncomment this and you'll get an error; i.e. the calulator should only be initialised once.
    equation = "1 + 2 - 3 + 4 / 5 * 6 - 7 / 8 + 9";
    Console.WriteLine(calculator2.Calculate<double>(equation)); //12.925
}

NB: The above solution uses a static variable; some people are against use of statics. For this scenario (i.e. where during the lifetime of the application you're only going to require one way of doing calculations) this is a legitimate use case. If you wanted to support switching the calculator at runtime a different approach would be required.


Update after Testing & Comparing

Having run some performance tests:

  • The DataTable.Compute method's biggest problem is that for equations the size of which you're dealing with it throws a StackOverflowException; (i.e. based on your equation generator's loop for (int i = 0; i < 1000000; i++).
  • For a single operation with a smaller equation (i < 1000), the compute method (including constructor and Convert.ToInt32 on the double result) takes almost 100 times longer.
  • for the single operation I also encountered overflow exceptions more often; i.e. because the result of the operations had pushed the value outside the bounds of supported data types...
  • Even if we move the constructor/initialise call outside of the timed area and remove the conversion to int (and run for thousands of iterations to get an average), your solution comes in 3.5 times faster than mine.

Link to the docs: https://msdn.microsoft.com/en-us/library/system.data.datatable.compute%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396

like image 21
JohnLBevan Avatar answered Sep 20 '22 15:09

JohnLBevan