Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Code Golf: Mathematical expression evaluator (that respects PEMDAS)

C (465 characters)

#define F for(i=0;P-8;i+=2)
#define V t[i
#define P V+1]
#define S V+2]),K(&L,4),i-=2)
#define L V-2]
K(double*t,int i){for(*++t=4;*t-8;*++t=V])*++t=V];}M(double*t){int i,p,b;
F if(!P)for(p=1,b=i;i+=2,p;)P?P-1||--p||(P=8,M(t+b+2),K(t+b,i-b),i=b):++p;
F P-6||(L=pow(L,S;F P-2&&P-7||(L*=(P-7?V+2]:1/S;F P-4&&(L+=(P-5?V+2]:-S;
F L=V];}E(char*s,char*r){double t[99];char*e,i=2,z=0;for(;*s;i+=2)V]=
strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;P=8;M(t+2);sprintf(r,"%g",*t);}

The first five newlines are required, the rest are there just for readability. I've counted the first five newlines as one character each. If you want to measure it in lines, it was 28 lines before I removed all the whitespace, but that's a pretty meaningless number. It could have been anything from 6 lines to a million, depending on how I formatted it.

The entry point is E() (for "evaluate"). The first parameter is the input string, and the second parameter points to the output string, and must be allocated by the caller (as per usual C standards). It can handle up to 47 tokens, where a token is either an operator (one of "+-*/^()"), or a floating point number. Unary sign operators do not count as a separate token.

This code is loosely based on a project I did many years ago as an exercise. I took out all the error handling and whitespace skipping and retooled it using golf techniques. Below are the 28 lines, with enough formatting that I was able to write it, but probably not enough to read it. You'll want to #include <stdlib.h>, <stdio.h>, and <math.h> (or see note at the bottom).

See after the code for an explanation of how it works.

#define F for(i=0;P-8;i+=2)
#define V t[i
#define P V+1]
#define S V+2]),K(&L,4),i-=2)
#define L V-2]
K(double*t,int i){
    for(*++t=4;*t-8;*++t=V])
        *++t=V];
}
M(double*t){
    int i,p,b;
    F if(!P)
        for(p=1,b=i;i+=2,p;)
            P?P-1||--p||(P=8,M(t+b+2),K(t+b,i-b),i=b):++p;
    F P-6||(L=pow(L,S;
    F P-2&&P-7||(L*=(P-7?V+2]:1/S;
    F P-4&&(L+=(P-5?V+2]:-S;
    F L=V];
}
E(char*s,char*r){
    double t[99];
    char*e,i=2,z=0;
    for(;*s;i+=2)
        V]=strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;
    P=8;
    M(t+2);
    sprintf(r,"%g",*t);
}

The first step is to tokenize. The array of doubles contains two values for each token, an operator (P, because O looks too much like zero), and a value (V). This tokenizing is what is done in the for loop in E(). It also deals with any unary + and - operators, incorporating them into the constant.

The "operator" field of the token array can have one of the following values:

0: (
1: )
2: *
3: +
4: a floating-point constant value
5: -
6: ^
7: /
8: end of token string

This scheme was largely derived by Daniel Martin, who noticed that the last 3 bits were unique in the ASCII representation of each of the operators in this challenge.

An uncompressed version of E() would look something like this:

void Evaluate(char *expression, char *result){
    double tokenList[99];
    char *parseEnd;
    int i = 2, prevOperator = 0;
    /* i must start at 2, because the EvalTokens will write before the
     * beginning of the array.  This is to allow overwriting an opening
     * parenthesis with the value of the subexpression. */
    for(; *expression != 0; i += 2){
        /* try to parse a constant floating-point value */
        tokenList[i] = strtod(expression, &parseEnd);

        /* explanation below code */
        if(parseEnd != expression && prevOperator != 4/*constant*/ &&
           prevOperator != 1/*close paren*/){
            expression = parseEnd;
            prevOperator = tokenList[i + 1] = 4/*constant*/;
        }else{
            /* it's an operator */
            prevOperator = tokenList[i + 1] = *expression & 7;
            expression++;
        }
    }

    /* done parsing, add end-of-token-string operator */
    tokenList[i + 1] = 8/*end*/

    /* Evaluate the expression in the token list */
    EvalTokens(tokenList + 2); /* remember the offset by 2 above? */

    sprintf(result, "%g", tokenList[0]/* result ends up in first value */);
}

Since we're guaranteed valid input, the only reason the parsing would fail would be because the next token is an operator. If this happens, the parseEnd pointer will be the same as, tokenStart. We must also handle the case where parsing succeeded, but what we really wanted was an operator. This would occur for the addition and subtraction operators, unless a sign operator directly followed. In other words, given the expression "4-6", we want to parse it as {4, -, 6}, and not as {4, -6}. On the other hand, given "4+-6", we should parse it as {4, +, -6}. The solution is quite simple. If parsing fails OR the preceding token was a constant or a closing parenthesis (effectively a subexpression which will evaluate to a constant), then the current token is an operator, otherwise it's a constant.

After tokenizing is done, calculating and folding are done by calling M(), which first looks for any matched pairs of parentheses and processes the subexpressions contained within by calling itself recursively. Then it processes operators, first exponentiation, then multiplication and division together, and finally addition and subtraction together. Because well-formed input is expected (as specified in the challenge), it doesn't check for the addition operator explicitly, since it's the last legal operator after all the others are processed.

The calculation function, lacking golf compression, would look something like this:

void EvalTokens(double *tokenList){
    int i, parenLevel, parenStart;

    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] == 0/*open paren*/)
            for(parenLevel = 1, parenStart = i; i += 2, parenLevel > 0){
                if(tokenList[i + 1] == 0/*another open paren*/)
                    parenLevel++;
                else if(tokenList[i + 1] == 1/*close paren*/)
                    if(--parenLevel == 0){
                        /* make this a temporary end of list */
                        tokenList[i + 1] = 8;
                        /* recursively handle the subexpression */
                        EvalTokens(tokenList + parenStart + 2);
                        /* fold the subexpression out */
                        FoldTokens(tokenList + parenStart, i - parenStart);
                        /* bring i back to where the folded value of the
                         * subexpression is now */
                        i = parenStart;
                    }
            }

    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] == 6/*exponentiation operator (^)*/){
            tokenList[i - 2] = pow(tokenList[i - 2], tokenList[i + 2]);
            FoldTokens(tokenList + i - 2, 4);
            i -= 2;
        }
    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] == 2/*multiplication operator (*)*/ ||
           tokenList[i + 1] == 7/*division operator (/)*/){
            tokenList[i - 2] *=
                (tokenList[i + 1] == 2 ?
                    tokenList[i + 2] :
                    1 / tokenList[i + 2]);
            FoldTokens(tokenList + i - 2, 4);
            i -= 2;
        }
    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] != 4/*constant*/){
            tokenList[i - 2] +=
                (tokenList[i + 1] == 3 ?
                    tokenList[i + 2] :
                    -tokenList[i + 2]);
            FoldTokens(tokenList + i - 2, 4);
            i -= 2;
        }
    tokenList[-2] = tokenList[0];
    /* the compressed code does the above in a loop, equivalent to:
     *
     * for(i = 0; tokenList[i + 1] != 8; i+= 2)
     *     tokenList[i - 2] = tokenList[i];
     *
     * This loop will actually only iterate once, and thanks to the
     * liberal use of macros, is shorter. */
}

Some amount of compression would probably make this function easier to read.

Once an operation is performed, the operands and operator are folded out of the token list by K() (called through the macro S). The result of the operation is left as a constant in place of the folded expression. Consequently, the final result is left at the beginning of the token array, so when control returns to E(), it simply prints that to a string, taking advantage of the fact that the first value in the array is the value field of the token.

This call to FoldTokens() takes place either after an operation (^, *, /, +, or -) has been performed, or after a subexpression (surrounded by parentheses) has been processed. The FoldTokens() routine ensures that the result value has the correct operator type (4), and then copies the rest of the larger expression of the subexpression. For instance, when the expression "2+6*4+1" is processed, EvalTokens() first calculates 6*4, leaving the result in place of the 6 (2+24*4+1). FoldTokens() then removes the rest of the sub expression "24*4", leaving 2+24+1.

void FoldTokens(double *tokenList, int offset){
    tokenList++;
    tokenList[0] = 4; // force value to constant

    while(tokenList[0] != 8/*end of token string*/){
        tokenList[0] = tokenList[offset];
        tokenList[1] = tokenList[offset + 1];
        tokenList += 2;
    }
}

That's it. The macros are just there to replace common operations, and everything else is just golf-compression of the above.


strager insists that the code should include #include statements, as it will not function correctly without a proper forward declation of the strtod and pow and functions. Since the challenge asks for just a function, and not a complete program, I hold that this should not be required. However, forward declarations could be added at minimal cost by adding the following code:

#define D double
D strtod(),pow();

I would then replace all instances of "double" in the code with "D". This would add 19 characters to the code, bringing the total up to 484. On the other hand, I could also convert my function to return a double instead of a string, as did he, which would trim 15 characters, changing the E() function to this:

D E(char*s){
    D t[99];
    char*e,i=2,z=0;
    for(;*s;i+=2)
        V]=strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;
    P=8;
    M(t+2);
    return*t;
}

This would make the total code size 469 characters (or 452 without the forward declarations of strtod and pow, but with the D macro). It would even be possible to trim 1 more characters by requiring the caller to pass in a pointer to a double for the return value:

E(char*s,D*r){
    D t[99];
    char*e,i=2,z=0;
    for(;*s;i+=2)
        V=strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;
    P=8;
    M(t+2);
    *r=*t;
}

I'll leave it to the reader to decide which version is appropriate.


C#, 13 lines:

static string Calc(string exp)
{
    WebRequest request = WebRequest.Create("http://google.com/search?q=" + 
                                           HttpUtility.UrlDecode(exp));
    using (WebResponse response = request.GetResponse())
    using (Stream dataStream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(dataStream))
    {
        string r = reader.ReadToEnd();
        int start = r.IndexOf(" = ") + 3;
        int end = r.IndexOf("<", start);
        return r.Substring(start, end - start);
    }
}

This compresses down to about 317 characters:

static string C(string e){var q = WebRequest.Create("http://google.com/search?q="
+HttpUtility.UrlDecode(e));using (var p=q.GetResponse()) using (var s=
p.GetResponseStream()) using (var d = new StreamReader(dataStream)){var
r=d.ReadToEnd();var t=r.IndexOf(" = ") + 3;var e=r.IndexOf("<",t);return
r.Substring(t,e-t);}}

Thanks to Mark and P Daddy in the comments, is compresses to 195 characters:

string C(string f){using(var c=new WebClient()){var r=c.DownloadString
("http://google.com/search?q="+HttpUtility.UrlDecode(f));int s=r.IndexOf(
" = ")+3;return r.Substring(s,r.IndexOf("<",f)-s);}}

J

:[[/%^(:[[+-/^,&i|:[$[' ']^j+0__:k<3:]]

F#, 70 lines

Ok, I implement a monadic parser combinator library, and then use that library to solve this problem. All told it's still just 70 lines of readable code.

I assume exponentiation associates to the right, and the other operators associate to the left. Everything works on floats (System.Doubles). I did not do any error handling for bad inputs or divide-by-zero.

// Core Parser Library
open System
let Fail() = fun i -> None
type ParseMonad() =
    member p.Return x = fun i -> Some(x,i)
    member p.Bind(m,f) = fun i -> 
        match m i with
        | Some(x,i2) -> f x i2
        | None -> None
let parse = ParseMonad()
let (<|>) p1 p2 = fun i -> 
    match p1 i with
    | Some r -> Some(r)
    | None -> p2 i
let Sat pred = fun i -> 
    match i with
    | [] -> None
    | c::cs -> if pred c then Some(c, cs) else None
// Auxiliary Parser Library
let Digit = Sat Char.IsDigit
let Lit (c : char) r = 
    parse { let! _ = Sat ((=) c)
            return r }
let Opt p = p <|> parse { return [] }
let rec Many p = Opt (Many1 p)
and Many1 p = parse { let! x = p
                      let! xs = Many p
                      return x :: xs }
let Num = parse {
    let! sign = Opt(Lit '-' ['-'])
    let! beforeDec = Many Digit
    let! rest = parse { let! dec = Lit '.' '.'
                        let! afterDec = Many Digit
                        return dec :: afterDec } |> Opt
    let s = new string(List.concat([sign;beforeDec;rest])
                       |> List.to_array) 
    match(try Some(float s) with e -> None)with
    | Some(r) -> return r
    | None -> return! Fail() }
let Chainl1 p op = 
    let rec Help x = parse { let! f = op
                             let! y = p
                             return! Help (f x y) } 
                     <|> parse { return x }
    parse { let! x = p
            return! Help x }
let rec Chainr1 p op =
    parse { let! x = p
            return! parse { let! f = op
                            let! y = Chainr1 p op
                            return f x y }
                    <|> parse { return x } }
// Expression grammar of this code-golf question
let AddOp = Lit '+' (fun x y -> 0. + x + y) 
        <|> Lit '-' (fun x y -> 0. + x - y)
let MulOp = Lit '*' (fun x y -> 0. + x * y) 
        <|> Lit '/' (fun x y -> 0. + x / y)
let ExpOp = Lit '^' (fun x y -> Math.Pow(x,y))
let rec Expr = Chainl1 Term AddOp
and Term = Chainl1 Factor MulOp
and Factor = Chainr1 Part ExpOp
and Part = Num <|> Paren
and Paren = parse { do! Lit '(' ()
                    let! e = Expr
                    do! Lit ')' ()
                    return e }
let CodeGolf (s:string) =
    match Expr(Seq.to_list(s.ToCharArray())) with
    | None -> "bad input"
    | Some(r,_) -> r.ToString()
// Examples
printfn "%s" (CodeGolf "1.1+2.2+10^2^3") // 100000003.3
printfn "%s" (CodeGolf "10+3.14/2")      // 11.57
printfn "%s" (CodeGolf "(10+3.14)/2")    // 6.57
printfn "%s" (CodeGolf "-1^(-3*4/-6)")   // 1
printfn "%s" (CodeGolf "-2^(2^(4-1))")   // 256 
printfn "%s" (CodeGolf "2*6/4^2*4/3")    // 1

The parser representation type is

type P<'a> = char list -> option<'a * char list>

by the way, a common one for non-error-handling parsers.


Recursive descent parser in PARLANSE, a C-like langauge with LISP syntax: [70 lines, 1376 characters not counting indent-by-4 needed by SO] EDIT: Rules changed, somebody insisted on floating point numbers, fixed. No library calls except the float conversion, input and print. [now 94 lines, 1825 characters]

(define main (procedure void)
   (local
      (;; (define f (function float void))
          (= [s string] (append (input) "$"))
          (= [i natural] 1)

         (define S (lambda f
            (let (= v (P))
               (value (loop
                          (case s:i)
                            "+" (;; (+= i) (+= v (P) );;
                            "-" (;; (+= i) (-= v (P) );;
                            else (return v)
                          )case
                       )loop
                  v
              )value
         )define

         (define P (lambda f
            (let (= v (T))
               (value (loop
                          (case s:i)
                            "*" (;; (+= i) (= v (* v (T)) );;
                            "/" (;; (+= i) (= v (/ v (T)) );;
                            else (return v)
                          )case
                       )loop
                  v
              )value
         )define

         (define T (lambda f
            (let (= v (O))
               (value (loop
                          (case s:i)
                            "^" (;; (+= i) (= v (** v (T)) );;
                            else (return v)
                          )case
                       )loop
                  v
              )value
         )define

         (define O (lambda f
           (let (= v +0)
            (value 
               (case s:i)
                  "(" (;; (+= i) (= v (E)) (+= i) );;
                  "-" (;; (+= i) (= v (- 0.0 (O))) );;
               else (= v (StringToFloat (F))
          )value
          v
        )let
     )define

     (define F (lambda f)
        (let (= n (N))
             (value
              (;; (ifthen (== s:i ".")
                     (;; (+= i)
                         (= n (append n "."))
                         (= n (concatenate n (N)))
                     );;
                  )ifthen
                  (ifthen (== s:i "E")
                     (;; (+= i)
                         (= n (append n "E"))
                         (ifthen (== s:i "-")
                         (;; (+= i)
                             (= n (append n "-"))
                             (= n (concatenate n (N)))
                         );;
                     );;
                  )ifthen
              );;
              n
         )let
     )define               

     (define N (lambda (function string string)
        (case s:i
            (any "0" "1" "2" "3" "4" "5" "6" "7" "8" "9")
               (value (+= i)
                      (append ? s:(-- i))
               )value
            else ?
        )case
     )define

      );;
      (print (S))
   )local
)define

Assumes a well-formed expression, float numbers with at least one leading digit, exponents optional as Enn or E-nnn. Not compiled and run.

Pecularities: the definition f is essentially signature typedef. The lambdas are the parsing functions, one per grammar rule. A function F is called by writing (F args). PARLANSE functions are lexically scoped, so each function has implicit access to the expression s to be evaluated and a string scanning index i.

The grammar implemented is:

E = S $ ;
S = P ;
S = S + P ;
P = T ;
P = P * T ;  
T = O ;
T = O ^ T ;
O = ( S ) ;
O = - O ;
O = F ;
F = digits {. digits} { E {-} digits} ;