How to differentiate '-' operator from a negative number for a tokenizer

Tags: c, parsing, token

I am creating an infix expression parser, and so I have to create a tokenizer. It works well, except for one thing: I do not know how to differentiate a negative number from the "-" operator.

For example, if I have:

23 / -23

The tokens should be 23, / and -23, but if I have an expression like

23-22

Then the tokens should be 23, - and 22.

I found a dirty workaround: when I encounter a "-" followed by a number, I look at the previous character, and if that character is a digit or a ')', I treat the "-" as an operator rather than as part of a number. Apart from being kind of ugly, it doesn't work for expressions like

--56

where it produces the tokens - and -56, when it should produce --56.

Any suggestions?


asked Oct 23 '14 13:10

Brendan Rius

1 Answer

In the first example the tokens should be 23, /, - and 23.
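A minimal sketch of a tokenizer that follows this rule is shown here. The names `Token`, `TokenType` and `next_token` are hypothetical and do not come from the question; the point is only that "-" is always emitted as an operator token and is never folded into a number.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical token kinds for this sketch (not from the question). */
typedef enum {
    TOK_NUM, TOK_OP, TOK_LPAREN, TOK_RPAREN, TOK_END, TOK_ERROR
} TokenType;

typedef struct {
    TokenType type;
    double value; /* meaningful only when type == TOK_NUM */
    char op;      /* meaningful only when type == TOK_OP  */
} Token;

/* Reads one token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    Token tok = { TOK_END, 0.0, '\0' };

    while (isspace((unsigned char)*p))
        p++;

    if (*p == '\0') {
        /* End of input: tok is already TOK_END. */
    } else if (isdigit((unsigned char)*p)) {
        /* A number token starts with a digit, so it never carries a sign. */
        char *end;
        tok.type = TOK_NUM;
        tok.value = strtod(p, &end);
        p = end;
    } else if (*p == '+' || *p == '-' || *p == '*' || *p == '/') {
        /* "-" is always an operator; the parser decides unary vs binary. */
        tok.type = TOK_OP;
        tok.op = *p++;
    } else if (*p == '(') {
        tok.type = TOK_LPAREN;
        p++;
    } else if (*p == ')') {
        tok.type = TOK_RPAREN;
        p++;
    } else {
        tok.type = TOK_ERROR;
        p++;
    }

    *cursor = p;
    return tok;
}
```

Calling `next_token` repeatedly on "23 / -23" yields a number, the operator "/", the operator "-", and another number, with no look-behind at the previous character.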

The solution then is to have the parser evaluate the tokens according to the rules of associativity and precedence. For example, - cannot bind to / as a binary operator, because there is no operand between them, but it can bind to 23 as a unary minus.

If you encounter --56, it is split into -, -, 56 and the same rules take care of the problem. There is no need for special cases.
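One way to apply these rules is a small recursive-descent evaluator, sketched below. It is an illustration under stated assumptions, not the answerer's code: the grammar, the function names (`evaluate`, `parse_unary`, and so on) and the choice to give unary minus higher precedence than "*" and "/" are all assumptions. For brevity the scanning is folded into the parser, but each "-" is still consumed as a lone operator and never as part of a number.

```c
#include <assert.h>
#include <ctype.h>
#include <math.h>
#include <stdlib.h>

/* Grammar assumed for this sketch:
 *   expr    := term  (('+' | '-') term)*
 *   term    := unary (('*' | '/') unary)*
 *   unary   := '-' unary | primary
 *   primary := NUMBER | '(' expr ')'
 */
static double parse_expr(const char **p);

/* Skips whitespace, then consumes c if it is the next character. */
static int accept(const char **p, char c)
{
    while (isspace((unsigned char)**p))
        (*p)++;
    if (**p != c)
        return 0;
    (*p)++;
    return 1;
}

static double parse_primary(const char **p)
{
    if (accept(p, '(')) {
        double inner = parse_expr(p);
        accept(p, ')'); /* sketch: a missing ')' is silently tolerated */
        return inner;
    }
    if (!isdigit((unsigned char)**p))
        return NAN; /* sketch: malformed input simply yields NaN */
    char *end;
    double value = strtod(*p, &end); /* starts at a digit, so no sign here */
    *p = end;
    return value;
}

static double parse_unary(const char **p)
{
    if (accept(p, '-'))
        return -parse_unary(p); /* "--56" becomes -(-(56)) */
    return parse_primary(p);
}

static double parse_term(const char **p)
{
    double v = parse_unary(p);
    for (;;) {
        if (accept(p, '*'))
            v *= parse_unary(p);
        else if (accept(p, '/'))
            v /= parse_unary(p);
        else
            return v;
    }
}

static double parse_expr(const char **p)
{
    double v = parse_term(p);
    for (;;) {
        if (accept(p, '+'))
            v += parse_term(p);
        else if (accept(p, '-'))
            v -= parse_term(p); /* binary minus: a left operand exists */
        else
            return v;
    }
}

/* Hypothetical entry point: evaluates a whole expression string. */
double evaluate(const char *text)
{
    assert(text != NULL);
    return parse_expr(&text);
}
```

A "-" met where an operand is expected is handled by `parse_unary`, while a "-" met after a complete term is handled by `parse_expr` as subtraction. The position in the grammar makes the distinction, so the tokenizer needs no knowledge of the previous character.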

answered Oct 13 '22 21:10

2501
