Is it bad idea using regex to tokenize string for lexer?

I'm not sure how I'm going to tokenize source code for a lexer. So far the only approach I can think of is using regex to split the string into an array of tokens according to a set of rules (identifiers, symbols such as + and -, etc.).

For instance,

begin x:=1;y:=2;

then I want to tokenize the keyword, the variables (x and y in this case), and each symbol (:, =, ;).
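In other words, for that line I'd expect a token stream roughly like this (the token names are just illustrative):

begin  ->  keyword
x      ->  identifier
:=     ->  assignment operator
1      ->  number literal
;      ->  statement separator
y      ->  identifier
:=     ->  assignment operator
2      ->  number literal
;      ->  statement separator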

asked Feb 07 '13 by REALFREE


1 Answer

Using regexes is a common way of implementing a lexer. If you don't want to use them, you'll end up implementing parts of a regex engine by hand anyway.

A hand-written lexer can be more efficient performance-wise, but it isn't a must.

answered Nov 29 '22 by Oak
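By way of illustration, here is a minimal sketch of the regex approach in Python, tokenizing the example from the question with a single regex built from named alternatives and re.finditer. The token names and the keyword set are assumptions made for this example, not part of any particular lexer's design.

import re

# Token specification as (NAME, regex) pairs. Order matters: earlier
# alternatives win, so the keyword rule must come before the general
# identifier rule, or "begin" would be tokenized as an identifier.
TOKEN_SPEC = [
    ("KEYWORD",  r"\b(?:begin|end)\b"),  # assumed keyword set
    ("IDENT",    r"[A-Za-z_]\w*"),       # variables such as x, y
    ("NUMBER",   r"\d+"),
    ("ASSIGN",   r":="),
    ("SEMI",     r";"),
    ("SKIP",     r"\s+"),                # whitespace, discarded
    ("MISMATCH", r"."),                  # anything else is an error
]

MASTER_RE = re.compile("|".join(
    f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Yield (token_name, lexeme) pairs for the given source string."""
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        yield kind, text

print(list(tokenize("begin x:=1;y:=2;")))
# [('KEYWORD', 'begin'), ('IDENT', 'x'), ('ASSIGN', ':='),
#  ('NUMBER', '1'), ('SEMI', ';'), ('IDENT', 'y'), ('ASSIGN', ':='),
#  ('NUMBER', '2'), ('SEMI', ';')]

Putting the catch-all MISMATCH rule last turns any character the rules don't cover into an explicit error rather than a silent skip, which makes a small lexer like this much easier to debug.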