My goal is to lex string literals the way Python does.
Question: How do I write a lexer that supports the following forms?
"string..."
'string...'
"""multi line string \n \n end"""
'''multi line string \n \n end'''
Some code:
states = (
    ('string', 'exclusive'),
)

# Strings
def t_begin_string(self, t):
    r'(\'|(\'{3})|\"|(\"{3}))'
    t.lexer.push_state('string')

def t_string_end(self, t):
    r'(\'|(\'{3})|\"|(\"{3}))'
    t.lexer.pop_state()

def t_string_newline(self, t):
    r'\n'
    t.lexer.lineno += 1

def t_string_error(self, t):
    print("Illegal character in string '%s'" % t.value[0])
    t.lexer.skip(1)
My current idea is to create 4 unique states that will match the 4 different string cases, but I'm wondering if there's a better approach.
Thanks for your help!
Strings can span multiple lines when written with Python's triple quotes. Triple-quoted strings are also commonly used for long comments in code. Tabs and newlines can appear verbatim inside the triple quotes. As the name suggests, the syntax consists of three consecutive single or double quotes.
In Python, a string (str) is created by enclosing text in single quotes ('), double quotes ("), or triple quotes (''' or """). Objects of other types can also be converted to strings with str().
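As a quick illustration of the literal forms described above, here is a minimal sketch. The variable names are made up for the example.

```python
# The four literal forms produce ordinary str objects with the same value.
a = 'spam'
b = "spam"
c = '''spam'''
d = """spam"""
all_equal = (a == b == c == d)

# A triple-quoted literal may span lines; each line break becomes '\n'.
multi = """first
second"""

# str() converts objects of other types to their string form.
converted = str(42) + str(3.5)
```

Since all four forms yield the same kind of value, a lexer only needs to tell the delimiters apart, not produce different token types.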
Factor out the common string handling into a single state, so that the automaton needs fewer states. You can also have a look at PLY (Python Lex-Yacc) if you don't mind using an external library that makes the job easier.
You will need the basics of lex and yacc, though. Sample code is shown below:
tokens = (
    'NAME', 'NUMBER',
    'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'EQUALS',
    'LPAREN', 'RPAREN',
)
# Tokens
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_EQUALS = r'='
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    try:
        t.value = int(t.value)
    except ValueError:
        # t.value is still the matched text here, so format it with %s
        print("Integer value too large %s" % t.value)
        t.value = 0
    return t

# Ignored characters
t_ignore = " \t"

def t_newline(t):
    r'\n+'
    # Keep line numbers accurate; no token is returned for newlines
    t.lexer.lineno += t.value.count("\n")

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lex.lex()
# Parsing rules
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),
)

# dictionary of names
names = {}

def p_statement_assign(t):
    'statement : NAME EQUALS expression'
    names[t[1]] = t[3]

def p_statement_expr(t):
    'statement : expression'
    print(t[1])

def p_expression_binop(t):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    if t[2] == '+':
        t[0] = t[1] + t[3]
    elif t[2] == '-':
        t[0] = t[1] - t[3]
    elif t[2] == '*':
        t[0] = t[1] * t[3]
    elif t[2] == '/':
        t[0] = t[1] / t[3]

def p_expression_uminus(t):
    'expression : MINUS expression %prec UMINUS'
    t[0] = -t[2]

def p_expression_group(t):
    'expression : LPAREN expression RPAREN'
    t[0] = t[2]

def p_expression_number(t):
    'expression : NUMBER'
    t[0] = t[1]

def p_expression_name(t):
    'expression : NAME'
    try:
        t[0] = names[t[1]]
    except LookupError:
        print("Undefined name '%s'" % t[1])
        t[0] = 0

def p_error(t):
    # t is None when the error occurs at the end of the input
    if t is None:
        print("Syntax error at end of input")
    else:
        print("Syntax error at '%s'" % t.value)

import ply.yacc as yacc
yacc.yacc()
while True:
    try:
        s = input('calc > ')  # Use raw_input on Python 2
    except EOFError:
        break
    yacc.parse(s)
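The calculator above does not lex strings itself. As a rough sketch of the "single rule instead of four states" idea, the four quoting styles can be folded into one regular expression in which the triple-quoted alternatives come first. This example uses only the standard `re` module so it runs without PLY. The names `STRING_RE` and `find_strings` are invented for the illustration, and the pattern is a simplification that ignores prefixes such as `r` or `b`. It also avoids numbered backreferences, which tend to be fragile when a lexer generator merges all rules into one large regex.

```python
import re

def _long(q):
    # Triple-quoted form: escaped characters or any non-backslash character
    # (newlines included), matched lazily up to the closing triple quote.
    return q * 3 + r'(?:\\[\s\S]|[^\\])*?' + q * 3

def _short(q):
    # Single-quoted form: no unescaped newline or closing quote in the body.
    return q + r'(?:\\.|[^\\\n' + q + r'])*' + q

# Order matters: the triple-quoted alternatives must be tried first,
# otherwise '"""' would be read as an empty string followed by a quote.
STRING_PATTERN = '|'.join([_long('"'), _long("'"), _short('"'), _short("'")])
STRING_RE = re.compile(STRING_PATTERN)

def find_strings(source):
    """Return (line_number, literal_text) for each string literal found."""
    results = []
    lineno = 1
    pos = 0
    for m in STRING_RE.finditer(source):
        # Count the newlines between the previous match and this one.
        lineno += source.count('\n', pos, m.start())
        results.append((lineno, m.group()))
        # Multi-line literals advance the line counter as well.
        lineno += m.group().count('\n')
        pos = m.end()
    return results
```

In a PLY lexer, the same pattern could presumably serve as the regex of a single token rule (say, a hypothetical `t_STRING`) that adds `t.value.count("\n")` to `t.lexer.lineno`, which would remove the need for a separate exclusive state per quoting style.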