Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is there no decent sql parser? [closed]

I'm currently doing some oracle sql analysis and often encounter valid sql statement that can not be properly parsed by various sql parsers. Either they fail parsing it, or their abstract syntax tree generated is somehow messed up. It seems that the only parser that can really handle oracle sql is their own one that is not publicly available and cannot be obtained as a stand-alone parser.

I'm aware that there are different sql grammars and complying to all may not be possible. But even parsers that claim to be Oracle SQL parsers do not succeed in all cases.

What do you think are the main reasons that make it hard to implement sql parsers in general or oracle sql parsers in particular?

Best, Will

like image 243
Will Avatar asked May 16 '11 08:05

Will


People also ask

What is SQL parser?

The SQL Parser parses a SQL query in a string field. When parsing a query, the processor generates fields based on the fields defined in the SQL query and specifies the CRUD operation, table, and schema information in record header attributes.

How do you parse a SQL query in Python?

format(first, reindent=True, keyword_case='upper')) SELECT * FROM foo; >>> # Parsing a SQL statement: >>> parsed = sqlparse. parse('select * from foo')[0] >>> parsed. tokens [<DML 'select' at 0x7f22c5e15368>, <Whitespace ' ' at 0x7f22c5e153b0>, <Wildcard '*' … ] >>>

What is parser and optimizer in SQL?

The MySQL server receives queries in the SQL format. Once a query is received, it first needs to be parsed, which involves translating it from what is essentially a textual format into a combination of internal binary structures that can be easily manipulated by the optimizer.


1 Answers

Good parsers are hard to write. That starts with the code generator for the parser code (which usually eats some (E)BNF-like syntax which has its own limitations).

Error handling in parsers is a research topic of its own. This is not only about detecting errors but also giving useful information what could be wrong and how to solve it. Some parsers don't even offer location information ("error happened at line/column").

Next, you have SQL which means "Structured Query Language", not "Standard Query Language". There is a SQL standard, even several, but you won't find a single database which implements any of them.

Oracle grudgingly offers VARCHAR but you better use VARCHAR2. Some databases offer recursive/tree-like queries. All of them use their own, special syntax for this. Joining is defined pretty clearly in the standard (join, left join, ...) but why bother if you can use +?

On top of that, for every database version, new features are added to the grammar.

So while you could write a parser that can read the standard cases, writing a parser that can support all the features which all the databases around the globe offer, is nearly impossible. And I'm not even talking about the bugs which you can encounter in these parsers.

One solution would be if all database vendors would publish the grammar files. But these are crown jewels (IP). So you should be happy that you can use them without having to pay a license fee per parsed character * number of CPUs.

like image 169
Aaron Digulla Avatar answered Oct 07 '22 21:10

Aaron Digulla