I'am using ANTLR4 to generate Abstract Syntax Tree (AST) for PL/SQL codes. For some queries it works fine, but for some codes it generates an AST with only one node which is not right.
For example:
DECLARE
a RAW; -- migrate to BLOB
b LONG RAW; -- migrate to BLOB
c LONG VARCHAR; -- migrate to LOB
d LONG; -- migrate to LOB
x VARCHAR;
CURSOR mycur RETURN LONG; -- should flag
FUNCTION myfunc RETURN LONG RAW -- should flag
IS
z LONG RAW; -- should flag
BEGIN
RETURN z;
END;
BEGIN
SELECT mycol, CAST(col2 AS RAW) -- should flag
INTO a
FROM mytab
WHERE (b IS OF TYPE(LONG RAW, RAW, VARCHAR)); -- should flag
END;
CREATE TABLE tab (
a RAW, -- should flag
b LONG RAW, -- should flag
c LONG VARCHAR, -- should flag
d LONG, -- should flag
x VARCHAR
);
For this code it generates this AST:
(compilation_unit DECLARE a RAW ; b LONG RAW ; c LONG VARCHAR ; d LONG ; x VARCHAR ; CURSOR mycur RETURN LONG ; FUNCTION myfunc RETURN LONG RAW IS z LONG RAW ; BEGIN RETURN z ; END ; BEGIN SELECT mycol , CAST ( col2 AS RAW ) INTO a FROM mytab WHERE ( b IS OF TYPE ( LONG RAW , RAW , VARCHAR ) ) ; END ; CREATE TABLE tab ( a RAW , b LONG RAW , c LONG VARCHAR , d LONG , x VARCHAR ) ;)
which is merely the given code within compilation_unit node.
But if i give same code without declare part it gives good AST.
For this code:
FUNCTION myfunc RETURN LONG RAW -- should flag
IS
z LONG RAW; -- should flag
BEGIN
RETURN z;
END;
BEGIN
SELECT mycol, CAST(col2 AS RAW) -- should flag
INTO a
FROM mytab
WHERE (b IS OF TYPE(LONG RAW, RAW, VARCHAR)); -- should flag
END;
CREATE TABLE tab (
a RAW, -- should flag
b LONG RAW, -- should flag
c LONG VARCHAR, -- should flag
d LONG, -- should flag
x VARCHAR
);
it gives this AST:
(compilation_unit (unit_statement (create_function_body FUNCTION (function_name (id (id_expression (regular_id myfunc)))) RETURN (type_spec (datatype (native_datatype_element LONG RAW))) IS (declare_spec (variable_declaration (variable_name (id_expression (regular_id z))) (type_spec (datatype (native_datatype_element LONG RAW))) ;)) (body BEGIN (seq_of_statements (statement (return_statement RETURN (condition (expression (logical_and_expression (negated_expression (equality_expression (multiset_expression (relational_expression (compound_expression (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (atom (general_element (general_element_part (id_expression (regular_id z))))))))))))))))))))) ;) END) ;)) BEGIN (unit_statement (data_manipulation_language_statements (select_statement (subquery (subquery_basic_elements (query_block SELECT (selected_element (select_list_elements (expression (logical_and_expression (negated_expression (equality_expression (multiset_expression (relational_expression (compound_expression (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (atom (general_element (general_element_part (id_expression (regular_id mycol)))))))))))))))))))) , (selected_element (select_list_elements (expression (logical_and_expression (negated_expression (equality_expression (multiset_expression (relational_expression (compound_expression (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (standard_function CAST ( (concatenation_wrapper (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (atom (general_element (general_element_part (id_expression (regular_id col2)))))))))))) AS (type_spec (datatype (native_datatype_element RAW))) ))))))))))))))))) (into_clause INTO (variable_name (id_expression (regular_id a)))) (from_clause FROM (table_ref_list (table_ref (table_ref_aux (dml_table_expression_clause (tableview_name (id (id_expression (regular_id mytab))))))))) (where_clause WHERE (condition_wrapper (expression (logical_and_expression (negated_expression (equality_expression (multiset_expression (relational_expression (compound_expression (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (atom ( (expression_or_vector (expression (logical_and_expression (negated_expression (equality_expression (multiset_expression (relational_expression (compound_expression (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (atom (general_element (general_element_part (id_expression (regular_id b)))))))))))))) IS OF TYPE ( (type_spec (datatype (native_datatype_element LONG RAW))) , (type_spec (datatype (native_datatype_element RAW))) , (type_spec (datatype (native_datatype_element VARCHAR))) )))))) ))))) ; END ;)))))))))))))))))) unit_statement (unit_statement CREATE TABLE tab) (unit_statement (data_manipulation_language_statements (select_statement (subquery (subquery_basic_elements ( (subquery (subquery_basic_elements a RAW , b LONG RAW , c LONG VARCHAR , d LONG , x VARCHAR)) )) ;)))) <EOF>)
So it seems like some keywords(rules) like declare are missing in the PL/SQL grammar file. I used this plsql.g4 file with Antlr which is in the Antlr site.
Is there any way I can find an updated plsql.g4 file or do I need to add those rules myself?
The problem seems to actually be that plsql antlr-grammar doesn't support LONG VARCHAR datatype, in this part: "c LONG VARCHAR;"
It leads to declaration being misinterpreted as a procedure call messing up rest of the parsing.
I have opened an issue in the grammar github: https://github.com/antlr/grammars-v4/issues/2158
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With