I'm working on an SQL intrusion detection system (IDS) and I need to parse incoming SQL queries. Writing my own SQL parser would be a long-term task, and it would never exactly reflect the logic of the native parser.
I found out that MySQL has a lexical analyzer whose main source file is sql/sql_lex.cc and a syntax analyzer built with bison from sql/sql_yacc.y. I am really interested in reusing these robust solutions. I am building my IDS in C/C++, so I am looking for a way to connect the MySQL parser to my detection system.
I was wondering whether it is possible to reuse the MySQL parser (lexical + syntax analyzer) to get the structure of an SQL query in some logical form, e.g. a syntax tree. Would it be possible? Are there any related texts, tutorials or projects?
Thanks
I have finished the first version of my IDS as part of my bachelor project. It is implemented as a plugin for MySQL.
I will list my main sources for understanding the MySQL internals below. Then I will briefly describe the approach I used in my IDS.
The source code of my solution can be found at sourceforge. I'm planning to document it a little more in its wiki.
The main entry point is the audit_ids_notify() function in audit_ids.cc. The plugin takes the query tree generated by the internal MySQL parser and makes a simplified version of it (to save memory). Then it does anomaly detection: it has a list of known query tree structures and keeps some statistical information about each parametrizable part of each query tree structure. The output is written to a special log file in the MySQL data directory.
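To illustrate the idea of "known structures plus per-parameter statistics", here is a minimal standalone sketch. It is not the plugin's actual code: the names SlotStats, StructureModel and Verdict are hypothetical, a plain skeleton string stands in for the simplified query tree, and each parametrizable part is assumed to be reduced to one number (for example the length of a literal). The three-standard-deviations threshold is also just an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Running mean and variance for one parametrizable slot (Welford's algorithm).
struct SlotStats {
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the running mean

    void add(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);
    }

    double stddev() const {
        return n > 1 ? std::sqrt(m2 / static_cast<double>(n - 1)) : 0.0;
    }
};

enum class Verdict { Normal, UnknownStructure, UnusualParameter };

// Hypothetical model: maps a structure skeleton to statistics per slot.
class StructureModel {
public:
    explicit StructureModel(double threshold = 3.0) : threshold_(threshold) {}

    // Training phase: record one query that is considered legitimate.
    // Precondition: a given skeleton always has the same number of slots.
    void learn(const std::string& skeleton, const std::vector<double>& params) {
        std::vector<SlotStats>& slots = known_[skeleton];
        if (slots.empty()) slots.resize(params.size());
        assert(slots.size() == params.size());
        for (std::size_t i = 0; i < params.size(); ++i) slots[i].add(params[i]);
    }

    // Detection phase: an unseen structure is reported first; otherwise each
    // slot is compared against what was learned for that structure.
    Verdict check(const std::string& skeleton,
                  const std::vector<double>& params) const {
        const auto it = known_.find(skeleton);
        if (it == known_.end() || it->second.size() != params.size())
            return Verdict::UnknownStructure;
        for (std::size_t i = 0; i < params.size(); ++i) {
            const SlotStats& s = it->second[i];
            // With zero learned spread, any different value is flagged.
            if (std::fabs(params[i] - s.mean) > threshold_ * s.stddev())
                return Verdict::UnusualParameter;
        }
        return Verdict::Normal;
    }

private:
    double threshold_;
    std::unordered_map<std::string, std::vector<SlotStats>> known_;
};
```

In this sketch you would call learn() for each query seen during a training period and check() afterwards. A typical injection changes the tree shape itself (for instance an extra OR branch), so it would come back as UnknownStructure, while an oversized literal in a known shape would come back as UnusualParameter.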
I tried to make the solution modular and extensible. The initial version is a kind of demonstration, and the performance is not optimized, especially in the SQL storage module.
I identified 2 possible approaches and used the first one.
If there are any questions or problems related to this topic that I could answer, feel free to ask ;)
I believe that it is possible. Try an advanced MySQL internals book such as "Expert MySQL" by Charles Bell or "Understanding MySQL Internals" by Sasha Pachev. MySQL uses a custom hand-built lexer together with a parser generated by Bison from a grammar file; the lexer is written to be compatible with that parser.
Aside from that, you may find a simpler solution than parsing the query, for example:
I am no SQL guru, but the most basic strategy is simply to use parameterized queries and ignore penetration attempts. Most such attempts on the Internet are generic, random queries designed to probe for obvious weaknesses and can be safely ignored if you follow basic security practices everywhere.