Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reusing MySQL parser

I'm working on SQL intrusion detection system (IDS) and I need do parse incoming SQL queries. Writing own SQL parser is a long term task and it will never exactly reflect the logic used in native parser. I found out that MySQL has a lexical analyzer with main source file sql/sql_lex.cc and a syntax analyzer built with bison from sql/sql_yacc.y. I am really interested in reusing this robust solutions. I am building my IDS in C/C++, so I am looking for some way to connect MySQL parser with my detection system.

I was wondering if It is possible to reuse the MySQL parser (lexical+syntax analyzer) to get the structure of SQL query in some logical form e.g. syntax tree. Would it be possible? Are there some related text, tutorials or projects?

Thanks

like image 329
bittomix Avatar asked Apr 13 '14 02:04

bittomix


2 Answers

I have finished the first version of my IDS as a part of my bachelor project. It is implemented as plugin for MySQL.

I will list my main sources for understanding the MySQL internals bellow. Then I shortly describe the approach I used in my IDS.

MySQL documentation texts

  • I found books Expert MySQL by Charles Bell and Understanding MySQL Internals by Sasha Pachev (as user3822447 wrote) to be very good entry point for understanding the internals of MySQL.
  • MySQL 5.1 Plugin Development by Andrew Hutchings & Sergei Golubchik is also very useful.
  • The MySQL Internals Manual also contains some basic information good to start with.
  • After all reading I did som debugging (using VS) and discovered how the query tree structure looks like.

My solution for IDS

The source code of my solution can be found at sourceforge. I'm planning to document it little more in its wiki.

The main entry point is the audit_ids_notify() function in audit_ids.cc. The plugin takes query tree generated by internal MySQL parser a makes simplified version of it (to save memory). Then it does anomally detection - it has a list of known query tree structures and keeps some statistical information about each parametrizable part of each query tree structure. The output is written into special log file in the MySQL data directory.

I tried to make the solution modular and extendable. The initial version is kind of demostration and the performance is not optimized especially in SQL storage module.

MySQL plugin type

I identified 2 possible approaches and used the first one.

  1. audit plugin
    • The type of wrapper in my solution plugin is audit plugin.
    • I used this type of plugins despite being used to report server operations (e.g. to log queries or errors).
    • I chose this type of plugin because I found out that this is the only native supported plugin which is called when the query tree is after the completed (i.e. parsed) and before it is freed from the memory (for MySQL 5.6.17).
    • Disadvantage: the above is not fully guaranteed in future versions of MySQL, but in my opinion this shouldn't change in the close future.
    • Advantage: the MySQL does not need to recompiled. It is enough to build and install the plugin.

  1. query-rewrite plugin
    • There is also an alternative approach doing this using non-native plugin type query-rewrite. It provised plugin API for modifying the query thus also for reading it.
    • Disadvantage: to support this plugin API the MySQL server must be recompiled with the API. I think may become a part of MySQL production distribution.
    • Advantage: plugin type designed for reading/rewriting the internal query tree.

If there are some questions/problems related to this topic I could answer feel free to ask ;)

like image 199
bittomix Avatar answered Oct 30 '22 10:10

bittomix


I believe that it is possible. Try an advanced MySQL internals book such as "Expert MySQL" by Charles Bell or "Understanding MySQL Internals" by Sasha Pachev. MySQL uses a custom hand-built lexer and a generic Bison-compatible parser with which their lexer is compatible.

Aside from that, you may find a simpler solution than parsing the query, for example:

  • Strategy #1: Throw away the query and just look at the contents of the strings inside the query. Look for possible attack vectors such as SQL keywords. This could detect attack attempts.
  • Strategy #2: Throw away all user input and make a list of the rest of the query contents. Make a list of all your query patterns of keywords and compare them against each other. Look for queries with anomalous structure that indicate that someone successfully modified the query.

I am no SQL guru but the most basic strategy is simply to use parameterized queries and ignore penetration attempts. Most such attempts on the Internet are generic, random queries designed to probe for obvious weakness and can be safely ignored if you follow basic security practice everywhere.

like image 1
user3822447 Avatar answered Oct 30 '22 10:10

user3822447