Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I parse T-SQL to AST using the classes in Microsoft.SqlServer.Management.SqlParser

The class Parser only has the method Parse that returns a ParseResult, and seems I can do nothing with the ParseResult. How can I get the abstract syntax tree of my sql statement, or just the metadata tokens can be parsed from sql is fine.

like image 209
BartMao Avatar asked Dec 05 '15 08:12

BartMao


3 Answers

I did some research, and found that I can use reflection to generate the parsed information xml file using SqlScript.WriteXml. And here's is the sample code, I don't know if there is any better way.

var rst = Parser.Parse(File.ReadAllText(@"*.sql"));
var fieldInfo = rst.GetType().GetField("sqlScript", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField);
var script = fieldInfo.GetValue(rst);
var writer = XmlWriter.Create("*.xml");
script.GetType().InvokeMember("WriteXml", BindingFlags.NonPublic| BindingFlags.Instance | BindingFlags.InvokeMethod
    , null, script, new object[] { writer });
writer.Close();
like image 170
BartMao Avatar answered Oct 15 '22 06:10

BartMao


If you add Nuget package Microsoft.SqlServer.Management.SqlParser

you can use built-in parser.

using Microsoft.SqlServer.Management.SqlParser.Parser;
using Microsoft.SqlServer.Management.SqlParser.SqlCodeDom;
using static System.Console;

var parseResult = Parser.Parse("SELECT Id, Name FROM dbo.Customers WHERE State = 'MA'");
foreach (var batch in parseResult.Script.Batches)
{
    foreach (var statement in batch.Statements)
    {
        switch (statement)
        {
            case SqlSelectStatement selectStatement:
                ProcessSelectStatement(selectStatement);
                break;
            default:
                WriteLine("Unsupported statment. Printing inner XML");
                WriteLine(statement.Xml);
                break;
        }
    }
}

void ProcessSelectStatement(SqlSelectStatement selectStatement)
{
    var query = (SqlQuerySpecification)selectStatement.SelectSpecification.QueryExpression;
    var selectClause = query.SelectClause;
    WriteLine($"Select columns {string.Join(", ", selectClause.SelectExpressions.Select(_ => _.Sql))}");
    var fromClause = query.FromClause;
    WriteLine($"from tables {string.Join(", ", fromClause.TableExpressions.Select(_ => _.Sql))}");
    var whereClause = query.WhereClause;
    WriteLine($"where {whereClause.Expression.Sql}");
}
like image 2
codevision Avatar answered Oct 15 '22 06:10

codevision


I was excited you found the AST at all! Working with it directly requires use of dynamic variables to access the .Child collection of the objects in the internal Microsoft.SqlServer.Management.SqlParser.SqlCodeDom namespace.

Rather than calling WriteXml, I recommend accessing existing the Xml string property. This saves having to deal with issues caused by nesting SQL comments within XML comments (can't have -- inside an XML comment; -- becomes - -).

var rst = Parser.Parse(File.ReadAllText(@"*.sql"));
var script = rst.GetType().GetProperty("Script", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(rst);
var xml = script.GetType().BaseType.GetProperty("Xml").GetValue(script) as String;

If you are genuinely willing to settle for just the metadata tokens, I found a PowerShell example here; it works out to something like this (boilerplate metadata factory included for completeness):

// using Microsoft.SqlServer.Management.SqlParser.Parser;
// ...

var sql = File.ReadAllText(@"*.sql");
var scanner = new Scanner(new ParseOptions());
int scannerState = 0;
scanner.SetSource(sql, 0);
var allTokens = new List<MSSQL_Token_JS>();
MSSQL_Token_JS curToken = null;
do
{
    curToken = MSSQL_Token_JS.GetNext(scanner, sql, ref scannerState);
    allTokens.Add(curToken);
}
while (curToken.Value != Tokens.EOF);

//...

public class MSSQL_Token_JS
{
    public readonly string SourceSQL;
    public readonly Tokens Value;
    public readonly string Text;
    public readonly int ScannerState;
    public readonly int Start;
    public readonly int End;
    public readonly bool IsPairMatch;
    public readonly bool IsExecAutoParamHelp;

    private MSSQL_Token_JS(string SourceSQL, int tokenId, int ScannerState, int Start, int End, bool IsPairMatch, bool IsExecAutoParamHelp)
    {
        this.SourceSQL = SourceSQL;
        this.Value = (Tokens)tokenId;
        if (this.Value != Tokens.EOF)
        {
            this.Text = SourceSQL.Substring(Start, End - Start + 1);
        }

        this.ScannerState = ScannerState;
        this.Start = Start;
        this.End = End;
        this.IsPairMatch = IsPairMatch;
        this.IsExecAutoParamHelp = IsExecAutoParamHelp;
    }

    public static MSSQL_Token_JS GetNext(Scanner scanner, string SourceSQL, ref int ScannerState)
    {
        int start, end;
        bool isPairMatch, isExecAutoParamHelp;
        int tokenId = scanner.GetNext(ref ScannerState, out start, out end, out isPairMatch, out isExecAutoParamHelp);
        return new MSSQL_Token_JS(SourceSQL, tokenId, ScannerState, start, end, isPairMatch, isExecAutoParamHelp);
    }

    public override string ToString()
    {
        return String.Format("{0}:{1}", this.Value, this.Text);
    }
}
like image 1
user423430 Avatar answered Oct 15 '22 06:10

user423430