Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to formulate an IQueryable to query a recursive database table?

I have a database table like this:

Entity
---------------------
ID        int      PK
ParentID  int      FK
Code      varchar
Text      text

The ParentID field is a foreign key with another record in the same table (recursive). So the structure represents a Tree.

I'm trying to write a method to query this table and get 1 specific Entity based on a path. A path would be a string representing the Code properties of the Entity and the parent Entities. So an example path would be "foo/bar/baz" which means the one specific Entity of which the Code == "baz", the parent's Code == "bar" and the parent of the parent's Code == "foo".

My attempt:

public Entity Single(string path)
{
 string[] pathParts = path.Split('/');
 string code = pathParts[pathParts.Length -1];

 if (pathParts.Length == 1)
  return dataContext.Entities.Single(e => e.Code == code && e.ParentID == 0);

 IQueryable<Entity> entities = dataContext.Entities.Where(e => e.Code == code);
 for (int i = pathParts.Length - 2; i >= 0; i--)
 {
  string parentCode = pathParts[i];
  entities = entities.Where(e => e.Entity1.Code == parentCode); // incorrect
 }

 return entities.Single();
}

I know this isn't correct because the Where inside the forloop just adds more conditions to the current Entity instead of the parent Entity, but how do I correct this? In words I would like the for-loop to say "and the parent's code must be x and the parent of that parent's code must be y, and the parent of that parent of that parent's code must be z .... etc". Besides that, for performance reasons I'd like it to be one IQueryable so there will be just 1 query going to the database.

like image 667
Bazzz Avatar asked Nov 17 '12 10:11

Bazzz


3 Answers

How to formulate an IQueryable to query a recursive database table? I'd like it to be one IQueryable so there will be just 1 query going to the database.

I don't think traversing an hierarchical table using a single translated query is currently possible with Entity Framework. The reason is you'll need to implement either a loop or recursion and to my best knowledge neither can be translated into an EF object store query.

UPDATE

@Bazzz and @Steven got me thinking and I have to admit I was completely wrong: it is possible and quite easy to construct an IQueryable for these requirements dynamically.

The following function can be called recursively to build up the query:

public static IQueryable<TestTree> Traverse(this IQueryable<TestTree> source, IQueryable<TestTree> table, LinkedList<string> parts)
{
    var code = parts.First.Value;
    var query = source.SelectMany(r1 => table.Where(r2 => r2.Code == code && r2.ParentID == r1.ID), (r1, r2) => r2);
    if (parts.Count == 1)
    {
        return query;
    }
    parts.RemoveFirst();
    return query.Traverse(table, parts);
}

The root query is a special case; here's a working example of calling Traverse:

using (var context = new TestDBEntities())
{
    var path = "foo/bar/baz";
    var parts = new LinkedList<string>(path.Split('/'));
    var table = context.TestTrees;

    var code = parts.First.Value;
    var root = table.Where(r1 => r1.Code == code && !r1.ParentID.HasValue);
    parts.RemoveFirst();

    foreach (var q in root.Traverse(table, parts))
        Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}

The DB is queried only once with this generated code:

exec sp_executesql N'SELECT 
[Extent3].[ID] AS [ID], 
[Extent3].[ParentID] AS [ParentID], 
[Extent3].[Code] AS [Code]
FROM   [dbo].[TestTree] AS [Extent1]
INNER JOIN [dbo].[TestTree] AS [Extent2] ON ([Extent2].[Code] = @p__linq__1) AND ([Extent2].[ParentID] = [Extent1].[ID])
INNER JOIN [dbo].[TestTree] AS [Extent3] ON ([Extent3].[Code] = @p__linq__2) AND ([Extent3].[ParentID] = [Extent2].[ID])
WHERE ([Extent1].[Code] = @p__linq__0) AND ([Extent1].[ParentID] IS NULL)',N'@p__linq__1 nvarchar(4000),@p__linq__2 nvarchar(4000),@p__linq__0 nvarchar(4000)',@p__linq__1=N'bar',@p__linq__2=N'baz',@p__linq__0=N'foo'

And while I like the execution plan of the raw query (see below) a bit better, the approach is valid and perhaps useful.

End of UPDATE

Using IEnumerable

The idea is to grab the relevant data from the table in one go and then do the traversing in the application using LINQ to Objects.

Here's a recursive function that will get a node from a sequence:

static TestTree GetNode(this IEnumerable<TestTree> table, string[] parts, int index, int? parentID)
{
    var q = table
        .Where(r => 
             r.Code == parts[index] && 
             (r.ParentID.HasValue ? r.ParentID == parentID : parentID == null))
        .Single();
    return index < parts.Length - 1 ? table.GetNode(parts, index + 1, q.ID) : q;
}

You can use like this:

using (var context = new TestDBEntities())
{
    var path = "foo/bar/baz";
    var q = context.TestTrees.GetNode(path.Split('/'), 0, null);
    Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}

This will execute one DB query for each path part, so if you want the DB to only be queried once, use this instead:

using (var context = new TestDBEntities())
{
    var path = "foo/bar/baz";
    var q = context.TestTrees
        .ToList()
        .GetNode(path.Split('/'), 0, null);
    Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}

An obvious optimization is to exclude the codes not present in our path before traversing:

using (var context = new TestDBEntities())
{
    var path = "foo/bar/baz";
    var parts = path.Split('/');
    var q = context
        .TestTrees
        .Where(r => parts.Any(p => p == r.Code))
        .ToList()
        .GetNode(parts, 0, null);
    Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}

This query should be fast enough unless most of your entities have similar codes. However, if you absolutely need top performance, you could use raw queries.

SQL Server Raw Query

For SQL Server a CTE-based query would probably be best:

using (var context = new TestDBEntities())
{
    var path = "foo/bar/baz";
    var q = context.Database.SqlQuery<TestTree>(@"
        WITH Tree(ID, ParentID, Code, TreePath) AS
        (
            SELECT ID, ParentID, Code, CAST(Code AS nvarchar(512)) AS TreePath
            FROM dbo.TestTree
            WHERE ParentID IS NULL

            UNION ALL

            SELECT TestTree.ID, TestTree.ParentID, TestTree.Code, CAST(TreePath + '/' + TestTree.Code AS nvarchar(512))
            FROM dbo.TestTree
            INNER JOIN Tree ON Tree.ID = TestTree.ParentID
        )
        SELECT * FROM Tree WHERE TreePath = @path", new SqlParameter("path", path)).Single();
    Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}

Limiting data by the root node is easy and might be quite useful performance-wise:

using (var context = new TestDBEntities())
{
    var path = "foo/bar/baz";
    var q = context.Database.SqlQuery<TestTree>(@"
        WITH Tree(ID, ParentID, Code, TreePath) AS
        (
            SELECT ID, ParentID, Code, CAST(Code AS nvarchar(512)) AS TreePath
            FROM dbo.TestTree
            WHERE ParentID IS NULL AND Code = @parentCode

            UNION ALL

            SELECT TestTree.ID, TestTree.ParentID, TestTree.Code, CAST(TreePath + '/' + TestTree.Code AS nvarchar(512))
            FROM dbo.TestTree
            INNER JOIN Tree ON Tree.ID = TestTree.ParentID
        )
        SELECT * FROM Tree WHERE TreePath = @path", 
            new SqlParameter("path", path),
            new SqlParameter("parentCode", path.Split('/')[0]))
            .Single();
    Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}

Footnotes

All of this was tested with .NET 4.5, EF 5, SQL Server 2012. Data setup script:

CREATE TABLE dbo.TestTree
(
    ID int not null IDENTITY PRIMARY KEY,
    ParentID int null REFERENCES dbo.TestTree (ID),
    Code nvarchar(100)
)
GO

INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'foo')
INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'bar')
INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'baz')
INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'bla')
INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'blu')
INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'blo')
INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'baz')
INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'foo')
INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'bar')

All examples in my test returned the 'baz' entity with ID 3. It's assumed that the entity actually exists. Error handling is out of scope of this post.

UPDATE

To address @Bazzz's comment, the data with paths is shown below. Code is unique by level, not globally.

ID   ParentID    Code      TreePath
---- ----------- --------- -------------------
1    NULL        foo       foo
4    NULL        bla       bla
7    NULL        baz       baz
2    1           bar       foo/bar
5    1           blu       foo/blu
8    1           foo       foo/foo
3    2           baz       foo/bar/baz
6    2           blo       foo/bar/blo
9    2           bar       foo/bar/bar
like image 164
Serge Belov Avatar answered Nov 12 '22 10:11

Serge Belov


The trick is to do it the other way around, and build up the following query:

from entity in dataContext.Entities
where entity.Code == "baz"
where entity.Parent.Code == "bar"
where entity.Parent.Parent.Code == "foo"
where entity.Parent.Parent.ParentID == 0
select entity;

A bit naive (hard coded) solution would be like this:

var pathParts = path.Split('/').ToList();

var entities = 
    from entity in dataContext.Entities 
    select entity;

pathParts.Reverse();

for (int index = 0; index < pathParts.Count+ index++)
{
    string pathPart = pathParts[index];

    switch (index)
    {
        case 0:
            entities = entities.Where(
                entity.Code == pathPart);
            break;
        case 1:
            entities = entities.Where(
                entity.Parent.Code == pathPart);
            break;
        case 2:
            entities = entities.Where(entity.Parent.Parent.Code == pathPart);
            break;
        case 3:
            entities = entities.Where(
                entity.Parent.Parent.Parent.Code == pathPart);
            break;
        default:
            throw new NotSupportedException();
    }
}

Doing this dynamically by building expression trees isn't trivial, but can be done by looking closely at what the C# compiler generates (using ILDasm or Reflector for instance). Here is an example:

private static Entity GetEntityByPath(DataContext dataContext, string path)
{
    List<string> pathParts = path.Split(new char[] { '/' }).ToList<string>();
    pathParts.Reverse();

    var entities =
        from entity in dataContext.Entities
        select entity;

    // Build up a template expression that will be used to create the real expressions with.
    Expression<Func<Entity, bool>> templateExpression = entity => entity.Code == "dummy";
    var equals = (BinaryExpression)templateExpression.Body;
    var property = (MemberExpression)equals.Left;

    ParameterExpression entityParameter = Expression.Parameter(typeof(Entity), "entity");

    for (int index = 0; index < pathParts.Count; index++)
    {
        string pathPart = pathParts[index];

        var entityFilterExpression =
            Expression.Lambda<Func<Entity, bool>>(
                Expression.Equal(
                    Expression.Property(
                        BuildParentPropertiesExpression(index, entityParameter),
                        (MethodInfo)property.Member),
                    Expression.Constant(pathPart),
                    equals.IsLiftedToNull,
                    equals.Method),
                templateExpression.Parameters);

        entities = entities.Where<Entity>(entityFilterExpression);

        // TODO: The entity.Parent.Parent.ParentID == 0 part is missing here.
    }

    return entities.Single<Entity>();
}

private static Expression BuildParentPropertiesExpression(int numberOfParents, ParameterExpression entityParameter)
{
    if (numberOfParents == 0)
    {
        return entityParameter;
    }

    var getParentMethod = typeof(Entity).GetProperty("Parent").GetGetMethod();

    var property = Expression.Property(entityParameter, getParentMethod);

    for (int count = 2; count <= numberOfParents; count++)
    {
        property = Expression.Property(property, getParentMethod);
    }

    return property;
}
like image 23
Steven Avatar answered Nov 12 '22 11:11

Steven


You need a recursive function instead of your loop. Something like this should do the job:

public EntityTable Single(string path)
{
    List<string> pathParts = path.Split('/').ToList();
    string code = pathParts.Last();

    var entities = dataContext.EntityTables.Where(e => e.Code == code);

    pathParts.RemoveAt(pathParts.Count - 1);
    return GetRecursively(entities, pathParts);
}

private EntityTable GetRecursively(IQueryable<EntityTable> entity, List<string> pathParts)
{
    if (!(entity == null || pathParts.Count == 0))
    {
        string code = pathParts.Last();

        if (pathParts.Count == 1)
        {
            return entity.Where(x => x.EntityTable1.Code == code && x.ParentId == x.Id).FirstOrDefault();
        }
        else
        {                    
            pathParts.RemoveAt(pathParts.Count - 1);

            return this.GetRecursively(entity.Where(x => x.EntityTable1.Code == code), pathParts);
        }
    }
    else
    {
        return null;
    }
}

As you see, I am just returning the ultimate parent node. If you wanted to get a list of all EntityTable objects then I would make the recursive method to return a List of Ids of found nodes, and at the end - in the Single(...) method - run a simple LINQ query to get your IQueryable object using this list of IDs.

Edit: I tried to do your task but I think that there is a fundamental problem: there are cases when you are not able to identify a single path. For example, you have two pathes "foo/bar/baz" and "foo/bar/baz/bak" where "baz" entities are different. If you'll be seeking path "foo/bar/baz" then you'll always find two matching pathes (one would be partial of the four-entity path). Although you can get your "baz" entity correctly, but this is too confusing and I would just redesign this: either put a unique constraint so that each entity can only be used once, or store full path in the "Code" column.

like image 1
lekso Avatar answered Nov 12 '22 11:11

lekso