I'm developing an application that stores images and related metadata. I'm running into issues when performing a certain query using NHibernate. The query is taking prohibitively long (on my machine something like 31 seconds), although the same query takes only a fraction of a second when executed in SQL Server Management Studio.
I've reduced and extraced the problem to a small test application:
Entities:
Tag, consisting of Id (string, the tag value itself)
public class Tag
{
public virtual string Id { get; set; }
}
Image, consisting of Id (int), Name (string) and Tags (many-to-many, set of Tag instances)
public class Image
{
private Iesi.Collections.Generic.ISet<Tag> tags = new HashedSet<Tag>();
public virtual int Id { get; set; }
public virtual string Name { get; set; }
public virtual IEnumerable<Tag> Tags
{
get { return tags; }
}
public virtual void AddTag(Tag tag)
{
tags.Add(tag);
}
}
I'm using "mapping by code" with the following mappings:
public class TagMapping : ClassMapping<Tag>
{
public TagMapping()
{
Id(x => x.Id, map => map.Generator(Generators.Assigned));
}
}
public class ImageMapping : ClassMapping<Image>
{
public ImageMapping()
{
Id(x => x.Id, map => map.Generator(Generators.Native));
Property(x => x.Name);
Set(x => x.Tags,
map => map.Access(Accessor.Field),
map => map.ManyToMany(m2m => { }));
}
}
The NHibernate/database configuration looks like this:
<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
<session-factory>
<property name="dialect">NHibernate.Dialect.MsSql2008Dialect</property>
<property name="connection.connection_string_name">PrimaryDatabase</property>
<property name="format_sql">true</property>
</session-factory>
</hibernate-configuration>
<connectionStrings>
<add name="PrimaryDatabase" providerName="System.Data.SqlClient" connectionString="Data Source=.\SQLEXPRESS;Initial Catalog=PerfTest;Integrated Security=True" />
</connectionStrings>
I want to achieve the following query: give me all images where the name contains a specific string or where any tag contains a specific string. To find the latter I use a subquery that gives me the Ids of all images with matching tags. So in the end the search criteria are: the image has a name containing a specific string or its ID is one of those returned by the subquery.
Here's the code that executes the query:
var term = "abc";
var mode = MatchMode.Anywhere;
var imagesWithMatchingTag = QueryOver.Of<Image>()
.JoinQueryOver<Tag>(x => x.Tags)
.WhereRestrictionOn(x => x.Id).IsLike(term, mode)
.Select(x => x.Id);
var qry = session.QueryOver<Image>()
.Where( Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
.List();
The test database (DBMS: SQL Server 2008 Express R2) I run this query against was created specifically for this test and does not contain anything else. I have filled it with random data: 10.000 images (table Image), 4.000 tags (table Tag) and roughly 200.000 associations between images and tags (table Tags), ie. each image has about 20 associated tags. The database
The SQL NHibernate claims to use is:
SELECT
this_.Id as Id1_0_,
this_.Name as Name1_0_
FROM
Image this_
WHERE
(
this_.Name like @p0
or this_.Id in (
SELECT
this_0_.Id as y0_
FROM
Image this_0_
inner join
Tags tags3_
on this_0_.Id=tags3_.image_key
inner join
Tag tag1_
on tags3_.elt=tag1_.Id
WHERE
tag1_.Id like @p1
)
);
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]
This looks reasonable given the query I'm creating.
If I run this query using NHibernate the query takes about 30+ seconds (NHibernate.AdoNet.AbstractBatcher - ExecuteReader took 32964 ms
) and returns 98 entities.
However, if I execute an equivalent query directly inside Sql Server Management studio:
DECLARE @p0 nvarchar(4000)
DECLARE @p1 nvarchar(4000)
SET @p0 = '%abc%'
SET @p1 = '%abc%'
SELECT
this_.Id as Id1_0_,
this_.Name as Name1_0_
FROM
Image this_
WHERE
(
this_.Name like @p0
or this_.Id in (
SELECT
this_0_.Id as y0_
FROM
Image this_0_
inner join
Tags tags3_
on this_0_.Id=tags3_.image_key
inner join
Tag tag1_
on tags3_.elt=tag1_.Id
WHERE
tag1_.Id like @p1
)
);
The query takes much less than one second (and returns 98 results as well).
Further experiments:
If I only search by name or only by tags, ie.:
var qry = session.QueryOver<Image>()
.Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
.List();
or
var qry = session.QueryOver<Image>()
.Where(Restrictions.On<Image>(x => x.Name).IsLike(term, mode))
.List();
the queries are fast.
If I don't use like but an exact match in my subquery:
var imagesWithMatchingTag = QueryOver.Of<Image>()
.JoinQueryOver<Tag>(x => x.Tags)
.Where(x => x.Id == term)
.Select(x => x.Id);
the query is fast, too.
Changing the match mode for the name to Exact doesn't change anything.
When I debug the program and pause while the query is executing the top of the managed call stack looks like:
[Managed to Native Transition]
System.Data.dll!SNINativeMethodWrapper.SNIReadSync(System.Runtime.InteropServices.SafeHandle pConn, ref System.IntPtr packet, int timeout) + 0x53 bytes
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadSni(System.Data.Common.DbAsyncResult asyncResult, System.Data.SqlClient.TdsParserStateObject stateObj) + 0xa3 bytes
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket() + 0x24 bytes
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadBuffer() + 0x1f bytes
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadByte() + 0x46 bytes
System.Data.dll!System.Data.SqlClient.TdsParser.Run(System.Data.SqlClient.RunBehavior runBehavior, System.Data.SqlClient.SqlCommand cmdHandler, System.Data.SqlClient.SqlDataReader dataStream, System.Data.SqlClient.BulkCopySimpleResultSet bulkCopyHandler, System.Data.SqlClient.TdsParserStateObject stateObj) + 0x67 bytes
System.Data.dll!System.Data.SqlClient.SqlDataReader.ConsumeMetaData() + 0x22 bytes
System.Data.dll!System.Data.SqlClient.SqlDataReader.MetaData.get() + 0x57 bytes
System.Data.dll!System.Data.SqlClient.SqlCommand.FinishExecuteReader(System.Data.SqlClient.SqlDataReader ds, System.Data.SqlClient.RunBehavior runBehavior, string resetOptionsString) + 0xe1 bytes
...
So, my questions are:
I know the query in general isn't the most efficient thing in the world, but what's striking me here is the difference between using NHibernate and manualy querying. There is definitively something strange going on here.
Sorry for the long post, but I wanted to include as much as possible about the issue. Thanks a lot in advance for your help!
Update 1: I've tested the application with NHProf without much added value: NHProf shows that the executed SQL is
SELECT this_.Id as Id1_0_,
this_.Name as Name1_0_
FROM Image this_
WHERE (this_.Name like '%abc%' /* @p0 */
or this_.Id in (SELECT this_0_.Id as y0_
FROM Image this_0_
inner join Tags tags3_
on this_0_.Id = tags3_.image_key
inner join Tag tag1_
on tags3_.elt = tag1_.Id
WHERE tag1_.Id like '%abc%' /* @p1 */))
Which is exactly what I posted before (because that's what NHibernate wrote to its log in the first place).
Here's a screenshot of NHProf
The warnings are understandable but don't explain the behavior.
Update 2 @surfen sugested to pull the results of the sub query out of the DB first and stick them back into the main query:
var imagesWithMatchingTag = QueryOver.Of<Image>()
.JoinQueryOver<Tag>(x => x.Tags)
.WhereRestrictionOn(x => x.Id).IsLike(term, mode)
.Select(x => x.Id);
var ids = imagesWithMatchingTag.GetExecutableQueryOver(session).List<int>().ToArray();
var qry = session.QueryOver<Image>()
.Where(
Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
Restrictions.On<Image>(x => x.Id).IsIn(ids))
.List();
While this does indeed make the main query fast again, I'd rather not take this approach as it does not fit well with the intended usage in the real world application. It is interesting that this is so much faster, though. I'd expect the subquery approach to be equally fast given that it is not dependent on the outer query.
Update 3 This doesn't seem to be related to NHibernate. If I run the query using normal ADO.NET objects I get the same behavior:
var cmdText = @"SELECT this_.Id as Id1_0_,
this_.Name as Name1_0_
FROM Image this_
WHERE (this_.Name like @p0
or this_.Id in
(SELECT this_0_.Id as y0_
FROM Image this_0_
inner join Tags tags3_
on this_0_.Id = tags3_.image_key
inner join Tag tag1_
on tags3_.elt = tag1_.Id
WHERE tag1_.Id like @p1 ));";
using (var con = new SqlConnection(ConfigurationManager.ConnectionStrings["PrimaryDatabase"].ConnectionString))
{
con.Open();
using (var txn = con.BeginTransaction())
{
using (var cmd = new SqlCommand(cmdText, con, txn))
{
cmd.CommandTimeout = 120;
cmd.Parameters.AddWithValue("p0", "%abc%");
cmd.Parameters.AddWithValue("p1", "%abc%");
using (var reader = cmd.ExecuteReader())
{
while (reader.Read())
{
Console.WriteLine("Match");
}
}
}
txn.Commit();
}
}
Update 4
Query-plans (click to zoom):
Slow query
Fast query
There definitively is a difference in the plan.
Update 5
As it indeed seems that Sql Server treats the subquery as being correlated I tried something different: I moved the criterion related to the name to a subquery by itself:
var term = "abc";
var mode = MatchMode.Anywhere;
var imagesWithMatchingTag = QueryOver.Of<Image>()
.JoinQueryOver<Tag>(x => x.Tags)
.WhereRestrictionOn(x => x.Id).IsLike(term, mode)
.Select(x => x.Id);
var imagesWithMatchingName = QueryOver.Of<Image>()
.WhereRestrictionOn(x => x.Name).IsLike(term, mode)
.Select(x => x.Id);
var qry = session.QueryOver<Image>()
.Where(
Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingName) ||
Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag)
).List();
Generated SQL:
SELECT
this_.Id as Id1_0_,
this_.Name as Name1_0_
FROM
Image this_
WHERE
(
this_.Id in (
SELECT
this_0_.Id as y0_
FROM
Image this_0_
inner join
Tags tags3_
on this_0_.Id=tags3_.image_key
inner join
Tag tag1_
on tags3_.elt=tag1_.Id
WHERE
tag1_.Id like @p0
)
or this_.Id in (
SELECT
this_0_.Id as y0_
FROM
Image this_0_
WHERE
this_0_.Name like @p1
)
);
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]
This seems to break the correlation and as a result the query becomes "fast" again ("fast" as in "acceptable for the moment"). The query time went down from 30+ seconds to ~170ms. Still not a lightweight query, but at least will allow me to continue from here. I know that a "like '%foo%'"
will never be super fast. If it comes to the worst I can still move to a specialized search server (Lucene, solr) or real full text searching.
Update 6 I was able to rewrite the query to NOT use subqueries at all:
var qry = session.QueryOver(() => img)
.Left.JoinQueryOver(x => x.Tags, () => tag)
.Where(
Restrictions.Like(Projections.Property(() => img.Name), term, mode) ||
Restrictions.Like(Projections.Property(() => tag.Id), term, mode))
.TransformUsing(Transformers.DistinctRootEntity)
.List();
SQL:
SELECT
this_.Id as Id1_1_,
this_.Name as Name1_1_,
tags3_.image_key as image1_3_,
tag1_.Id as elt3_,
tag1_.Id as Id0_0_
FROM
Image this_
left outer join
Tags tags3_
on this_.Id=tags3_.image_key
left outer join
Tag tag1_
on tags3_.elt=tag1_.Id
WHERE
(
this_.Name like @p0
or tag1_.Id like @p1
);
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]
However, the query performs now slightly worse than the version with subqueries. I'll investigate this further.
My bet is that it's the second query that is slow:
var qry = session.QueryOver<Image>()
.Where( Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
.List();
You provided SQL for only the first query. What about the second? Did you test it under SQL Management Studio? Use SQL Server Profiler as @JoachimIsaksson suggests to find out what queries exacly NHibernate executes server-side.
This looks like you're loading 97 image
objects into memory. How large is each of them?
EDIT
Another bet is that your first query executes ad inner query for the second query. Try doing .List() on the first query to load the tags into memory.
EDIT 2
From the query plans it really looks like your query is being called as a Correlated subquery. You mentioned that these queries are fast:
var qry = session.QueryOver<Image>()
.Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
.List();
or
var qry = session.QueryOver<Image>()
.Where(Restrictions.On<Image>(x => x.Name).IsLike(term, mode))
.List();
Just UNION them and you should get the same result as running both of them separately. Also make sure that all the join columns have indexes.
That's the catch with IS IN (query) - you can't be sure how the database executes it (unless you somehow force it to use a certain plan). Maybe you could change .In() it into JoinQueryOver() somehow?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With