Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query takes extremely long in client app but is fast in SQL Server Management Studio

I'm developing an application that stores images and related metadata. I'm running into issues when performing a certain query using NHibernate. The query is taking prohibitively long (on my machine something like 31 seconds), although the same query takes only a fraction of a second when executed in SQL Server Management Studio.

I've reduced and extraced the problem to a small test application:

Entities:

Tag, consisting of Id (string, the tag value itself)

public class Tag
{
    public virtual string Id { get; set; }
}

Image, consisting of Id (int), Name (string) and Tags (many-to-many, set of Tag instances)

public class Image
{
    private Iesi.Collections.Generic.ISet<Tag> tags = new HashedSet<Tag>();

    public virtual int Id { get; set; }

    public virtual string Name { get; set; }

    public virtual IEnumerable<Tag> Tags
    {
        get { return tags; }
    }

    public virtual void AddTag(Tag tag)
    {
        tags.Add(tag);
    }
}

I'm using "mapping by code" with the following mappings:

public class TagMapping : ClassMapping<Tag>
{
    public TagMapping()
    {
        Id(x => x.Id, map => map.Generator(Generators.Assigned));
    }
}

public class ImageMapping : ClassMapping<Image>
{
    public ImageMapping()
    {
        Id(x => x.Id, map => map.Generator(Generators.Native));
        Property(x => x.Name);
        Set(x => x.Tags, 
            map => map.Access(Accessor.Field),
            map => map.ManyToMany(m2m => { }));
    }
}

The NHibernate/database configuration looks like this:

  <hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
    <session-factory>
      <property name="dialect">NHibernate.Dialect.MsSql2008Dialect</property>
      <property name="connection.connection_string_name">PrimaryDatabase</property>
        <property name="format_sql">true</property>
    </session-factory>
  </hibernate-configuration>
  <connectionStrings>
    <add name="PrimaryDatabase" providerName="System.Data.SqlClient" connectionString="Data Source=.\SQLEXPRESS;Initial Catalog=PerfTest;Integrated Security=True" />
  </connectionStrings>

I want to achieve the following query: give me all images where the name contains a specific string or where any tag contains a specific string. To find the latter I use a subquery that gives me the Ids of all images with matching tags. So in the end the search criteria are: the image has a name containing a specific string or its ID is one of those returned by the subquery.

Here's the code that executes the query:

var term = "abc";
var mode = MatchMode.Anywhere;

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .WhereRestrictionOn(x => x.Id).IsLike(term, mode)
    .Select(x => x.Id);

var qry = session.QueryOver<Image>()
    .Where( Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
            Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
    .List();

The test database (DBMS: SQL Server 2008 Express R2) I run this query against was created specifically for this test and does not contain anything else. I have filled it with random data: 10.000 images (table Image), 4.000 tags (table Tag) and roughly 200.000 associations between images and tags (table Tags), ie. each image has about 20 associated tags. The database

The SQL NHibernate claims to use is:

SELECT
    this_.Id as Id1_0_,
    this_.Name as Name1_0_
FROM
    Image this_
WHERE
    (
        this_.Name like @p0
        or this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            inner join
                Tags tags3_
                    on this_0_.Id=tags3_.image_key
            inner join
                Tag tag1_
                    on tags3_.elt=tag1_.Id
            WHERE
                tag1_.Id like @p1
        )
    );
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]

This looks reasonable given the query I'm creating.

If I run this query using NHibernate the query takes about 30+ seconds (NHibernate.AdoNet.AbstractBatcher - ExecuteReader took 32964 ms) and returns 98 entities.

However, if I execute an equivalent query directly inside Sql Server Management studio:

DECLARE @p0 nvarchar(4000)
DECLARE @p1 nvarchar(4000)

SET @p0 = '%abc%'
SET @p1 = '%abc%'    

SELECT
    this_.Id as Id1_0_,
    this_.Name as Name1_0_
FROM
    Image this_
WHERE
    (
        this_.Name like @p0
        or this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            inner join
                Tags tags3_
                    on this_0_.Id=tags3_.image_key
            inner join
                Tag tag1_
                    on tags3_.elt=tag1_.Id
            WHERE
                tag1_.Id like @p1
        )
    );

The query takes much less than one second (and returns 98 results as well).

Further experiments:

If I only search by name or only by tags, ie.:

var qry = session.QueryOver<Image>()
    .Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
    .List();

or

var qry = session.QueryOver<Image>()
    .Where(Restrictions.On<Image>(x => x.Name).IsLike(term, mode))
    .List();

the queries are fast.

If I don't use like but an exact match in my subquery:

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .Where(x => x.Id == term)
    .Select(x => x.Id);

the query is fast, too.

Changing the match mode for the name to Exact doesn't change anything.

When I debug the program and pause while the query is executing the top of the managed call stack looks like:

[Managed to Native Transition]   
System.Data.dll!SNINativeMethodWrapper.SNIReadSync(System.Runtime.InteropServices.SafeHandle pConn, ref System.IntPtr packet, int timeout) + 0x53 bytes  
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadSni(System.Data.Common.DbAsyncResult asyncResult, System.Data.SqlClient.TdsParserStateObject stateObj) + 0xa3 bytes   
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket() + 0x24 bytes  
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadBuffer() + 0x1f bytes     
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadByte() + 0x46 bytes   
System.Data.dll!System.Data.SqlClient.TdsParser.Run(System.Data.SqlClient.RunBehavior runBehavior, System.Data.SqlClient.SqlCommand cmdHandler, System.Data.SqlClient.SqlDataReader dataStream, System.Data.SqlClient.BulkCopySimpleResultSet bulkCopyHandler, System.Data.SqlClient.TdsParserStateObject stateObj) + 0x67 bytes     
System.Data.dll!System.Data.SqlClient.SqlDataReader.ConsumeMetaData() + 0x22 bytes   
System.Data.dll!System.Data.SqlClient.SqlDataReader.MetaData.get() + 0x57 bytes  
System.Data.dll!System.Data.SqlClient.SqlCommand.FinishExecuteReader(System.Data.SqlClient.SqlDataReader ds, System.Data.SqlClient.RunBehavior runBehavior, string resetOptionsString) + 0xe1 bytes  
...

So, my questions are:

  • Why does the query so much longer when performed by NHibernate even though the SQL used is the same?
  • How can I get rid of the difference? Is there a setting that can cause this behavior?

I know the query in general isn't the most efficient thing in the world, but what's striking me here is the difference between using NHibernate and manualy querying. There is definitively something strange going on here.

Sorry for the long post, but I wanted to include as much as possible about the issue. Thanks a lot in advance for your help!

Update 1: I've tested the application with NHProf without much added value: NHProf shows that the executed SQL is

SELECT this_.Id   as Id1_0_,
       this_.Name as Name1_0_
FROM   Image this_
WHERE  (this_.Name like '%abc%' /* @p0 */
         or this_.Id in (SELECT this_0_.Id as y0_
                         FROM   Image this_0_
                                inner join Tags tags3_
                                  on this_0_.Id = tags3_.image_key
                                inner join Tag tag1_
                                  on tags3_.elt = tag1_.Id
                         WHERE  tag1_.Id like '%abc%' /* @p1 */))

Which is exactly what I posted before (because that's what NHibernate wrote to its log in the first place).

Here's a screenshot of NHProf Screenshot of NHProf

The warnings are understandable but don't explain the behavior.

Update 2 @surfen sugested to pull the results of the sub query out of the DB first and stick them back into the main query:

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .WhereRestrictionOn(x => x.Id).IsLike(term, mode)
    .Select(x => x.Id);

var ids = imagesWithMatchingTag.GetExecutableQueryOver(session).List<int>().ToArray();

var qry = session.QueryOver<Image>()
    .Where(
            Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
            Restrictions.On<Image>(x => x.Id).IsIn(ids))
    .List();

While this does indeed make the main query fast again, I'd rather not take this approach as it does not fit well with the intended usage in the real world application. It is interesting that this is so much faster, though. I'd expect the subquery approach to be equally fast given that it is not dependent on the outer query.

Update 3 This doesn't seem to be related to NHibernate. If I run the query using normal ADO.NET objects I get the same behavior:

var cmdText = @"SELECT this_.Id   as Id1_0_,
                        this_.Name as Name1_0_
                FROM   Image this_
                WHERE  (this_.Name like  @p0 
                            or this_.Id in 
                        (SELECT this_0_.Id as y0_
                        FROM   Image this_0_
                            inner join Tags tags3_
                                on this_0_.Id = tags3_.image_key
                            inner join Tag tag1_
                                on tags3_.elt = tag1_.Id
                        WHERE  tag1_.Id like  @p1 ));";

using (var con = new SqlConnection(ConfigurationManager.ConnectionStrings["PrimaryDatabase"].ConnectionString))
{
    con.Open();
    using (var txn = con.BeginTransaction())
    {
        using (var cmd = new SqlCommand(cmdText, con, txn))
        {
            cmd.CommandTimeout = 120;
            cmd.Parameters.AddWithValue("p0", "%abc%");
            cmd.Parameters.AddWithValue("p1", "%abc%");

            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("Match");
                }
            }

        }
        txn.Commit();
    }
}

Update 4

Query-plans (click to zoom):

Slow query Slow plan

Fast query Fast plan

There definitively is a difference in the plan.

Update 5

As it indeed seems that Sql Server treats the subquery as being correlated I tried something different: I moved the criterion related to the name to a subquery by itself:

var term = "abc";
var mode = MatchMode.Anywhere;

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .WhereRestrictionOn(x => x.Id).IsLike(term, mode)
    .Select(x => x.Id);

var imagesWithMatchingName = QueryOver.Of<Image>()
    .WhereRestrictionOn(x => x.Name).IsLike(term, mode)
    .Select(x => x.Id);

var qry = session.QueryOver<Image>()
    .Where(
      Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingName) ||        
      Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag) 
    ).List();

Generated SQL:

SELECT
    this_.Id as Id1_0_,
    this_.Name as Name1_0_
FROM
    Image this_
WHERE
    (
        this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            inner join
                Tags tags3_
                    on this_0_.Id=tags3_.image_key
            inner join
                Tag tag1_
                    on tags3_.elt=tag1_.Id
            WHERE
                tag1_.Id like @p0
        )
        or this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            WHERE
                this_0_.Name like @p1
        )
    );
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]

This seems to break the correlation and as a result the query becomes "fast" again ("fast" as in "acceptable for the moment"). The query time went down from 30+ seconds to ~170ms. Still not a lightweight query, but at least will allow me to continue from here. I know that a "like '%foo%'" will never be super fast. If it comes to the worst I can still move to a specialized search server (Lucene, solr) or real full text searching.

Update 6 I was able to rewrite the query to NOT use subqueries at all:

var qry = session.QueryOver(() => img)
    .Left.JoinQueryOver(x => x.Tags, () => tag)
    .Where(
        Restrictions.Like(Projections.Property(() => img.Name), term, mode) ||
        Restrictions.Like(Projections.Property(() => tag.Id), term, mode))
    .TransformUsing(Transformers.DistinctRootEntity)
    .List();

SQL:

SELECT
    this_.Id as Id1_1_,
    this_.Name as Name1_1_,
    tags3_.image_key as image1_3_,
    tag1_.Id as elt3_,
    tag1_.Id as Id0_0_
FROM
    Image this_
left outer join
    Tags tags3_
        on this_.Id=tags3_.image_key
left outer join
    Tag tag1_
        on tags3_.elt=tag1_.Id
WHERE
    (
        this_.Name like @p0
        or tag1_.Id like @p1
    );
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]

However, the query performs now slightly worse than the version with subqueries. I'll investigate this further.

like image 222
Andre Loker Avatar asked Mar 30 '12 13:03

Andre Loker


1 Answers

My bet is that it's the second query that is slow:

var qry = session.QueryOver<Image>()
.Where( Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
        Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
.List();

You provided SQL for only the first query. What about the second? Did you test it under SQL Management Studio? Use SQL Server Profiler as @JoachimIsaksson suggests to find out what queries exacly NHibernate executes server-side.

This looks like you're loading 97 image objects into memory. How large is each of them?

EDIT

Another bet is that your first query executes ad inner query for the second query. Try doing .List() on the first query to load the tags into memory.

EDIT 2

From the query plans it really looks like your query is being called as a Correlated subquery. You mentioned that these queries are fast:

var qry = session.QueryOver<Image>()
.Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
.List();

or

var qry = session.QueryOver<Image>()
.Where(Restrictions.On<Image>(x => x.Name).IsLike(term, mode))
.List();

Just UNION them and you should get the same result as running both of them separately. Also make sure that all the join columns have indexes.

That's the catch with IS IN (query) - you can't be sure how the database executes it (unless you somehow force it to use a certain plan). Maybe you could change .In() it into JoinQueryOver() somehow?

like image 195
surfen Avatar answered Nov 15 '22 05:11

surfen