Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to get different results for the same SQL query if the data stays the same?

I get a different result set for this query intermittently when I run it...sometimes it gives 1363, sometimes 1365 and sometimes 1366 results. The data doesn't change. What could be causing this and is there a way to prevent it? Query looks something like this:

SELECT * 
FROM 
    (
            SELECT  
                        RC.UserGroupId,
                        RC.UserGroup,
                        RC.ClientId AS CLID,                     
                        CASE WHEN T1.MultipleClients = 1 THEN RC.Salutation1 ELSE RC.DisplayName1 END AS szDisplayName,
                        T1.MultipleClients,
                        RC.IsPrimaryRecord,
                        RC.RecordTypeId,
                        RC.ClientTypeId,
                        RC.ClientType,
                        RC.IsDeleted,
                        RC.IsCompany,                                            
                        RC.KnownAs,
                        RC.Salutation1,
                        RC.FirstName, 
                        RC.Surname,                   
                        Relationship, 
                        C.DisplayName Client,
                        RC.DisplayName RelatedClient, 
                        E.Email,                                                            
                        RC.DisplayName  + ' is the ' + R.Relationship + ' of ' + C.DisplayName Description,
                        ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id


            FROM 
                        SSDS.Client.ClientExtended C 
                                                                              INNER JOIN 
                        SSDS.Client.ClientRelationship R WITH (NOLOCK)ON C.ClientId = R.ClientID 
                                                                              INNER JOIN 
                        SSDS.Client.ClientExtended RC WITH (NOLOCK)ON R.RelatedClientId = RC.ClientId
                                                                              LEFT OUTER JOIN
                        SSDS.Client.Email E WITH (NOLOCK)ON RC.ClientId = E.ClientId                 
                                                                              LEFT OUTER JOIN 
                        SSDS.Client.UserDefinedData UD WITH (NOLOCK)ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId 
                                                                              INNER JOIN                                                                 

                        (
                              SELECT 
                                          E.Email, 
                                          CASE WHEN (COUNT(DISTINCT RC.DisplayName) > 1) THEN 1 ELSE 0 END AS MultipleClients 

                              FROM
                                          SSDS.Client.ClientExtended C 
                                                                                                INNER JOIN 
                                          SSDS.Client.ClientRelationship R WITH (NOLOCK)ON C.ClientId = R.ClientID 
                                                                                                INNER JOIN 
                                          SSDS.Client.ClientExtended RC WITH (NOLOCK)ON R.RelatedClientId = RC.ClientId
                                                                                                LEFT OUTER JOIN
                                          SSDS.Client.Email E WITH (NOLOCK)ON RC.ClientId = E.ClientId                 
                                                                                                LEFT OUTER JOIN 
                                          SSDS.Client.UserDefinedData UD WITH (NOLOCK)ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId

                              WHERE 
                                          Relationship IN ('z-Group Principle', 'z-Group Member ')           
                                          AND E.Email IS NOT NULL

                              GROUP BY E.Email 

                        ) T1 ON E.Email = T1.Email


            WHERE 

                                                Relationship IN ('z-Group Principle', 'z-Group Member ')           
                                                AND E.Email IS NOT NULL                                         
    ) T


WHERE       
            sequence_id = 1
            AND T.UserGroupId IN (Select * from iCentral.dbo.GetSubUserGroups('471b9cbd-2312-4a8a-bb20-35ea53d30340',0))         
            AND T.IsDeleted = 0           
            AND T.RecordTypeId = 1 
            AND T.ClientTypeId IN
            (
                        '1',              --Client
                        '-1652203805'    --NTU                      
            )        

        AND T.CLID NOT IN
          (
            SELECT DISTINCT  
                               UDDF.CLID
            FROM
                   SLacsis_SLM.dbo.T_UserDef UD WITH (NOLOCK) 
                          INNER JOIN
                   SLacsis_SLM.dbo.T_UserDefData UDDF WITH (NOLOCK)
                   ON UD.UserDef_ID = UDDF.UserDef_ID
                          INNER JOIN
                   SLacsis_SLM.dbo.T_Client CLL WITH (NOLOCK)
                   ON CLL.CLID = UDDF.CLID AND CLL.UserGroup_CLID = UD.UserID

            WHERE 
                           UD.UserDef_ID in    
                                    (
                                    'F68F31CE-525B-4455-9D50-6DA77C66FEE5',                                    
                                    'A7CECB03-866C-4F1F-9E1A-CEB09474FE47'  
                                    )

                           AND UDDF.Data = 'NO'
           ) 


ORDER BY T.Surname

EDIT:

I have removed all NOLOCK's (including the ones in views and UDFs) and I'm still having the same issue. I get the same results every time for the nested select (T) and if I put the result set of T into a temp table in the beginning of the query and join onto the temp table instead of the nested select then the final result set is the same every time I run the query.

EDIT2:

I have been doing some more reading on ROW_NUMBER()...I'm partitioning by email (of which there are duplicates) and ordering by Relationship (where there are only 1 of 2 relationships). Could this cause the query to be non-deterministic and would there be a way to fix that?

EDIT3:

Here are the actual execution plans if anyone is interested http://www.mediafire.com/?qo5gkh5dftxf0ml. Is it possible to see that it is running as read committed from the execution plan? I've compared the files using WinMerge and the only differences seem to be the counts (ActualRows="").

EDIT4:

This works:

SELECT * FROM 
(
   SELECT *, 
             ROW_NUMBER() OVER (PARTITION BY B.Email ORDER BY Relationship DESC) AS sequence_id
             FROM
            (

                                    SELECT DISTINCT  
                                    RC.UserGroupId,
                                    ...
            ) B...

EDIT5:

When running the same ROW_NUMBER() query (T in the original question, just selecting RC.DisplayName and ROW_NUMBER) twice in a row I get different rank for some people:

enter image description here

Does anyone have a good explanation/example of why or how ROW_NUMBER() over a result set that contains duplicates can rank differently each time it is run and ultimately change the number of results?

EDIT6:

Ok I think this makes sense to me now. This occurs when people 2 people have the same email address (e.g a husband and wife pair) and relationship. I guess in this case their ROW_NUMBER() ranking is arbitrary and can change every time it is run.

like image 397
woggles Avatar asked Nov 30 '22 04:11

woggles


2 Answers

Your use of NOLOCK all over means you are doing dirty reads and will see uncommitted data, data that will be rolled back, transient and inconsistent data etc

Take these off, try again, report back pleas

Edit: some options with NOLOCKS removed

  1. Data is really changing
  2. Some parameter or filter is changing (eg GETDATE)
  3. Some float comparisons running on different cores each time
    See this on dba.se https://dba.stackexchange.com/q/4810/630
  4. Embedded NOLOCKs in udfs or views (eg iCentral.dbo.GetSubUserGroups)
  5. ...
like image 173
gbn Avatar answered Dec 04 '22 08:12

gbn


As I said yesterday in the comments the row numbering for rows with duplicate E.Email, Relationship values will be arbitrary.

To make it deterministic you would need to do PARTITION BY B.Email ORDER BY Relationship DESC, SomeUniqueColumn . Interesting that it changes between runs though using the same execution plan. I assume this is a consequence of the hash join.

like image 28
Martin Smith Avatar answered Dec 04 '22 08:12

Martin Smith