Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Guid.NewGuid() is returning duplicate values when used in a Parallel.For loop

I have an application that is hitting an API. As such, it does a query for all ID's in the object, and then has to query each item one at a time per ID. I'm doing this in a Parallel.For loop and adding each items data to a row in a datatable. Then I use sqlbulkcopy to send the datatable to a SQL server table.

If I do this without using Parallel.For, it works great. However, with Parallel.For, this line:

workrow["id"] = Guid.NewGuid();

is generating duplicate Guids. It's doing it often and causing the data to not load into the SQL server table because the id row in SQL is a Primary Key and doesn't allow duplicates. I tried locking:

                    lock (lockobject)
                    {
                        workrow["id"] = Guid.NewGuid();
                    }

This didn't help.
I tried not assigning an id to that field in hopes that SQL would generate it (it does have newid() on that field). That fails saying it can't insert a null. I can't seem to just remove the id field from the datatable because then the columns don't align when I do the sqlbulkcopy.

Can someone help me here? I either need to figure out how to get Guid.NewGuid() to STOP producing duplicates OR I need to figure out a way to not pass in the id (always the first field in the datatable) so that SQL will generate the id.

Here is the code I use to generate one of tables:

        public static DataTable MakeWorkflowTable()
        {
            DataTable Workflow = new DataTable("Workflow");
            DataColumn id = new DataColumn("id", System.Type.GetType("System.Guid"));
            Workflow.Columns.Add(id);
            DataColumn OrgInfoID = new DataColumn("OrgInfoID", System.Type.GetType("System.Guid"));
            Workflow.Columns.Add(OrgInfoID);
            DataColumn Name = new DataColumn("Name", System.Type.GetType("System.String"));
            Workflow.Columns.Add(Name);
            DataColumn Active = new DataColumn("Active", System.Type.GetType("System.String"));
            Workflow.Columns.Add(Active);
            DataColumn Description = new DataColumn("Description", System.Type.GetType("System.String"));
            Workflow.Columns.Add(Description);
            DataColumn Object = new DataColumn("Object", System.Type.GetType("System.String"));
            Workflow.Columns.Add(Object);
            DataColumn Formula = new DataColumn("Formula", System.Type.GetType("System.String"));
            Workflow.Columns.Add(Formula);
            DataColumn ManageableState = new DataColumn("ManageableState", System.Type.GetType("System.String"));
            Workflow.Columns.Add(ManageableState);
            DataColumn NameSpacePrefix = new DataColumn("NameSpacePrefix", System.Type.GetType("System.String"));
            Workflow.Columns.Add(NameSpacePrefix);
            DataColumn TDACount = new DataColumn("TDACount", System.Type.GetType("System.Int32"));
            Workflow.Columns.Add(TDACount);
            DataColumn TriggerType = new DataColumn("TriggerType", System.Type.GetType("System.String"));
            Workflow.Columns.Add(TriggerType);
            DataColumn CreatedDate = new DataColumn("CreatedDate", System.Type.GetType("System.DateTime"));
            Workflow.Columns.Add(CreatedDate);
            DataColumn CreatedBy = new DataColumn("CreatedBy", System.Type.GetType("System.String"));
            Workflow.Columns.Add(CreatedBy);
            DataColumn LastModifiedDate = new DataColumn("LastModifiedDate", System.Type.GetType("System.DateTime"));
            Workflow.Columns.Add(LastModifiedDate);
            DataColumn LastModifiedBy = new DataColumn("LastModifiedBy", System.Type.GetType("System.String"));
            Workflow.Columns.Add(LastModifiedBy);
            return Workflow;
        }

Here is the code I use to send it to the SQL server:

        public static void SendDTtoDB(ref DataTable dt, ref SqlConnection cnn, string TableName)
        {
            using (SqlBulkCopy bulkCopy = new SqlBulkCopy(cnn))
            {
                bulkCopy.DestinationTableName =
                    TableName;
                try
                {
                    bulkCopy.WriteToServer(dt);
                    dt.Clear();
                }
                catch (Exception e)
                {
                    logger.Warn("SendDTtoDB {TableName}: ORGID: {ORGID} : {Message}", TableName, dt.Rows[0]["OrgInfoID"], e.Message.ToString());
                    if (e.Message.ToString().Contains("PRIMARY KEY"))
                    {
                        foreach(DataRow row in dt.Rows)
                        {
                            logger.Warn("ID: {id}", row["id"]);
                        }
                    }
                }
            }

        }

As you can see in the catch statement, I set it to write out the ID's to the log so I could see them for myself and, sure enough, there is a duplicate there. So frustrating! I really don't want to take out the Parallel.For and single thread it if I don't have to.

Per request, here is the code with the Parallel.For

              if (qr.totalSize > 0)
                {
                    object lockobject = new object();
                    Parallel.For(0, qr.records.Length, i =>
                    {
                        ToolingService.CustomTab1 vr = new ToolingService.CustomTab1();

                        vr = (ToolingService.CustomTab1)qr.records[i];
                        string mdSOQL = "Select FullName, description, ManageableState, MasterLabel, NamespacePrefix, Type, Url, CreatedDate, CreatedBy.Name, "
                            + "LastModifiedDate, LastModifiedBy.Name From CustomTab where id='" + vr.Id + "'";
                        ToolingService.QueryResult mdqr = new ToolingService.QueryResult();
                        ToolingService.CustomTab1 vrmd = new ToolingService.CustomTab1();
                        mdqr = ts.query(mdSOQL);
                        vrmd = (ToolingService.CustomTab1)mdqr.records[0];

                        DataRow workrow = CustomTabs.NewRow();
                        lock (lockobject)
                        {
                            workrow["id"] = Guid.NewGuid();
                        }
                        workrow["OrgInfoID"] = _orgDBID;
                        workrow["FullName"] = vrmd.FullName;
                        workrow["Description"] = vrmd.Description ?? Convert.DBNull;
                        workrow["ManageableState"] = vrmd.ManageableState;
                        workrow["MasterLabel"] = vrmd.MasterLabel ?? Convert.DBNull;
                        workrow["NameSpacePrefix"] = vrmd.NamespacePrefix ?? Convert.DBNull;
                        workrow["Type"] = vrmd.Type ?? Convert.DBNull;
                        workrow["URL"] = vrmd.Url ?? Convert.DBNull;
                        workrow["CreatedDate"] = vrmd.CreatedDate ?? Convert.DBNull;
                        if (vrmd.CreatedBy == null)
                        {
                            workrow["CreatedBy"] = Convert.DBNull;
                        }
                        else
                        {
                            workrow["CreatedBy"] = vrmd.CreatedBy.Name;
                        }
                        workrow["LastModifiedDate"] = vrmd.LastModifiedDate ?? Convert.DBNull;
                        if (vrmd.LastModifiedBy == null)
                        {
                            workrow["LastModifiedBy"] = Convert.DBNull;
                        }
                        else
                        {
                            workrow["LastModifiedBy"] = vrmd.LastModifiedBy.Name;
                        }
                        lock (CustomTabs)
                        {
                            CustomTabs.Rows.Add(workrow);
                        }

                    });
                    OrgTables.SendDTtoDB(ref CustomTabs, ref _cnn, "OrgCustomTabs");
like image 407
Mike Jones Avatar asked Dec 08 '22 10:12

Mike Jones


1 Answers

I have seen this issue before. There is no problem with the Guid.NewGuid(), but the DataTable isn't threadsafe!

DataTable is simply not designed or intended for concurrent usage (in particular where there is any form of mutation involved).

See Thread safety for DataTable

Also related: c# DataGridView DataTable internal index corrupted in parallel loop

It's done in a loop because I have to query the API over and over to get each item in the object. I'm doing the Parallel.For loop so I can speed up the process. If I need to get 450 items or more one at a time, I'd like to multithread that for speed purposes. The database access isn't being done in the loop, just building the datatable is because once I get the data back from the API I need to store it.

You could create a type and add them multithreaded to a ConcurrentBag<T> or ConcurrentQueue<T> (or another concurrent collection, see MS Docs) - those are thread-safe :)

Then after it you could build the DataTable with a single thread. Or maybe skip the whole DataTable if possible for your use case.

like image 143
Julian Avatar answered May 10 '23 16:05

Julian