C# Driver SafeMode off Upserts - not all records updated/inserted

Tags: c#, mongodb

In our application we do large numbers of inserts/updates (anywhere from 1k to 100k at a time), and I noticed that not all records are being saved. With safemode off, only 90%-95% of the records are saved.

Doing the upsert with safemode on upserts all records successfully but is much too slow. I remember reading somewhere that even with safemode off there should be no reason an update/insert should fail unless the server is unavailable.

I wrote a small app to test this, and have included the code below. It tries to insert 100,000 ints into Mongo, and when checked after it is run I see about 90,000 records in the collection.

(Note: I am using Parallel.ForEach because I am updating by _id, and Mongo 2.0 supports parallel operations when using _id. Without Parallel.ForEach I still see some records lost, though not as many.)

        MongoServer server = MongoServer.Create(host);

        MongoDatabase test = server.GetDatabase("testDB");

        var list = Enumerable.Range(0, 100000).ToList();

        using (server.RequestStart(test))
        {
            MongoCollection coll = test.GetCollection("testCollection");

            Parallel.ForEach(list, i =>
            {
                var query = new QueryDocument("_id", i);
                coll.Update(query, Update.Set("value", 100),
                             UpdateFlags.Upsert, SafeMode.False);
            });
        }

So I guess my question is: What is the best way to do large numbers of updates fast, with 100% success rate?

I can't use insert because I have a number of processes writing to Mongo and cannot be sure whether a given document already exists, which is why I am using Upsert.

jeffsaracco asked Jan 18 '23 12:01


2 Answers

When you use SafeMode.False the C# driver just writes the Insert/Update messages to the socket and doesn't wait for a reply. When you write a lot of data very quickly to a socket it's going to get buffered on the client side, and the networking stack will squirt the bytes out on the network as fast as it can. If you are saturating the network, things can get backed up quite a bit.

My guess is that you are exiting your process before the networking stack has had a chance to write all remaining bytes out to the network. That would explain the lost documents.

Your best bet is to call Count at the end, not once, but repeatedly, until the count equals what you think it should be. At that point you know there is no data left to be transmitted.

However, if any of the inserts failed for any reason (for example, violating a unique index), the count will never reach your expected value. There is no 100% way of knowing whether an Insert/Update worked without using SafeMode.True.

Note that most long-lived server processes never have this problem because they never exit.
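
The "call Count repeatedly" suggestion can be sketched as a small polling helper. This is only a sketch under assumptions: the names `WriteDrain` and `WaitForCount` are hypothetical, and the count is passed in as a delegate so the helper does not depend on any particular driver version. The timeout covers the case mentioned above where a failed write means the expected count is never reached.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of the MongoDB C# driver.
public static class WriteDrain
{
    // Polls getCount until it returns at least `expected` or `timeout` elapses.
    // Returns true if the expected count was reached, false on timeout.
    public static bool WaitForCount(Func<long> getCount, long expected,
                                    TimeSpan timeout, TimeSpan pollInterval)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            if (getCount() >= expected)
                return true;    // everything sent so far has been applied
            if (elapsed.Elapsed >= timeout)
                return false;   // give up: a write may have failed or been lost
            Thread.Sleep(pollInterval);
        }
    }
}
```

With the collection from the question, the call might look like `WriteDrain.WaitForCount(() => coll.Count(), 100000, TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(100))`, placed just before the process exits. A `false` result only says the count was not reached in time; it does not say which writes are missing.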

Robert Stam answered Jan 30 '23 08:01


I found your question very interesting so I did some testing on my own.

In my tests, calling coll.Count() periodically did the trick.

You will need to test further for performance, but I think it is still better than using SafeMode.True.

Here is the unit test I used to check the fix:

    [TestMethod]
    public void TestMethod1()
    {
        MongoServer server = MongoServer.Create(ConfigurationManager.ConnectionStrings["MongoUnitTestConnStr"].ConnectionString);

        MongoDatabase test = server.GetDatabase("unit_test_db");

        int totalDocuments = 100000;
        var list = Enumerable.Range(0, totalDocuments).ToList();

        int count = 0;
        DateTime start, end;

        using (server.RequestStart(test))
        {
            MongoCollection coll = test.GetCollection("testCollection");

            start = DateTime.Now;
            Parallel.ForEach(list, i =>
            {

                var query = new QueryDocument("_id", i);
                coll.Update(query, Update.Set("value", 100),
                             UpdateFlags.Upsert, SafeMode.False);

                // Calling a count periodically (but sparsely) seems to do the trick.
                if (i%10000 == 0)
                    count = coll.Count();

            });

            // Call count one last time to report in the test results.
            count = coll.Count();

            end = DateTime.Now;
        }

        // Placeholders must be {0}..{2} to match the three arguments; expected comes before actual.
        Console.WriteLine(String.Format("Execution Time:{0}.  Expected No of docs: {1}, Actual No of docs {2}", (end - start).TotalSeconds, totalDocuments, count));
    }

Results of the test:

Execution Time:105.8125.812. Expected No of docs: 100000, Actual No of docs 100000

agarcian answered Jan 30 '23 07:01