Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mongoid Batch Update/Upsert Alternative?

I know that Mongoid v3+ supports batch inserts via Model.collection.insert(). However, I don't think it supports batch updates where the attributes differ for each record (so I don't think update_all would work either). Is there a way to do batch update/upsert instead of a single-record lookup & update?

Here's a simplified example in which I have 2 models:

class Product
  ...
  has_and_belongs_to_many :lists
end

class List
  ...
  has_and_belongs_to_many :products
end

When a new Product is created, I associate it with one or more Lists. However, I also need to update the Product attributes daily without losing the List reference info (I'm ok that validation on Product doesn't run).

One way, without using batch, is to call find_or_initialize_by on Product and update the attributes. But doing so for 10K-1M+ records is very time consuming.

Another way, which uses batch insert, is to do Product.delete_all and then Product.collection.insert(...), but that creates new product_ids and the relationship with List is no longer maintained.

Is there a way to do batch update or upsert in this example?

like image 356
netwire Avatar asked Aug 28 '14 13:08

netwire


2 Answers

MongoDB 2.6 supports the update command, see http://docs.mongodb.org/manual/reference/command/update/

Mongoid.default_session gives you corresponding access to the command method at the MongoDB database level. The following example shows how to construct and issue a bulk update command that should answer your question. The output shows the 10 lists, comparison of individual versus bulk-update run times, a dump of the Product collection showing updated stock counts, and the structure of the bulk update command. Even in this simple demo, there is significant performance improvement with bulk updates versus individual updates.

The Ruby 'mongo' 1.x driver supports a fluent API for bulk writes, this will also be supported by the new Ruby 'mongo' 2.0 driver that will eventually find its way into Mongoid. Mongoid v3+ currently uses the Moped driver which does not have a fluent API for bulk write. But as demonstrated here, it is fairly straightforward to construct and directly issue a bulk write (bulk update) command in Mongoid/Moped via the Session#command method.

Hope that this helps.

app/models/product.rb

class Product
  include Mongoid::Document
  field :name, type: String
  field :stock, type: Integer, default: 0
  has_and_belongs_to_many :lists, index: true
end

app/models/list.rb

class List
  include Mongoid::Document
  field :name, type: String
  has_and_belongs_to_many :products
end

test/unit/product_test.rb

require 'test_helper'
require 'benchmark'
require 'pp'

class ProductTest < ActiveSupport::TestCase
  def setup
    @session = Mongoid.default_session
    @session.drop
  end
  test '0. mongoid version' do
    puts "\nMongoid::VERSION:#{Mongoid::VERSION}\nMoped::VERSION:#{Moped::VERSION}"
  end
  def individual_updates(list_ids, repetitions)
    repetitions.times do
      list_ids.each_with_index do |list_id, index|
        Product.where(:list_ids => list_id).update_all({'$inc' => {'stock' => index}})
      end
    end
  end
  def bulk_command(list_ids, repetitions)
    updates = []
    repetitions.times do
      list_ids.each_with_index do |list_id, index|
        updates << {'q' => {'list_ids' => list_id}, 'u' => {'$inc' => {'stock' => index}}, 'multi' => true}
      end
    end
    { update: Product.collection_name.to_s, updates: updates, ordered: false }
  end
  def bulk_updates(list_ids, repetitions)
    @session.command(bulk_command(list_ids, repetitions))
  end
  test 'x' do
    puts
    [
      ['ASUS MeMO Pad FHD 10', ['ASUS', 'tablet', 'Android']],
      ['Apple iPad Air Wi-Fi + Cellular 128GB - Silver', ['Apple', 'tablet', 'iOS']],
      ['Apple iPad mini with Retina display Wi-Fi + Cellular 128GB - Silver', ['Apple', 'tablet', 'iOS']],
      ['Apple iPhone 5c 32GB Green', ['Apple', 'phone', 'iOS']],
      ['Apple iPhone 5s 64GB Space Gray', ['Apple', 'phone', 'iOS']],
      ['LG G Pad 8.3 Tablet', ['LG', 'tablet', 'Android']],
      ['LG Google Nexus 5 White', ['LG', 'phone', 'Android']],
      ['Microsoft Surface 7ZR-00001', ['Microsoft', 'tablet', 'Windows 8 RT']],
      ['Samsung Galaxy S4 I9500', ['Samsung', 'phone', 'Android']],
      ['Samsung Galaxy Tab S 8.4', ['Samsung', 'tablet', 'Android']]
    ] .each do |product_name, list_names|
      product = Product.create(name: product_name)
      list_names.each do |list_name|
        list = List.where(name: list_name).first_or_create
        list.products << product
      end
    end
    list_names = List.all.to_a.collect(&:name).sort.uniq
    p list_names
    list_ids = list_names.collect{|list_name| List.where(name: list_name).first.id}
    assert(list_ids.count > 0)

    Benchmark.bm(20) do |x|
      x.report('individual updates') { individual_updates(list_ids, 100) }
      x.report('bulk updates') { bulk_updates(list_ids, 100) }
    end

    pp Product.all.to_a

    db_command = bulk_command(list_ids, 1)
    assert(db_command[:updates].size > 0)
    pp db_command
  end
end

rake test

Run options:

# Running tests:

[1/2] ProductTest#test_0._mongoid_version
Mongoid::VERSION:3.1.6
Moped::VERSION:1.5.2
[2/2] ProductTest#test_x
["ASUS", "Android", "Apple", "LG", "Microsoft", "Samsung", "Windows 8 RT", "iOS", "phone", "tablet"]
                           user     system      total        real
individual updates     0.420000   0.070000   0.490000 (  0.489579)
bulk updates           0.060000   0.000000   0.060000 (  0.180926)
[#<Product _id: 5408b72b7f11bad1ca000001, name: "ASUS MeMO Pad FHD 10", stock: 2000, list_ids: ["5408b72c7f11bad1ca000002", "5408b72c7f11bad1ca000003", "5408b72c7f11bad1ca000004"]>,
 #<Product _id: 5408b72c7f11bad1ca000005, name: "Apple iPad Air Wi-Fi + Cellular 128GB - Silver", stock: 3600, list_ids: ["5408b72c7f11bad1ca000006", "5408b72c7f11bad1ca000003", "5408b72c7f11bad1ca000007"]>,
 #<Product _id: 5408b72c7f11bad1ca000008, name: "Apple iPad mini with Retina display Wi-Fi + Cellular 128GB - Silver", stock: 3600, list_ids: ["5408b72c7f11bad1ca000006", "5408b72c7f11bad1ca000003", "5408b72c7f11bad1ca000007"]>,
 #<Product _id: 5408b72c7f11bad1ca000009, name: "Apple iPhone 5c 32GB Green", stock: 3400, list_ids: ["5408b72c7f11bad1ca000006", "5408b72c7f11bad1ca00000a", "5408b72c7f11bad1ca000007"]>,
 #<Product _id: 5408b72c7f11bad1ca00000b, name: "Apple iPhone 5s 64GB Space Gray", stock: 3400, list_ids: ["5408b72c7f11bad1ca000006", "5408b72c7f11bad1ca00000a", "5408b72c7f11bad1ca000007"]>,
 #<Product _id: 5408b72c7f11bad1ca00000c, name: "LG G Pad 8.3 Tablet", stock: 2600, list_ids: ["5408b72c7f11bad1ca00000d", "5408b72c7f11bad1ca000003", "5408b72c7f11bad1ca000004"]>,
 #<Product _id: 5408b72c7f11bad1ca00000e, name: "LG Google Nexus 5 White", stock: 2400, list_ids: ["5408b72c7f11bad1ca00000d", "5408b72c7f11bad1ca00000a", "5408b72c7f11bad1ca000004"]>,
 #<Product _id: 5408b72c7f11bad1ca00000f, name: "Microsoft Surface 7ZR-00001", stock: 3800, list_ids: ["5408b72c7f11bad1ca000010", "5408b72c7f11bad1ca000003", "5408b72c7f11bad1ca000011"]>,
 #<Product _id: 5408b72c7f11bad1ca000012, name: "Samsung Galaxy S4 I9500", stock: 2800, list_ids: ["5408b72c7f11bad1ca000013", "5408b72c7f11bad1ca00000a", "5408b72c7f11bad1ca000004"]>,
 #<Product _id: 5408b72c7f11bad1ca000014, name: "Samsung Galaxy Tab S 8.4", stock: 3000, list_ids: ["5408b72c7f11bad1ca000013", "5408b72c7f11bad1ca000003", "5408b72c7f11bad1ca000004"]>]
{:update=>"products",
 :updates=>
  [{"q"=>{"list_ids"=>"5408b72c7f11bad1ca000002"},
    "u"=>{"$inc"=>{"stock"=>0}},
    "multi"=>true},
   {"q"=>{"list_ids"=>"5408b72c7f11bad1ca000004"},
    "u"=>{"$inc"=>{"stock"=>1}},
    "multi"=>true},
   {"q"=>{"list_ids"=>"5408b72c7f11bad1ca000006"},
    "u"=>{"$inc"=>{"stock"=>2}},
    "multi"=>true},
   {"q"=>{"list_ids"=>"5408b72c7f11bad1ca00000d"},
    "u"=>{"$inc"=>{"stock"=>3}},
    "multi"=>true},
   {"q"=>{"list_ids"=>"5408b72c7f11bad1ca000010"},
    "u"=>{"$inc"=>{"stock"=>4}},
    "multi"=>true},
   {"q"=>{"list_ids"=>"5408b72c7f11bad1ca000013"},
    "u"=>{"$inc"=>{"stock"=>5}},
    "multi"=>true},
   {"q"=>{"list_ids"=>"5408b72c7f11bad1ca000011"},
    "u"=>{"$inc"=>{"stock"=>6}},
    "multi"=>true},
   {"q"=>{"list_ids"=>"5408b72c7f11bad1ca000007"},
    "u"=>{"$inc"=>{"stock"=>7}},
    "multi"=>true},
   {"q"=>{"list_ids"=>"5408b72c7f11bad1ca00000a"},
    "u"=>{"$inc"=>{"stock"=>8}},
    "multi"=>true},
   {"q"=>{"list_ids"=>"5408b72c7f11bad1ca000003"},
    "u"=>{"$inc"=>{"stock"=>9}},
    "multi"=>true}],
 :ordered=>false}
Finished tests in 1.334821s, 1.4983 tests/s, 1.4983 assertions/s.
2 tests, 2 assertions, 0 failures, 0 errors, 0 skips
like image 56
Gary Murakami Avatar answered Oct 05 '22 13:10

Gary Murakami


Probably Gary's answer worked. But I found in mongoid 5.x and 6.x which uses the standard mongodb ruby driver, it is possible to do Model.mongo_client.bulk_write([op1, op2, ...], options). See

  • https://docs.mongodb.com/v3.2/reference/method/db.collection.bulkWrite/
  • https://api.mongodb.com/ruby/2.4.0/Mongo/Collection.html#bulk_write-instance_method

Needed both documents to make sense of constructing the call. e.g

  update = TestCase.collection.bulk_write(
    [
      replace_one: {
        filter: {},
        update: {},
        upsert: true
      },
      update_one: { ... },
      ...
    ],
    ordered: false
  )
like image 23
akostadinov Avatar answered Oct 05 '22 13:10

akostadinov