Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ActiveRecord seems to be validating unchanged child records unnecessarily

I have found a situation where ActiveRecord is validating child records seemingly unnecessarily. Apologies in advance for the length as this is quite complex.

This involves through associations that have previously been used but not changed in any way. It occurs on 3.2 through to recent master. I am not sure if it is a design decision that has led to unexpected behaviour, or a bug of some kind.

I have reduced a test case from the actual code as follows:

Models:

class A < ActiveRecord::Base
  belongs_to :b
  has_many :cs, :through => :b
  before_validation { puts "A" }
end

class B < ActiveRecord::Base
  has_many :as
  has_many :cs
  before_validation { puts "B" }
end

class C < ActiveRecord::Base
  belongs_to :b
  before_validation { puts "C" }
end

Migration:

class AddABC < ActiveRecord::Migration
  def change
    create_table :as do |t|
      t.references :b
    end

    create_table :bs do |t|
    end

    create_table :cs do |t|
      t.references :b
    end
  end
end

The reduced test case that triggers it is this when run on an empty database:

b = B.create!
c = C.create!
b.cs << c
a = A.new
a.b = b
a.cs.first
puts "X"
a.valid?

which gives output:

B
C
C
X
A
C

Which shows that validating an A validates its Cs.

Now having looked into this I am aware of the has_many :validate => false option, and using it, the problem goes away. But it seems to me there’s more going on here than this - bear with me.

The AR docs say:

:validate If false, don't validate the associated objects when saving the parent object. true by default.

But I find this confusing as this clearly cannot mean all records. It won’t validate the objects if I never get the association (remove a.cs.first from the code above), or I get it but never use it (replace with a.cs). This is because it goes through validate_collection_association in lib/active_record/autosave_association.rb which includes the code:

  def validate_collection_association(reflection)
    if association = association_instance_get(reflection.name)
      if records = associated_records_to_validate_or_save(association, new_record?, reflection.options[:autosave])
        records.each_with_index { |record, index| association_valid?(reflection, record, index) }
      end
    end
  end

It's all conditional on association_instance_get which fetches from the association cache. No cache means no records to validate.

I have tried to do a simpler has_many, by setting up just a B model which references A, but then I’ll need to create the B before the A, then A will no longer be a new record if I try to save it, and this code prevents the problem as the branch called will no longer be the first:

  def associated_records_to_validate_or_save(association, new_record, autosave)
    if new_record
      association && association.target
    elsif autosave
      association.target.find_all(&:changed_for_autosave?)
    else
      association.target.find_all(&:new_record?)
    end
  end

The only real explanation I can come up with for it only validating loaded records is because the intention of ActiveRecord here is to validate changed records only. Really I’d expect it to validate if and only if it’s going to save, and hence the default autosave option of only saving changed records should prevent a validation.

I have found a related ticket and the commit 27aa4dda7d89ce733 (not yet in any release I think) which makes a change but does not fix this specific issue from my testing. It does however contain the expression:

!record.persisted? || record.changed? || record.marked_for_destruction?

and if I add this condition to the innermost loop of validate_collection_association then the problem goes away, with the ActiveRecord tests still passing on my machine.

This has been a significant performance issue in my project because the model in question was only supposed to be validated in admin, where an unindexed field used in a custom validation was acceptable due to the rarity of it being saved and thus I judged that indexing it would be over-indexing (it wouldn’t just be one field). Obviously in most cases this over-validation would be far less serious, and it only seems to happen in quite a specific case, so this could be a bug.

So, while I’ve got a good idea as to what is happening, I’m not entirely sure what should be happening, which is why I haven’t filed this as an ActiveRecord ticket. Do you think this is a bug? Why is it working like this? What is the validate option really for? If this is a bug, can you explain why the code works this way, and why it is overreaching? What case would my code change to ActiveRecord above break?

like image 496
matthew.tuck Avatar asked Oct 27 '15 13:10

matthew.tuck


1 Answers

The reason this is happening is because the relationship between A and C is through B.

Before you assign a.b = b, a has no bs or cs.

If you assign a.b = b but you don't call a.cs, then a has no reason to try to load the associated cs. has_many only creates the cs convenience method, it does not call it for you. Here only a.b_id is set to b.id.

Once you call a.cs, a will look for associated cs objects through b since b is available. It will find those objects and add them as children to a.

I see your point, that technically, there is nothing to do in this particular case in this particular schema for the cs, but I can see why ActiveRecord is checking. These objects, as far as it is concerned, are children of a and children records are validated unless specifically told not to through validate: false.

In this case, a is a child of b, so a is not required to validate it.

In general, parents will cause their associated children to be validated. Children do not have to validate their parents.

like image 178
Bassel Samman Avatar answered Jan 15 '23 15:01

Bassel Samman