
Passing an object created with SubFactory and LazyAttribute to a RelatedFactory in factory_boy

I am using factory.LazyAttribute within a SubFactory call to pass in an object created in the factory_parent. This works fine.

But if I pass the object created to a RelatedFactory, LazyAttribute can no longer see the factory_parent and fails.

This works fine:

class OKFactory(factory.DjangoModelFactory):
    class Meta:
        model = Foo
        exclude = ['sub_object']

    sub_object = factory.SubFactory(SubObjectFactory)

    object = factory.SubFactory(ObjectFactory,
        sub_object=factory.LazyAttribute(lambda obj: obj.factory_parent.sub_object))

The identical call to LazyAttribute fails here:

class ProblemFactory(OKFactory):
    class Meta:
        model = Foo
        exclude = ['sub_object', 'object']

    sub_object = factory.SubFactory(SubObjectFactory)

    object = factory.SubFactory(ObjectFactory,
        sub_object=factory.LazyAttribute(lambda obj: obj.factory_parent.sub_object))

    another_object = factory.RelatedFactory(AnotherObjectFactory, 'foo', object=object)

The identical LazyAttribute call can no longer see factory_parent, and can only access AnotherObject values. LazyAttribute throws the error:

AttributeError: The parameter sub_object is unknown. Evaluated attributes are...[then lists all attributes of AnotherObjectFactory]

Is there a way round this?

I can't just put sub_object=sub_object into the ObjectFactory call, i.e.:

    sub_object = factory.SubFactory(SubObjectFactory)
    object = factory.SubFactory(ObjectFactory, sub_object=sub_object)

because if I then do:

    object2 = factory.SubFactory(ObjectFactory, sub_object=sub_object)

a second sub_object is created, whereas I need both objects to refer to the same sub_object. I have tried SelfAttribute to no avail.

Chris asked Oct 12 '15 19:10


2 Answers

I think you can leverage the ability to override parameters passed in to the RelatedFactory to achieve what you want.

For example, given:

class MyFactory(OKFactory):

    object = factory.SubFactory(MyOtherFactory)
    related = factory.RelatedFactory(YetAnotherFactory)  # We want to pass object in here

If we knew what the value of object was going to be in advance, we could make it work with something like:

object = MyOtherFactory()
thing = MyFactory(object=object, related__param=object)

We can use this same naming convention to pass the object to the RelatedFactory within the main Factory:

class MyFactory(OKFactory):

    class Meta:
        exclude = ['object']

    object = factory.SubFactory(MyOtherFactory)
    related__param = factory.SelfAttribute('object')
    related__otherrelated__param = factory.LazyAttribute(lambda myobject: 'admin%d_%d' % (myobject.level, myobject.level - 1))
    related = factory.RelatedFactory(YetAnotherFactory)  # Will be called with {'param': object, 'otherrelated__param': 'admin1_2'}

rhunwicks answered Nov 04 '22 15:11


I solved this by simply calling factories within @factory.post_generation. Strictly speaking this isn't a solution to the specific problem posed, but I explain below in detail why it ended up being a better architecture. @rhunwicks's solution does genuinely pass a SubFactory(LazyAttribute('')) to RelatedFactory, but some restrictions remained that made it wrong for my situation.

We move the creation of sub_object and object from ProblemFactory to ObjectWithSubObjectsFactory (and remove the exclude clause), and add the following code to the end of ProblemFactory.

@factory.post_generation
def post(self, create, extracted, **kwargs):
    if not create:
        return  # No IDs, so wouldn't work anyway

    object = ObjectWithSubObjectsFactory()
    sub_object_ids_by_code = dict((sbj.name, sbj.id) for sbj in object.subobject_set.all())

    # self is the `Foo` Django object just created by the `ProblemFactory` that contains this code.
    for another_obj in self.anotherobject_set.all():
        if another_obj.name == 'age_in':
            another_obj.attribute_id = sub_object_ids_by_code['Age']
            another_obj.save()
        elif another_obj.name == 'income_in':
            another_obj.attribute_id = sub_object_ids_by_code['Income']
            another_obj.save()

So it seems RelatedFactory calls are executed before PostGeneration calls.

The naming in this question is easier to understand, so here is the same solution code for that sample problem:

The creation of dataset, column_1 and column_2 is moved into a new factory DatasetAnd2ColumnsFactory, and the code below is then added to the end of FunctionToParameterSettingsFactory.

@factory.post_generation
def post(self, create, extracted, **kwargs):
    if not create:
        return

    dataset = DatasetAnd2ColumnsFactory()
    column_ids_by_name = dict(
        (column.name, column.id) for column in dataset.column_set.all())

    # self is the `FunctionInstantiation` Django object just created by the `FunctionToParameterSettingsFactory` that contains this code.
    for parameter_setting in self.parametersetting_set.all():
        if parameter_setting.name == 'age_in':
            parameter_setting.column_id = column_ids_by_name['Age']
            parameter_setting.save()
        elif parameter_setting.name == 'income_in':
            parameter_setting.column_id = column_ids_by_name['Income']
            parameter_setting.save()

I then extended this approach by passing in options to configure the factory, like this:

whatever = WhateverFactory(options__an_option=True, options__another_option=True)

Then this factory code detected the options and generated the test data required (note the method is renamed to options to match the prefix on the parameter names):

@factory.post_generation
def options(self, create, not_used, **kwargs):

    # The standard code as above

    if kwargs.get('an_option', None):
        pass  # code for custom option 'an_option'
    if kwargs.get('another_option', None):
        pass  # code for custom option 'another_option'

I then further extended this. Because my desired models contained self joins, my factory is recursive. So for a call such as:

whatever = WhateverFactory(options__an_option='xyz',
                           options__an_option_for_a_nested_whatever='abc')

Within @factory.post_generation I have:

class Meta:
    model = Whatever
# self is the top level object being generated

@factory.post_generation
def options(self, create, not_used, **kwargs):

    # This generates the nested object
    nested_object = WhateverFactory(
        options__an_option=kwargs.get('an_option_for_a_nested_whatever', None))

    # then join nested_object to self via the self join
    self.nested_whatever_id = nested_object.id

Some optional notes on why I went with this approach rather than @rhunwicks's proper solution to my question above. There were two reasons.

The first, which stopped me experimenting with it further, was that the order of RelatedFactory and post-generation execution is not reliable: apparently unrelated factors affect it, presumably as a consequence of lazy evaluation. I had errors where a set of factories would suddenly stop working for no apparent reason. Once it was because I renamed the variables that the RelatedFactory declarations were assigned to. This sounds ridiculous, but I tested it to death (and posted here) and there is no doubt: renaming the variables reliably switched the sequence of RelatedFactory and post-generation execution. I still assumed this was some oversight on my part until it happened again for some other reason (which I never managed to diagnose).

Secondly, I found the declarative code confusing, inflexible and hard to refactor. It isn't straightforward to pass different configurations during instantiation so that the same factory can be used for different variations of test data, which meant I had to repeat code. The object also needs adding to the factory's Meta.exclude list, which sounds trivial, but with pages of code generating data it was a reliable source of errors. A developer would have to pass over several factories several times to understand the control flow. Generation code would live in the declarative body until you had exhausted these tricks, and then the rest would go into post-generation or get very convoluted. A common example for me is a triad of interdependent models (e.g. a parent-children category structure, or dataset/attributes/entities) serving as a foreign key of another triad of interdependent objects (e.g. models and parameter values referring to other models' parameter values). A few of these structures, especially if nested, quickly become unmanageable.

I realize it isn't really in the spirit of factory_boy, but putting everything into post-generation solved all these problems. I can pass in parameters, so a single factory serves all my composite model test data requirements and no code is repeated. The sequence of creation is obvious at a glance and completely reliable, rather than depending on confusing chains of inheritance and overriding, and being subject to some bug. The interactions are explicit, so you don't need to digest the whole thing to add some functionality, and different areas of functionality are grouped in the post-generation if clauses. There's no need to exclude working variables, and you can refer to them for the duration of the factory code. The unit test code is simplified, because the description of the functionality goes in parameter names rather than Factory class names: you create data with a call like WhateverFactory(options__create_xyz=True, options__create_abc=True, ...) rather than WhateverCreateXYZCreateABC...(). This gives a clean division of responsibilities.

Chris answered Nov 04 '22 16:11