Searching in multiple fields respecting the row order

I have a model like the following:

class Foo(models.Model):
    fruit = models.CharField(max_length=10)
    stuff = models.CharField(max_length=10)
    color = models.CharField(max_length=10)
    owner = models.CharField(max_length=20)
    exists = models.BooleanField()
    class Meta:
        unique_together = (('fruit', 'stuff', 'color'), )

It is populated with some data:

fruit  stuff  color   owner  exists
Apple  Table   Blue     abc    True
 Pear   Book    Red     xyz   False
 Pear  Phone  Green     xyz   False
Apple  Phone   Blue     abc    True
 Pear  Table  Green     abc    True

I need to merge/join this with a collection (not a queryset):

[('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]

So basically, rows 0 and 2 should be returned when I search the model with this list of tuples.

Currently my workaround is to read Foo.objects.all() into a DataFrame, merge it with the list of tuples, and pass the resulting IDs to Foo.objects.filter(). I also tried iterating over the list and calling Foo.objects.get() on each tuple, but that is very slow because the list is quite big.

When I tried chaining Q objects as suggested by the current answers, it raised an OperationalError (too many SQL variables).
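That error appears when a single statement binds more parameters than SQLite allows. One way to stay under the cap, sketched here with the standard-library sqlite3 module rather than the Django ORM, is to split the tuples into chunks and run one query per chunk. The table name foo, the helper names chunked and find_existing, and the chunk size of 300 tuples are assumptions for illustration; the real limit depends on how SQLite was built (historically 999 by default).

```python
import sqlite3


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_existing(conn, keys, chunk_size=300):
    """Return the (fruit, stuff, color) tuples from `keys` that exist in foo.

    Each tuple binds 3 parameters, so 300 tuples per query stay below an
    assumed, conservative cap of 999 parameters.
    """
    found = set()
    for chunk in chunked(keys, chunk_size):
        where = " OR ".join(
            ["(fruit = ? AND stuff = ? AND color = ?)"] * len(chunk)
        )
        params = [value for key in chunk for value in key]
        sql = "SELECT fruit, stuff, color FROM foo WHERE " + where
        found.update(conn.execute(sql, params).fetchall())
    return found


# Hypothetical in-memory table mirroring the sample data above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (fruit TEXT, stuff TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [
        ("Apple", "Table", "Blue"),
        ("Pear", "Book", "Red"),
        ("Pear", "Phone", "Green"),
        ("Apple", "Phone", "Blue"),
        ("Pear", "Table", "Green"),
    ],
)

wanted = [
    ("Apple", "Table", "Blue"),
    ("Pear", "Phone", "Green"),
    ("Kiwi", "Desk", "Black"),  # not in the table
]
# A tiny chunk size forces more than one query for the demo.
existing = find_existing(conn, wanted, chunk_size=2)
```

This trades one oversized query for several smaller ones, which is still far fewer round trips than one get() per tuple.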

My main goal is the following:

As can be seen from the model, these three fields together act as my primary key. The table contains around 15k entries. When I get data from another source, I need to check whether the data is already in my table and create/update/delete accordingly (the new data may also contain up to 15k entries). Is there a clean and efficient way to check if these records are already in my table?

Note: The list of tuples does not have to be in that shape. I can modify it, turn it into another data structure or transpose it.
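Once both sides are keyed by (fruit, stuff, color), the create/update/delete decision described above reduces to set operations. A minimal pure-Python sketch, assuming the existing rows and the incoming data have each been loaded into a dict keyed by that tuple; the function name split_changes and the dict layout are illustrative and not part of the model:

```python
def split_changes(existing, incoming):
    """Classify keys by comparing two dicts keyed by (fruit, stuff, color).

    The dict values hold the remaining fields, e.g.
    {"owner": "abc", "exists": True}.
    """
    existing_keys = set(existing)
    incoming_keys = set(incoming)
    to_create = incoming_keys - existing_keys
    to_delete = existing_keys - incoming_keys
    # Keys present on both sides only need an update if the values differ.
    to_update = {
        key for key in existing_keys & incoming_keys
        if existing[key] != incoming[key]
    }
    return to_create, to_update, to_delete


existing = {
    ("Apple", "Table", "Blue"): {"owner": "abc", "exists": True},
    ("Pear", "Book", "Red"): {"owner": "xyz", "exists": False},
}
incoming = {
    ("Apple", "Table", "Blue"): {"owner": "abc", "exists": False},  # changed
    ("Pear", "Phone", "Green"): {"owner": "xyz", "exists": False},  # new
}
to_create, to_update, to_delete = split_changes(existing, incoming)
```

With roughly 15k entries on each side, both dicts fit comfortably in memory and the comparison is a single pass over the keys.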

ayhan asked Nov 24 '17 22:11



2 Answers

The fields ('fruit', 'stuff', 'color') are unique together.

So if your search tuple is ('Apple', 'Table', 'Blue'), concatenating its parts gives a single string that can stand in for the whole tuple:

f = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
c = [''.join(w) for w in f]
# Output: ['AppleTableBlue', 'PearPhoneGreen']
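One caveat with plain concatenation: two different tuples can produce the same string when the boundary between fields shifts. The sketch below shows such a collision and how joining with a separator avoids it. The separator '|' is an assumption and only helps if it never occurs in the field values.

```python
a = ('ab', 'c', 'd')
b = ('a', 'bc', 'd')

# Without a separator, both tuples collapse to the same string.
plain_a, plain_b = ''.join(a), ''.join(b)

# With a separator that never appears in the data, the keys stay distinct.
SEP = '|'
sep_a, sep_b = SEP.join(a), SEP.join(b)

print(plain_a == plain_b)  # True
print(sep_a == sep_b)      # False
```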

So we can annotate the queryset with the concatenated key using Concat and filter on that annotation.

from django.db.models import CharField
from django.db.models.functions import Concat

Foo.objects.annotate(u_key=Concat('fruit', 'stuff', 'color', output_field=CharField())).filter(u_key__in=c)
# Output: <QuerySet [<Foo: #0row>, <Foo: #2row>]>
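The same idea can be tried in plain SQL with the standard-library sqlite3 module: concatenate the three columns and compare the result against the list of joined keys. This is a sketch of the technique, not the exact SQL Django emits for Concat, and the table name foo is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, fruit TEXT, stuff TEXT, color TEXT)"
)
rows = [
    ("Apple", "Table", "Blue"),
    ("Pear", "Book", "Red"),
    ("Pear", "Phone", "Green"),
    ("Apple", "Phone", "Blue"),
    ("Pear", "Table", "Green"),
]
conn.executemany("INSERT INTO foo (fruit, stuff, color) VALUES (?, ?, ?)", rows)

f = [("Apple", "Table", "Blue"), ("Pear", "Phone", "Green")]
c = ["".join(w) for w in f]

# One placeholder per joined key; `||` is SQLite's concatenation operator.
placeholders = ", ".join("?" * len(c))
sql = (
    "SELECT fruit, stuff, color FROM foo "
    f"WHERE fruit || stuff || color IN ({placeholders}) ORDER BY id"
)
matches = conn.execute(sql, c).fetchall()
```

Each tuple now costs one bound parameter instead of three, although a very large list can still hit the parameter limit.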

This works whether the inner items are tuples or lists.

Transposed input

Case 1:

If the input is a list of two tuples:

[('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]

After transposing, the input will be:

transpose_input = [('Apple', 'Pear'), ('Table', 'Phone'), ('Blue', 'Green')]

Comparing the length of each tuple (each_tuple_size) with the length of the list (input_list_size) shows that the input is transposed, so we can use zip to transpose it back and the solution above works as expected.

each_tuple_size = len(transpose_input[0])
input_list_size = len(transpose_input)
if each_tuple_size == 2 and input_list_size == 3:
    transpose_again = list(zip(*transpose_input))
    # use transpose_again from here on

Case 2:

If the input is a list of three tuples:

[('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green'), ('Pear', 'Book', 'Red')]

After transposing, the input will be:

transpose_input = [('Apple', 'Pear', 'Pear'), ('Table', 'Phone', 'Book'), ('Blue', 'Green', 'Red')]

Here the transposed input has the same shape as the original (3 tuples of 3 items), so it is impossible to tell whether a square input has been transposed, and the solution above fails.
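The shape check from both cases can be wrapped in a small helper. This is only a sketch: the name normalize_rows and the decision to return square input unchanged are assumptions, since a 3×3 input is ambiguous as described in case 2.

```python
def normalize_rows(data, n_fields=3):
    """Return `data` as a list of tuples, undoing a transpose when detectable.

    Input with exactly `n_fields` items whose length differs from `n_fields`
    is treated as transposed. Square input (n_fields x n_fields) cannot be
    distinguished from its transpose and is returned as given.
    """
    if not data:
        return []
    if len(data) == n_fields and len(data[0]) != n_fields:
        return list(zip(*data))
    return [tuple(row) for row in data]


transposed = [('Apple', 'Pear'), ('Table', 'Phone'), ('Blue', 'Green')]
rows = normalize_rows(transposed)

already_ok = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
same = normalize_rows(already_ok)
```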

Satendra answered Nov 09 '22 23:11



If you know these fields constitute your natural key and you have to do heavy querying on them, add this natural key as a proper field and take measures to maintain it:

from django.db import models

class FooQuerySet(models.QuerySet):
    def bulk_create(self, objs, batch_size=None):
        objs = list(objs)
        for obj in objs:
            obj.natural_key = Foo.get_natural_key(obj.fruit, obj.stuff, obj.color)
        return super(FooQuerySet, self).bulk_create(objs, batch_size=batch_size)

    # you might override update(...) with proper F and Value expressions, 
    # but I assume the natural key does not change

class FooManager(models.Manager):
    def get_queryset(self):
        return FooQuerySet(self.model, using=self._db)

class Foo(models.Model):
    NK_SEP = '|||'  # something unlikely to occur in the other fields

    fruit = models.CharField(max_length=10)
    stuff = models.CharField(max_length=10)
    color = models.CharField(max_length=10)
    natural_key = models.CharField(max_length=40, unique=True, db_index=True)

    @staticmethod
    def get_natural_key(*args):
        return Foo.NK_SEP.join(args) 

    def save(self, *args, **kwargs):
        self.natural_key = Foo.get_natural_key(self.fruit, self.stuff, self.color)
        super(Foo, self).save(*args, **kwargs)

    objects = FooManager()

    class Meta:
        unique_together = (('fruit', 'stuff', 'color'), )

Now you can query:

from itertools import starmap

lst = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
existing_foos = Foo.objects.filter(natural_key__in=list(starmap(Foo.get_natural_key, lst)))
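To see what the filter receives, the key-building step can be run on its own without Django. A sketch that reuses the separator from the model above; the standalone get_natural_key function is a stand-in that mirrors Foo.get_natural_key for illustration:

```python
from itertools import starmap

NK_SEP = '|||'  # same separator as in the model above


def get_natural_key(*args):
    """Join the key parts with the separator, like Foo.get_natural_key."""
    return NK_SEP.join(args)


lst = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
# starmap unpacks each tuple into positional arguments.
keys = list(starmap(get_natural_key, lst))
print(keys)  # ['Apple|||Table|||Blue', 'Pear|||Phone|||Green']
```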

And batch create:

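# Evaluate the existing keys once instead of re-querying for every tuple.
existing_keys = set(existing_foos.values_list('fruit', 'stuff', 'color'))
Foo.objects.bulk_create(
    [
        Foo(fruit=x[0], stuff=x[1], color=x[2])
        for x in lst
        if x not in existing_keys
    ]
)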

user2390182 answered Nov 10 '22 00:11
