Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PonyORM: What is the most efficient way to add new items to a pony database without knowing which items already exist?

Forgive me if this is an obvious question but I'm new to pony and databases in general and didn't find the right part of the documentation that answers this question.

I'm trying to create a database with companies and the locations where those companies have offices. This is a many-to-many relationship since each company is in multiple locations and each location can be host to multiple companies. I'm defining my entities as such:

from pony import orm

class Company(db.Entity):
    '''A company entry in database'''
    name = orm.PrimaryKey(str)
    locations = orm.Set('Location')

class Location(db.Entity):
    '''A location for a company'''
    name = orm.PrimaryKey(str)
    companies = orm.Set('Company')

Ideally, I'd like to be able to write a function that adds a company to the database while also adding the list of locations where that company exists while also being sure to add new location instances if they don't already exist. I can quickly think of two ways to do so.

First would be to try to enter the location even if it exists and handle the exception:

@orm.db_session
def add_company(name, locations):
    loc_entities = []
    for l in locations:
        try:
            loc = Location[l]
        except orm.core.ObjectNotFound:
            loc = Location(name=l)
        else:
            loc_entities.append(loc)
    comp = Company(name=name, locations=loc_entities)

Second would be to query the database and ask whether the locations exist yet:

@orm.db_session
def add_company2(name, locations):
    old_loc_entities = orm.select(l for l in Location if l.name in locations)[:]
    old_locations = [l.name for l in old_loc_entities]
    new_locations = set(locations) - (set(locations) & set(old_locations))
    loc_entities = [Location(name=l) for l in new_locations] + old_loc_entities
    comp = Company(name=name, locations=loc_entities)

Of these two, I'd guess that the more pythonic way to do it would be to simply handle the exception but does this run into the N+1 problem? I'm noticing that by using the name as a primary key, I'm making a query every time I access the entity using an index. When I just let pony pick sequential ids, I don't seem to need to query. I haven't tested this with any large datasets yet so I haven't benchmarked yet.

like image 829
JeffP Avatar asked Oct 30 '16 08:10

JeffP


2 Answers

I'm noticing that by using the name as a primary key, I'm making a query every time I access the entity using an index. When I just let pony pick sequential ids, I don't seem to need to query.

Internally Pony caches sequential primary keys in the same way as a string primary keys, so I think there should be no difference. Each db_session have separate cache (which is called "identity map"). After an object is read, any access by primary key (or any other unique key) within the same db_session should return the same object directly from the identity map without issuing a new query. After the db_session is over, another access by the same key will issue a new query, because the object could be modified in the database by a concurrent transaction.

Regarding your approaches, I think both of them are valid. If a company have just a few location (say, around ten), I'd use the first approach, because it feels more pythonic to me. It is indeed causes N+1 query, but a query which retrieves an object by a primary key is very fast and easy to the server to execute. The code can be expressed a little more compact by using a get method:

@orm.db_session
def add_company(name, locations):
    loc_entities = [Location.get(name=l) or Location(name=l)
                    for l in locations]
    comp = Company(name=name, locations=loc_entities)

The second approach of retrieving all existing locations with a single query looks like a premature optimization to me, but if you create hundreds a companies per second, and each company has hundreds of locations, it may be used.

like image 66
Alexander Kozlovsky Avatar answered Sep 27 '22 23:09

Alexander Kozlovsky


I know this as the "get or create" pattern, always had to implement it no matter the ORM or language.

This is my "get or create" for Pony.

class GetMixin():
    @classmethod
    def get_or_create(cls, params):
        o = cls.get(**params)
        if o:
            return o
        return cls(**params)


class Location(db.Entity, GetMixin):
    '''A location for a company'''
    name = orm.PrimaryKey(str)
    companies = orm.Set('Company')

The Mixin is explained on the docs.

Then your code will look like this:

@orm.db_session
def add_company(name, locations):
    loc_entities = [Location.get_or_create(name=l) for l in locations]
    comp = Company(name=name, locations=loc_entities)
like image 34
MGP Avatar answered Sep 27 '22 22:09

MGP