Forgive me if this is an obvious question but I'm new to pony and databases in general and didn't find the right part of the documentation that answers this question.
I'm trying to create a database with companies and the locations where those companies have offices. This is a many-to-many relationship since each company is in multiple locations and each location can be host to multiple companies. I'm defining my entities as such:
from pony import orm
class Company(db.Entity):
'''A company entry in database'''
name = orm.PrimaryKey(str)
locations = orm.Set('Location')
class Location(db.Entity):
'''A location for a company'''
name = orm.PrimaryKey(str)
companies = orm.Set('Company')
Ideally, I'd like to be able to write a function that adds a company to the database while also adding the list of locations where that company exists while also being sure to add new location instances if they don't already exist. I can quickly think of two ways to do so.
First would be to try to enter the location even if it exists and handle the exception:
@orm.db_session
def add_company(name, locations):
loc_entities = []
for l in locations:
try:
loc = Location[l]
except orm.core.ObjectNotFound:
loc = Location(name=l)
else:
loc_entities.append(loc)
comp = Company(name=name, locations=loc_entities)
Second would be to query the database and ask whether the locations exist yet:
@orm.db_session
def add_company2(name, locations):
old_loc_entities = orm.select(l for l in Location if l.name in locations)[:]
old_locations = [l.name for l in old_loc_entities]
new_locations = set(locations) - (set(locations) & set(old_locations))
loc_entities = [Location(name=l) for l in new_locations] + old_loc_entities
comp = Company(name=name, locations=loc_entities)
Of these two, I'd guess that the more pythonic way to do it would be to simply handle the exception but does this run into the N+1 problem? I'm noticing that by using the name as a primary key, I'm making a query every time I access the entity using an index. When I just let pony pick sequential ids, I don't seem to need to query. I haven't tested this with any large datasets yet so I haven't benchmarked yet.
I'm noticing that by using the name as a primary key, I'm making a query every time I access the entity using an index. When I just let pony pick sequential ids, I don't seem to need to query.
Internally Pony caches sequential primary keys in the same way as a string primary keys, so I think there should be no difference. Each db_session
have separate cache (which is called "identity map"). After an object is read, any access by primary key (or any other unique key) within the same db_session
should return the same object directly from the identity map without issuing a new query. After the db_session
is over, another access by the same key will issue a new query, because the object could be modified in the database by a concurrent transaction.
Regarding your approaches, I think both of them are valid. If a company have just a few location (say, around ten), I'd use the first approach, because it feels more pythonic to me. It is indeed causes N+1 query, but a query which retrieves an object by a primary key is very fast and easy to the server to execute. The code can be expressed a little more compact by using a get
method:
@orm.db_session
def add_company(name, locations):
loc_entities = [Location.get(name=l) or Location(name=l)
for l in locations]
comp = Company(name=name, locations=loc_entities)
The second approach of retrieving all existing locations with a single query looks like a premature optimization to me, but if you create hundreds a companies per second, and each company has hundreds of locations, it may be used.
I know this as the "get or create" pattern, always had to implement it no matter the ORM or language.
This is my "get or create" for Pony.
class GetMixin():
@classmethod
def get_or_create(cls, params):
o = cls.get(**params)
if o:
return o
return cls(**params)
class Location(db.Entity, GetMixin):
'''A location for a company'''
name = orm.PrimaryKey(str)
companies = orm.Set('Company')
The Mixin is explained on the docs.
Then your code will look like this:
@orm.db_session
def add_company(name, locations):
loc_entities = [Location.get_or_create(name=l) for l in locations]
comp = Company(name=name, locations=loc_entities)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With