Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLAlchemy, prevent duplicate rows

I'm wondering if it's possible to prevent committing duplicates to the database. For example, presume there is a class as follows

class Employee(Base):
   id = Column(Integer, primary_key=True)
   name = Column(String)

If I were to make a series of these objects,

employee1 = Employee(name='bob')
employee2 = Employee(name='bob')

session.add_all([employee1, employee2])
session.commit()

I would like only a single row to be added to the database, and employee1 and employee2 to point to the same object in memory (if possible).

Is there functionality within SQLAlchemy to accomplish this? Or would I need to ensure duplicates don't exist programmatically?

like image 755
nven Avatar asked Jun 13 '17 01:06

nven


3 Answers

An alternate get_or_create() solution:

from sqlalchemy.orm.exc import NoResultFound
# ...

def get_or_create(self, model, **kwargs):
    """
    Usage:
    class Employee(Base):
        __tablename__ = 'employee'
        id = Column(Integer, primary_key=True)
        name = Column(String, unique=True)

    get_or_create(Employee, name='bob')
    """
    instance = get_instance(model, **kwargs)
    if instance is None:
        instance = create_instance(model, **kwargs)
    return instance


def create_instance(model, **kwargs):
    """create instance"""
    try:
        instance = model(**kwargs)
        sess.add(instance)
        sess.flush()
    except Exception as msg:
        mtext = 'model:{}, args:{} => msg:{}'
        log.error(mtext.format(model, kwargs, msg))
        sess.rollback()
        raise(msg)
    return instance


def get_instance(self, model, **kwargs):
    """Return first instance found."""
    try:
        return sess.query(model).filter_by(**kwargs).first()
    except NoResultFound:
        return

like image 160
farax Avatar answered Oct 23 '22 19:10

farax


You could create a class method to get or create an Employee -- get it if it exists, otherwise create:

@classmethod
def get_or_create(cls, name):
    exists = db.session.query(Employee.id).filter_by(name=name).scalar() is not None
    if exists:
        return db.session.query(Employee).filter_by(name=name).first()
    return cls(name=name)


employee1 = Employee(name='bob')
db.session.add(employee1)
employee2 = Employee(name='bob')

employee1 == employee2  # False


bob1 = Employee.get_or_create(name='bob')
if bob1 not in db.session:
    db.session.add(bob1)

len(add_to_session) # 1

bob2 = Employee.get_or_create(name='bob')
if bob2 not in db.session:
    db.session.add(bob2)

len(add_to_session) # 1

bob1 == bob2  # True
like image 25
Zev Averbach Avatar answered Oct 23 '22 17:10

Zev Averbach


There are at least 2 approaches:

  • The database approach: create a relevant primary key; with SQLAlchemy you would define e.g. based on your minimalistic example name = Column('First Name', String(20), primary_key=True)
  • The coding approach: check whether the attribute, set of attributes already exists in the table, otherwise create it. See relevant code examples here.

In terms of performance, I believe the database approach is better. It also is the one which makes more sense.

like image 43
tagoma Avatar answered Oct 23 '22 18:10

tagoma