Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a pg_trgm index using SQLAlchemy for Scrapy?

I am using Scrapy to scrape data from a web forum. I am storing this data in a PostgreSQL database using SQLAlchemy. The table and columns create fine, however, I am not able to have SQLAlchemy create an index on one of the columns. I am trying to create a trigram index (pg_trgm) using gin.

The Postgresql code that would create this index is:

CREATE INDEX description_idx ON table USING gin (description gin_trgm_ops);

The SQLAlchemy code I have added to my models.py file is:

desc_idx = Index('description_idx', text("description gin_trgm_ops"), postgresql_using='gin')

I have added this line to my models.py but when I check in postgresql, the index was never created.

Below are my full models.py and pipelines.py files. Am I going about this all wrong??

Any help would be greatly appreciated!!

models.py:

from sqlalchemy import create_engine, Column, Integer, String, DateTime, Index, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine.url import URL
import settings

DeclarativeBase = declarative_base()
def db_connect():
return create_engine(URL(**settings.DATABASE))

def create_forum_table(engine):
    DeclarativeBase.metadata.create_all(engine)    


class forumDB(DeclarativeBase):
    __tablename__ = "table"

    id = Column(Integer, primary_key=True)
    title = Column('title', String)
    desc = Column('description', String, nullable=True)
    desc_idx = Index('description_idx', text("description gin_trgm_ops"), postgresql_using='gin')

pipelines.py

from scrapy.exceptions import DropItem
from sqlalchemy.orm import sessionmaker
from models import forumDB, db_connect, create_forum_table


class ScrapeforumToDB(object):
def __init__(self):
    engine = db_connect()
    create_forum_table(engine)
    self.Session = sessionmaker(bind=engine)

def process_item(self, item, spider):
    session = self.Session()
    forumitem = forumDB(**item)

    try:
        session.add(forumitem)
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

    return item
like image 401
Mike H Avatar asked Apr 03 '16 18:04

Mike H


Video Answer


2 Answers

The proper way to reference an Operator Class in SQLAlchemy (such as gin_trgm_ops) is to use the postgresql_ops parameter. This will also allow tools like alembic to understand how use it when auto-generating migrations.

Index('description_idx',
      'description', postgresql_using='gin',
      postgresql_ops={
          'description': 'gin_trgm_ops',
      })
like image 136
Stephen Fuhry Avatar answered Sep 19 '22 15:09

Stephen Fuhry


Since the Index definition uses text expression it has no references to the Table "table", which has been implicitly created by the declarative class forumDB. Compare that to using a Column as expression, or some derivative of it, like this:

Index('some_index_idx', forumDB.title)

In the above definition the index will know about the table and the other way around.

What this means in your case is that the Table "table" has no idea that such an index exists. Adding it as an attribute of the declarative class is the wrong way to do it. It should be passed to the implicitly created Table instance. The attribute __table_args__ is just for that:

class forumDB(DeclarativeBase):

    __tablename__ = "table"
    # Note: This used to use `text('description gin_trgm_ops')` instead of the
    # `postgresql_ops` parameter, which should be used.
    __table_args__ = (
        Index('description_idx', "description",
              postgresql_ops={"description": "gin_trgm_ops"},
              postgresql_using='gin'),
    )

    id = Column(Integer, primary_key=True)
    title = Column('title', String)
    desc = Column('description', String, nullable=True)

With the modification in place, a call to create_forum_table(engine) resulted in:

> \d "table"
                                 Table "public.table"
   Column    |       Type        |                     Modifiers                      
-------------+-------------------+----------------------------------------------------
 id          | integer           | not null default nextval('table_id_seq'::regclass)
 title       | character varying | 
 description | character varying | 
Indexes:
    "table_pkey" PRIMARY KEY, btree (id)
    "description_idx" gin (description gin_trgm_ops)
like image 40
Ilja Everilä Avatar answered Sep 21 '22 15:09

Ilja Everilä