Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SAWarning when querying with SQLAlchemy into pandas df

I'm querying my SQLAlchemy-mapped star schema directly into a pandas DataFrame and am getting an annoying SAWarning from pandas that I'd like to address. Here's a simplified version.

class School(Base):
__tablename__ = 'DimSchool'

id = Column('SchoolKey', Integer, primary_key=True)
name = Column('SchoolName', String)
district = Column('SchoolDistrict', String)


class StudentScore(Base):
__tablename__ = 'FactStudentScore'

StudentKey = Column('StudentKey', Integer,    ForeignKey('DimStudent.StudentKey'), primary_key = True)
SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key = True)
PointsPossible = Column('PointsPossible', Integer)
PointsReceived = Column('PointsReceived', Integer)

student = relationship("Student", backref='studentscore')
school = relationship("School", backref='studentscore')

I query the date with statements like this:

standard = session.query(StudentdScore, School).\
join(School).filter(School.name.like('%Dever%'))

testdf = pd.read_sql(sch.statement, sch.session.bind)   

And then get this warning:

SAWarning: Column 'SchoolKey' on table <sqlalchemy.sql.selectable.Select at 0x1ab7abe0; Select object> being replaced by Column('SchoolKey', Integer(), table=<Select object>, primary_key=True, nullable=False), which has the same key.  Consider use_labels for select() statements.

I get this error for every additional table (class) included in my join. The message always refers to the foreign key.

Anyone else encounter this error and determine root cause? Or have ya'll just been ignoring it as well?

EDIT/UPDATE:

Handling Duplicate Columns in Pandas DataFrame constructor from SQLAlchemy Join

These guys seem to be talking about a related issue, but they use a different pandas method to bring the dataframe in and want to keep duplicates, not drop them. Anyone have thoughts on how to implement a similar styled function, but drop the duplicates as the query comes back?

like image 647
AZhao Avatar asked Jun 22 '15 22:06

AZhao


1 Answers

For what it's worth, here's my limited answer.

For the following SAWarning:

SAWarning: Column 'SchoolKey' on table <sqlalchemy.sql.selectable.
Select at 0x1ab7abe0; Select object> being replaced by Column('SchoolKey', Integer(), table=<Select object>, primary_key=True, nullable=False), which has the same key.  
Consider use_labels for select() statements.

It's really telling you that there are columns with duplicate names, even if the columns are in separate tables. In most cases this is innocuous as the columns are simple the join keys. However, I have encountered cases where the tables contain duplicately named by distincted populated columns (ie a teacher table with name column and student table with name column). In these cases, rename the pandas dataframe with an approach like this, or rename the underlying database tables.

I'll keep an eye out on this question and if anyone has a better one I'll gladly award the answer.

like image 117
AZhao Avatar answered Nov 01 '22 03:11

AZhao