I'm querying my SQLAlchemy-mapped star schema directly into a pandas DataFrame
and am getting an annoying SAWarning from pandas
that I'd like to address. Here's a simplified version.
class School(Base):
__tablename__ = 'DimSchool'
id = Column('SchoolKey', Integer, primary_key=True)
name = Column('SchoolName', String)
district = Column('SchoolDistrict', String)
class StudentScore(Base):
__tablename__ = 'FactStudentScore'
StudentKey = Column('StudentKey', Integer, ForeignKey('DimStudent.StudentKey'), primary_key = True)
SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key = True)
PointsPossible = Column('PointsPossible', Integer)
PointsReceived = Column('PointsReceived', Integer)
student = relationship("Student", backref='studentscore')
school = relationship("School", backref='studentscore')
I query the date with statements like this:
standard = session.query(StudentdScore, School).\
join(School).filter(School.name.like('%Dever%'))
testdf = pd.read_sql(sch.statement, sch.session.bind)
And then get this warning:
SAWarning: Column 'SchoolKey' on table <sqlalchemy.sql.selectable.Select at 0x1ab7abe0; Select object> being replaced by Column('SchoolKey', Integer(), table=<Select object>, primary_key=True, nullable=False), which has the same key. Consider use_labels for select() statements.
I get this error for every additional table (class) included in my join. The message always refers to the foreign key.
Anyone else encounter this error and determine root cause? Or have ya'll just been ignoring it as well?
EDIT/UPDATE:
Handling Duplicate Columns in Pandas DataFrame constructor from SQLAlchemy Join
These guys seem to be talking about a related issue, but they use a different pandas method to bring the dataframe in and want to keep duplicates, not drop them. Anyone have thoughts on how to implement a similar styled function, but drop the duplicates as the query comes back?
For what it's worth, here's my limited answer.
For the following SAWarning:
SAWarning: Column 'SchoolKey' on table <sqlalchemy.sql.selectable.
Select at 0x1ab7abe0; Select object> being replaced by Column('SchoolKey', Integer(), table=<Select object>, primary_key=True, nullable=False), which has the same key.
Consider use_labels for select() statements.
It's really telling you that there are columns with duplicate names, even if the columns are in separate tables. In most cases this is innocuous as the columns are simple the join keys. However, I have encountered cases where the tables contain duplicately named by distincted populated columns (ie a teacher table with name column and student table with name column). In these cases, rename the pandas dataframe with an approach like this, or rename the underlying database tables.
I'll keep an eye out on this question and if anyone has a better one I'll gladly award the answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With