Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Renaming columns when querying with SQLAlchemy into Pandas DataFrame

Is there a way to retain the SqlAlchemy attribute names when you query the data into a pandas dataframe?

Here's a simple mapping of my database. For the school table, I've renamed 'SchoolDistrict', the DB name, to a shorter 'district'. I'm several layers removed from the DBA, so changing them in the source isn't feasible.

class School(Base):
    __tablename__ = 'DimSchool'

    id = Column('SchoolKey', Integer, primary_key=True)
    name = Column('SchoolName', String)
    district = Column('SchoolDistrict', String)


class StudentScore(Base):
    __tablename__ = 'FactStudentScore'

    SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key = True)
    PointsPossible = Column('PointsPossible', Integer)
    PointsReceived = Column('PointsReceived', Integer)

    school = relationship("School", backref='studentscore')

So when I query something like:

query = session.query(StudentScore, School).join(School)
df = pd.read_sql(query.statement, query.session.bind)

I end up with the underlying 'SchoolDistrict' name for the column, not my attribute name, in the returned DataFrame df.

EDIT: An even more annoying case is when duplicate column names exist across tables. For example:

class Teacher(Base):
    __tablename__ = 'DimTeacher'

    id = Column('TeacherKey', Integer, primary_key=True)
    fname = Column('FirstName', String)
    lname = Column('FirstName', String)

class Student(Base):
    __tablename__ = 'DimStudent'

    id = Column('StudentKey', Integer, primary_key=True)
    fname = Column('FirstName', String)
    lname = Column('FirstName', String)

So a query across both tables (like the one below) produces a dataframe with duplicate FirstName and LastName columns.

query = session.query(StudentScore, Student, Teacher).join(Student).join(Teacher)

Would it be possible to rename these columns at the moment of the query? Right now I'm having trouble keeping my head straight with these two systems of column names.

like image 695
AZhao Avatar asked Jun 30 '15 20:06

AZhao


People also ask

How do you name columns in Pandas DataFrame?

One way of renaming the columns in a Pandas Dataframe is by using the rename() function.

How do I fix column names in Pandas?

One way to rename columns in Pandas is to use df. columns from Pandas and assign new names directly. For example, if you have the names of columns in a list, you can assign the list to column names directly. This will assign the names in the list as column names for the data frame “gapminder”.

How do I get column names in SQLAlchemy?

Method 1: Using keys() Here we will use key() methods to get the get column names. It returns an iterable view which yields the string keys that would be represented by each Row.


1 Answers

I am not a SQLAlchemy expert by any means, but I have come up with a more generalized solution (or at least a start).

Caveats

  • Will not handle mapped columns with the same name across different Models. You should deal with this by adding suffix, or you could modify my answer below to create pandas columns as <tablename/model name>.<mapper column name>.

It involves four key steps:

  1. Qualify your query statement with labels, which will result in column names in pandas of <table name>_<column name>:
df = pd.read_sql(query.statement, query.session.bind).with_labels()
  1. Separate table name from (actual) column name
table_name, col = col_name.split('_', 1)
  1. Get the Model based on tablename (from this question's answers)
for c in Base._decl_class_registry.values():
            if hasattr(c, '__tablename__') and c.__tablename__ == tname:
                return c
  1. Find the correct mapped name
for k, v in sa_class.__mapper__.columns.items():
        if v.name == col:
            return k

Bringing it all together, this is the solution I have come up with, with the main caveat being it will result in duplicate column names in your dataframe if you (likely) have duplicate mapped names across classes.

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class School(Base):
    __tablename__ = 'DimSchool'

    id = Column('SchoolKey', Integer, primary_key=True)
    name = Column('SchoolName', String)
    district = Column('SchoolDistrict', String)


class StudentScore(Base):
    __tablename__ = 'FactStudentScore'

    SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key = True)
    PointsPossible = Column('PointsPossible', Integer)
    PointsReceived = Column('PointsReceived', Integer)

    school = relationship("School", backref='studentscore')


def mapped_col_name(col_name):
    ''' Retrieves mapped Model based on
    actual table name (as given in pandas.read_sql)
    '''

    def sa_class(table_name):
        for c in Base._decl_class_registry.values():
            if hasattr(c, '__tablename__') and c.__tablename__ == tname:
                return c

    table_name, col = col_name.split('_', 1)
    sa_class = sa_class(table_name)

    for k, v in sa_class.__mapper__.columns.items():
        if v.name == col:
            return k

query = session.query(StudentScore, School).join(School)
df = pd.read_sql(query.statement, query.session.bind).with_labels()
df.columns = map(mapped_col_name, df.columns)
like image 188
telemark Avatar answered Oct 06 '22 15:10

telemark