PostgreSQL partitions and SQLAlchemy

The SQLAlchemy documentation explains how to create a partitioned table, but it does not explain how to create the partitions themselves.

So if I have this:

# Skipping create_engine and metadata
from sqlalchemy import Column, Date, Integer
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Measure(Base):
    __tablename__ = 'measures'
    __table_args__ = {
        'postgresql_partition_by': 'RANGE (log_date)'
    }
    # nullable=False is how SQLAlchemy spells NOT NULL
    city_id = Column(Integer, nullable=False)
    log_date = Column(Date, nullable=False)
    peaktemp = Column(Integer)
    unitsales = Column(Integer)

class Measure2020(Base):
    """How am I supposed to declare this?"""

I know that most of the time I'll be doing SELECT * FROM measures WHERE log_date BETWEEN XX AND YY, but partitioning still seems interesting.

asked May 01 '20 15:05

Rémi Desgrange

2 Answers

Maybe a bit late, but I would like to share what I built on top of @moshevi's and @Seb's answers:

In my IoT use case, I needed actual sub-partitioning (first level by year, second level by nodeid). I also wanted to generalize the approach slightly.

This is what I came up with:

from sqlalchemy.ext.declarative import DeclarativeMeta
from sqlalchemy.sql.ddl import DDL
from sqlalchemy import event

class PartitionByMeta(DeclarativeMeta):
    def __new__(cls, clsname, bases, attrs, *, partition_by, partition_type):

        @classmethod
        def get_partition_name(cls_, suffix):
            return f'{cls_.__tablename__}_{suffix}'

        @classmethod
        def create_partition(cls_, suffix, partition_stmt, subpartition_by=None, subpartition_type=None):
            if suffix not in cls_.partitions:

                partition = PartitionByMeta(
                    f'{clsname}{suffix}',
                    bases,
                    {'__tablename__': cls_.get_partition_name(suffix)},
                    partition_type=subpartition_type,
                    partition_by=subpartition_by,
                )

                partition.__table__.add_is_dependent_on(cls_.__table__)

                event.listen(
                    partition.__table__,
                    'after_create',
                    DDL(
                        # partition_stmt is passed in by the caller, for example:
                        # LIST:  FOR VALUES IN ('first', 'second')
                        # RANGE: FOR VALUES FROM ('2020-01-01') TO ('2021-01-01')
                        f"""
                        ALTER TABLE {cls_.__tablename__}
                        ATTACH PARTITION {partition.__tablename__}
                        {partition_stmt};
                        """
                    )
                )
                
                cls_.partitions[suffix] = partition
            
            return cls_.partitions[suffix]
        
        if partition_by is not None:
            attrs.update(
                {
                    '__table_args__': attrs.get('__table_args__', ())
                    + (dict(postgresql_partition_by=f'{partition_type.upper()}({partition_by})'),),
                    'partitions': {},
                    'partitioned_by': partition_by,
                    'get_partition_name': get_partition_name,
                    'create_partition': create_partition
                }
            )
        
        return super().__new__(cls, clsname, bases, attrs)

It is used as follows, assuming a VehicleDataMixin class has been created as described by @moshevi:

class VehicleData(VehicleDataMixin, Project, metaclass=PartitionByMeta,
                  partition_by='timestamp', partition_type='RANGE'):
    __tablename__ = 'vehicle_data'
    __table_args__ = (
        Index('ts_ch_nod_idx', "timestamp", "nodeid", "channelid", postgresql_using='brin'),
        UniqueConstraint('timestamp', 'nodeid', 'channelid', name='ts_ch_nod_constr')
    )

It can then be sub-partitioned iteratively like so (adapt as needed):

for y in range(2017, 2021):
    # Yearly partition, itself partitioned by nodeid
    tbl_vehid_y = VehicleData.create_partition(
        f"{y}", partition_stmt=f"""FOR VALUES FROM ('{y}-01-01') TO ('{y+1}-01-01')""",
        subpartition_by='nodeid', subpartition_type='LIST'
    )

    for i in {3, 4, 7, 9}:
        # One sub-partition per known nodeid within this year
        tbl_vehid_y.create_partition(
            f"nid{i}", partition_stmt=f"""FOR VALUES IN ('{i}')"""
        )

    # Default sub-partition for any other nodeid in this year
    tbl_vehid_y.create_partition("def", partition_stmt="DEFAULT")

# Default partition for any year outside the anticipated range
VehicleData.create_partition("def", partition_stmt="DEFAULT")

partition_by='timestamp' <= the column to partition by

partition_type='RANGE' <= the (PostgreSQL-specific) partition type

partition_stmt=f"""FOR VALUES IN ('{i}')""" <= the (PostgreSQL-specific) partition bound clause
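To make these three parameters more concrete, here is a minimal standalone sketch (plain Python, no SQLAlchemy required) of the table names and ATTACH PARTITION statements that the metaclass above should end up emitting for one year and one nodeid. The helpers partition_name and attach_ddl are hypothetical names made up for this illustration; they only mirror get_partition_name and the DDL f-string.

```python
def partition_name(parent: str, suffix: str) -> str:
    # Same naming scheme as get_partition_name: '<parent>_<suffix>'
    return f"{parent}_{suffix}"


def attach_ddl(parent: str, suffix: str, partition_stmt: str) -> str:
    # Mirrors the statement built in the 'after_create' listener
    child = partition_name(parent, suffix)
    return f"ALTER TABLE {parent} ATTACH PARTITION {child} {partition_stmt};"


year = 2017
# First level: RANGE partition on timestamp, one table per year
year_table = partition_name("vehicle_data", str(year))
level1 = attach_ddl(
    "vehicle_data", str(year),
    f"FOR VALUES FROM ('{year}-01-01') TO ('{year + 1}-01-01')",
)
# Second level: LIST partition on nodeid, attached to the year table
level2 = attach_ddl(year_table, "nid3", "FOR VALUES IN ('3')")
# Catch-all for nodeids that have no partition of their own
default = attach_ddl(year_table, "def", "DEFAULT")

print(level1)
print(level2)
print(default)
```

Printing the statements like this is a cheap way to check names and bounds before running anything against a real database.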

answered Sep 19 '22 16:09

jenszo

I had a similar problem. I found @moshevi's answer quite useful, and ended up generalising it a bit (as I had many tables to partition).

First, create a metaclass such as this:

from sqlalchemy.ext.declarative import DeclarativeMeta
from sqlalchemy.sql.ddl import DDL
from sqlalchemy import event


class PartitionByYearMeta(DeclarativeMeta):
    def __new__(cls, clsname, bases, attrs, *, partition_by):
        @classmethod
        def get_partition_name(cls_, key):
            # 'measures' -> 'measures_2020' (customise as needed)
            return f'{cls_.__tablename__}_{key}'
        
        @classmethod
        def create_partition(cls_, key):
            if key not in cls_.partitions:
                
                Partition = type(
                    f'{clsname}{key}', # Class name, only used internally
                    bases,
                    {'__tablename__': cls_.get_partition_name(key)}
                )
                
                Partition.__table__.add_is_dependent_on(cls_.__table__)
                
                event.listen(
                    Partition.__table__,
                    'after_create',
                    DDL(
                        # For non-year ranges, modify the FROM and TO below
                        f"""
                        ALTER TABLE {cls_.__tablename__}
                        ATTACH PARTITION {Partition.__tablename__}
                        FOR VALUES FROM ('{key}-01-01') TO ('{key+1}-01-01');
                        """
                    )
                )
                
                cls_.partitions[key] = Partition
            
            return cls_.partitions[key]
        
        attrs.update(
            {
                # For non-RANGE partitions, modify the `postgresql_partition_by` key below
                '__table_args__': attrs.get('__table_args__', ())
                + (dict(postgresql_partition_by=f'RANGE({partition_by})'),),
                'partitions': {},
                'partitioned_by': partition_by,
                'get_partition_name': get_partition_name,
                'create_partition': create_partition
            }
        )
        
        return super().__new__(cls, clsname, bases, attrs)

Next, for any table in your model that you want to partition:

from sqlalchemy import Column, Date, ForeignKey, Integer
from sqlalchemy.ext.declarative import declared_attr

class MeasureMixin:
    # The columns need to be pulled out into this mixin
    # Note: any foreign key columns will need to be wrapped like this:

    @declared_attr
    def city_id(cls):
        return Column(ForeignKey('cities.id'), nullable=False)

    log_date = Column(Date, nullable=False)
    peaktemp = Column(Integer)
    unitsales = Column(Integer)

# partition_by must match the column name defined in the mixin
class Measure(MeasureMixin, Base, metaclass=PartitionByYearMeta, partition_by='log_date'):
    __tablename__ = 'measures'

This makes it easy to add more tables and partition by any number of values.
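The metaclass above hard-codes yearly bounds. If monthly partitions were needed instead, the FROM/TO clause could be computed along these lines. Note that month_bounds is a hypothetical helper, not part of SQLAlchemy or of the code above; it only builds the string that would replace the FOR VALUES line in the DDL.

```python
from datetime import date


def month_bounds(year: int, month: int) -> str:
    """Build a FOR VALUES clause covering one calendar month.

    PostgreSQL range bounds are inclusive at FROM and exclusive at TO,
    so the upper bound is the first day of the following month.
    """
    start = date(year, month, 1)
    # Roll over to January of the next year after December
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}')"


print(month_bounds(2020, 2))
print(month_bounds(2020, 12))
```

The partition key would then have to carry both year and month (for example a tuple), and get_partition_name would need adjusting accordingly.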

Creating a new partition on the fly works like this:

# Make sure you commit any session that is currently open, even for select queries:
session.commit()

Partition = Measure.create_partition(2020)
# checkfirst=True makes this a no-op if the partition table already exists
Partition.__table__.create(bind=engine, checkfirst=True)

Now the partition for key 2020 is created and values for that year can be inserted.
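When loading data that spans several years, each year needs its partition before the rows are inserted, since this setup has no default partition to fall back on. A small sketch of how the required keys could be collected up front; years_needed is a hypothetical helper, and the commented-out loop assumes the Measure class and engine from above.

```python
from datetime import date
from typing import Iterable, List


def years_needed(dates: Iterable[date]) -> List[int]:
    # Distinct years, sorted: one entry per yearly partition that must exist
    return sorted({d.year for d in dates})


rows = [date(2019, 12, 31), date(2020, 1, 1), date(2020, 6, 15)]
keys = years_needed(rows)
print(keys)

# With the model from above, each key would then be turned into a partition:
# for year in keys:
#     Measure.create_partition(year).__table__.create(bind=engine, checkfirst=True)
```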

answered Sep 17 '22 16:09

Seb

Tags: python, postgresql, sqlalchemy, postgresql-12
