Apache Beam/Dataflow runner with JdbcIO writer creates too many connections

I'm using a GCP Cloud SQL MySQL instance and JdbcIO to write data from a Dataflow pipeline to MySQL.

It seems that Dataflow opens too many connections and reaches the database limit (4000), even though I set the connection pool's maximum size to 1000:

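    import com.mchange.v2.c3p0.ComboPooledDataSource;
    import java.beans.PropertyVetoException;

    ComboPooledDataSource dataSource = new ComboPooledDataSource();
    try {
        // Driver class shipped with mysql-connector-java 5.1.x
        dataSource.setDriverClass("com.mysql.jdbc.Driver");
    } catch (PropertyVetoException e) {
        throw new RuntimeException("Failed to set MySQL driver", e);
    }
    // JDBC URL for the Cloud SQL MySQL socket factory
    dataSource.setJdbcUrl("jdbc:mysql://google/live-data?cloudSqlInstance=<INSTANCE_NAME>&socketFactory=com.google.cloud.sql.mysql.SocketFactory&useSSL=false&user=<USER>&password=<PASSWORD>");

    // Pool sizing: at most 1000 connections, all 1000 opened at startup
    dataSource.setMaxPoolSize(1000);
    dataSource.setInitialPoolSize(1000);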

Also, in the dashboard I can see many more connections than queries.

My pom.xml:

    <dependency>
        <groupId>com.mchange</groupId>
        <artifactId>c3p0</artifactId>
        <version>0.9.5.4</version>
    </dependency>
    <dependency>
        <groupId>com.google.cloud.sql</groupId>
        <artifactId>mysql-socket-factory</artifactId>
        <version>1.0.13</version>
    </dependency>
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-io-jdbc</artifactId>
        <version>${beam.version}</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.47</version>
    </dependency>
Brachi asked Nov 24 '25 19:11


1 Answer

Dataflow opens too many connections and reaches the database limit (4000), even though I set the connection pool's maximum size to 1000

A quick guess: Dataflow runs the pipeline on multiple workers, and each worker likely has its own connection pool. That means each pool will hold 1000 separate connections. This is very likely far more connections than you actually need; see HikariCP's wiki on pool sizing.
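The arithmetic behind this guess is simple: with one pool per worker, the worst case is the worker count multiplied by the per-worker maximum. Below is a small sketch that assumes exactly one pool per worker. `PoolBudget` and the worker count of 4 are hypothetical illustrations, not values taken from the question.

```java
final class PoolBudget {
    // Worst case if every worker opens its own pool of maxPoolSize connections
    // (assumption: exactly one pool per worker).
    static long worstCaseConnections(int workers, int maxPoolSize) {
        return (long) workers * maxPoolSize;
    }

    // Largest per-worker maxPoolSize whose worst case stays within dbLimit.
    static int largestSafePoolSize(int workers, int dbLimit) {
        return dbLimit / workers;
    }

    public static void main(String[] args) {
        int workers = 4;      // hypothetical worker count
        int dbLimit = 4000;   // connection limit mentioned in the question
        System.out.println(worstCaseConnections(workers, 1000)); // 4000
        System.out.println(largestSafePoolSize(workers, dbLimit)); // 1000
    }
}
```

With only four such workers, four pools of 1000 already reach the 4000 limit, so any additional worker pushes past it.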

Also, in the dashboard I can see many more connections than queries:

Since you call setInitialPoolSize(1000), the pool doesn't establish connections lazily as they are needed; instead, it opens all 1000 when the pool is initialized. The sample also doesn't set any limit on connection lifespan (for example, c3p0's maxIdleTime or maxConnectionAge), so these connections will likely stay open as long as possible.
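To illustrate the difference between eager and lazy initialization, and what a lifespan limit adds, here is a toy, single-threaded pool. It is a hypothetical sketch for illustration only, not how c3p0 is implemented: `ToyPool`, `Conn`, and the explicit `now` timestamps are made up, and `Conn` merely stands in for a JDBC connection. Its `initialSize` and `maxAgeMillis` parameters loosely correspond to c3p0's initialPoolSize and maxConnectionAge settings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

final class ToyPool {
    // Hypothetical stand-in for a JDBC connection; only tracks when it was opened.
    static final class Conn {
        final long createdAt;
        Conn(long createdAt) { this.createdAt = createdAt; }
    }

    private final int maxSize;
    private final long maxAgeMillis;              // 0 disables the age limit
    private final Deque<Conn> idle = new ArrayDeque<>();
    private int open = 0;                         // idle + in-use connections

    ToyPool(int initialSize, int maxSize, long maxAgeMillis, long now) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
        // Eager initialization: open initialSize connections before any query runs.
        for (int i = 0; i < initialSize; i++) {
            idle.push(new Conn(now));
            open++;
        }
    }

    // Reuse an idle connection if possible; otherwise open one lazily, up to maxSize.
    Conn acquire(long now) {
        evictExpired(now);
        Conn c = idle.poll();
        if (c != null) {
            return c;
        }
        if (open >= maxSize) {
            throw new IllegalStateException("pool exhausted");
        }
        open++;
        return new Conn(now);
    }

    // Return a connection to the pool, or close it if it has outlived maxAgeMillis.
    void release(Conn c, long now) {
        if (isExpired(c, now)) {
            open--;
        } else {
            idle.push(c);
        }
    }

    // Close idle connections older than maxAgeMillis.
    void evictExpired(long now) {
        for (Iterator<Conn> it = idle.iterator(); it.hasNext(); ) {
            if (isExpired(it.next(), now)) {
                it.remove();
                open--;
            }
        }
    }

    int openConnections() { return open; }

    private boolean isExpired(Conn c, long now) {
        return maxAgeMillis > 0 && now - c.createdAt >= maxAgeMillis;
    }

    public static void main(String[] args) {
        // Eager, no age limit: 1000 connections open before any work is done.
        ToyPool eager = new ToyPool(1000, 1000, 0, 0);
        System.out.println(eager.openConnections()); // 1000

        // Lazy, 60 s age limit: connections appear only when borrowed.
        ToyPool lazy = new ToyPool(0, 1000, 60_000, 0);
        Conn a = lazy.acquire(0);
        Conn b = lazy.acquire(0);
        lazy.release(a, 10);
        lazy.release(b, 10);
        System.out.println(lazy.openConnections()); // 2
        lazy.evictExpired(60_000);
        System.out.println(lazy.openConnections()); // 0
    }
}
```

The eager pool reports 1000 open connections without a single query having run, which is consistent with seeing far more connections than queries. The lazy pool only opens what is actually borrowed and drops connections once they exceed the age limit.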

kurtisvg answered Nov 27 '25 08:11