Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Golang: select few million rows from PostgreSQL database using readable streams like in node.js

I have PostgreSQL table with ~ 50 million rows, I want to write Go code to select ~ 1 million rows from this table, and process them in efficient way.

Previous time i used nodejs and this NPM module pg-query-stream to generate readable stream of records found, so i can process them like any readable object stream.

Here I post simplified code I used to process data:


const pg = require('pg');
const QueryStream = require('pg-query-stream');

 
//pipe 1,000,000 rows to stdout without blowing up your memory usage
pg.connect((err, client, done) => {
  if (err) throw err;
  const query = new QueryStream('SELECT * FROM generate_series(0, $1) num', [1000000]);
  const stream = client.query(query);
  //release the client when the stream is finished
  stream.on('end', done);
  stream.on('data', function(data) { 
    stream.pause();
    funcDoSomethingWithDataAsync(data, function(error) {
      if(error) throw error;
      stream.resume();
    });
 };
})

How can I emulate readable stream of database records in Go? Does sql.Scanner in Go works with streaming query results like nodejs module does?

I already have optimized queries that works ok, I just want to stream query execution result to Go, like its done in nodejs library.

like image 801
vodolaz095 Avatar asked Jun 05 '26 13:06

vodolaz095


2 Answers

Yes, it works very much the same: execute the query, iterate through the results. Here's a simple example using lib/pq which is the Postgres version of database/sql.

Make the Query and then iterate through the Rows.

rows, err := db.Query(`SELECT * FROM generate_series(0, $1) num`, 1000000)
if err != nil {
   panic(err)
}
 
defer rows.Close()
for rows.Next() {
    var num int
 
    err = rows.Scan(&num)
    if err != nil {
       panic(err)
    }
 
    fmt.Println(num)
}
like image 171
Schwern Avatar answered Jun 07 '26 11:06

Schwern


I observed memory usage of Go program which uses lib/pq:

  • When the result set returned is 0 or 1 row, memory used is around 15MB
  • When result set has 50K rows (each of 200bytes, amount to 10MB). I put sleep while looping through results but memory increase was only 2 MB. Even with multiple such heavy requests memory rose by few MBs

So Schwern is correct that pq maintains buffer for reading results.

like image 40
djoker Avatar answered Jun 07 '26 13:06

djoker



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!