 

Can you export a Postgres table into multiple CSV files based on a column value?

Tags: csv, postgresql
I have a table in Postgres with ~6M rows and want to split and export them as multiple CSV files. Is there a way to automatically generate a different CSV file based on the value of a row's column?

Asked Sep 08 '14 by Paul

2 Answers

In general, you want COPY (SELECT ...).

One option is to use PL/PgSQL and EXECUTE. Something like:

DO
LANGUAGE plpgsql
$$
DECLARE
    colval integer;
BEGIN
    FOR colval IN SELECT DISTINCT thecol FROM thetable
    LOOP
       EXECUTE format('COPY (SELECT * FROM thetable WHERE thecol = %L) TO ''/tmp/out-%s.csv''', colval, colval);
    END LOOP;
END;
$$;

Another is to write a script using psql and \copy.

Another would be to use your preferred client language and its support for COPY, e.g. PgJDBC's CopyManager, Python+psycopg2's copy_to, etc.
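As a rough sketch of the client-language route with psycopg2: `cursor.copy_expert` streams a `COPY ... TO STDOUT` result into any file-like object. The table and column names below are the illustrative ones from the question's context and are interpolated directly into the SQL text, so they must come from trusted input, not user data:

```python
def copy_sql(table, col, value):
    """Build the COPY statement for one partition of the table."""
    return (
        "COPY (SELECT * FROM {t} WHERE {c} = {v}) "
        "TO STDOUT WITH (FORMAT csv)"
    ).format(t=table, c=col, v=value)

def export_partition(cur, table, col, value, path):
    """Stream one partition to a local CSV file over the client connection.

    `cur` is assumed to be a psycopg2 cursor; copy_expert writes the
    server's COPY output straight into the open file.
    """
    with open(path, "w") as f:
        cur.copy_expert(copy_sql(table, col, value), f)
```

You would call `export_partition` once per distinct value, e.g. inside a loop over `SELECT DISTINCT thecol FROM thetable`.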

Update: I just realised it's simpler than this. ORDER BY the target column and split the output stream as you process it. Example with psql, bash and awk:

CREATE TABLE demo(
  id serial primary key,
  targetcol integer not null
);

-- Create 10 distinct values for targetcol with 100 entries each
insert into demo(targetcol)
select x
from generate_series(1,10) x cross join generate_series(1,100) y;

then read the stream, using column $2 as part of the filename and switching output files whenever its value changes:

psql -At -c '\copy (SELECT * FROM demo ORDER BY targetcol) TO stdout' | \
awk '
  BEGIN {
    prev_col=0;
    cur_file="";
  } 
  {
    if ($2 != prev_col) {
      prev_col = $2;
      if (cur_file != "") {
        close(cur_file);
      }
      cur_file = sprintf("outfile-%d",$2);
      printf "" > cur_file;
    } 
    print $0 >> cur_file; 
  }
';
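For comparison, the same sorted-stream split can be sketched in plain Python with `itertools.groupby`, assuming the tab-separated output of `psql -At` on stdin and taking the second field as the key:

```python
import itertools

def split_stream(lines, key_index=1, sep="\t"):
    """Group a sorted stream of delimited lines by one column's value.

    Returns {key_value: [lines...]}; because the input is sorted, each
    key appears exactly once, so every group can be written straight
    out to its own file, mirroring the awk script above.
    """
    key = lambda line: line.rstrip("\n").split(sep)[key_index]
    return {k: list(g) for k, g in itertools.groupby(lines, key=key)}

# Writing each group to outfile-<value> would look like:
# for value, rows in split_stream(sys.stdin).items():
#     with open("outfile-%s" % value, "w") as f:
#         f.writelines(rows)
```

Unlike the unsorted variant below, this never holds more than one group's worth of state per key and never needs more than one output file open at a time if you write groups as they complete.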

In fact, this doesn't even require sorted input, if you don't mind it being a bit slower and possibly hitting the limit on open files when the target column has many distinct values:

psql -At -c '\copy demo TO stdout' | \
awk '
  BEGIN {
    cur_file="";
  } 
  {
    print $0 >> sprintf("outfile-%d",$2); 
  }
';
Answered Sep 29 '22 by Craig Ringer

There are certainly a few ways to do this. I can't think of a way to do it in a single command automatically. I don't know what your operating system is, or whether you'd rather do this in a stored procedure. If I were going to do this, quick and dirty, from the command line, I'd:

# bash shell here.
for i in `psql -Upostgres -h HOSTIP -Atq DBNAME -c 'select distinct COLNAME from TABLENAME'`; do
  echo "working on: $i"
  cmd="select * from TABLENAME where COLNAME = '$i'"
  psql -Upostgres -h HOSTIP -Atq DBNAME -c "copy ( $cmd ) to stdout with delimiter ','" > /tmp/$i
done

You will need to supply:

- HOSTIP: the database host (or omit -h HOSTIP if you connect correctly by default)
- DBNAME: the database with the data in it
- TABLENAME: the name of the table with 6MM rows
- COLNAME: the name of the column whose value dictates which file each row is copied to

The result is a bunch of files in the /tmp directory with comma separated pieces of the table contents.

This should give you some ideas. I guess the answer to your question is no, there is no 'automatic' way. Good Luck!

-g

Answered Sep 29 '22 by Greg