I have a table in Postgres with ~6M rows and want to split and export them as multiple CSV files. Is there a way to automatically generate a different CSV file based on the value of a row's column?
In general, you want COPY (SELECT ...).

One option is to use PL/PgSQL and EXECUTE. Something like:
DO
LANGUAGE plpgsql
$$
DECLARE
    colval integer;
BEGIN
    FOR colval IN SELECT DISTINCT thecol FROM thetable
    LOOP
        EXECUTE format('COPY (SELECT * FROM thetable WHERE thecol = %L) TO ''/tmp/out-%s.csv'';', colval, colval);
    END LOOP;
END;
$$;
Another is to write a script using psql and \copy.

Another would be to use your preferred client language and its support for COPY, e.g. PgJDBC's CopyManager, Python+psycopg2's copy_to, etc.
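To illustrate the client-language route, here is a minimal Python sketch of the client-side splitting step. The database round-trip is elided (with psycopg2 you would stream rows via its COPY support rather than use the hard-coded sample); `split_rows_by_column` and the file naming are illustrative, not a fixed API:

```python
import csv
import os

def split_rows_by_column(rows, key_index, out_dir):
    """Write each row to a CSV file named after row[key_index].

    `rows` stands in for whatever your client's COPY support hands
    you; in a real script you would stream rows from the database.
    Keeps one open writer per distinct key value.
    """
    writers = {}  # key value -> (file handle, csv writer)
    try:
        for row in rows:
            key = row[key_index]
            if key not in writers:
                f = open(os.path.join(out_dir, f"out-{key}.csv"),
                         "w", newline="")
                writers[key] = (f, csv.writer(f))
            writers[key][1].writerow(row)
    finally:
        for f, _ in writers.values():
            f.close()
```

Called as, say, `split_rows_by_column(sample_rows, 1, "/tmp")`, this produces one `out-<value>.csv` per distinct value of the chosen column. Note it holds one file handle open per distinct value, so it has the same open-files limit as the unsorted awk variant below.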
Update: I just realised it's simpler than this. ORDER BY the target column and split the file stream as you process it. Example with psql, bash and awk:
CREATE TABLE demo(
id serial primary key,
targetcol integer not null
);
-- Create 10 distinct values for targetcol with 100 entries each
insert into demo(targetcol)
select x
from generate_series(1,10) x cross join generate_series(1,100) y;
then treat column $2 as part of the filename, switching output files whenever its value changes:
psql -At -c '\copy (SELECT * FROM demo ORDER BY targetcol) TO stdout' | \
awk '
BEGIN {
prev_col="";  # no previous value yet; safe even if the column contains 0
cur_file="";
}
{
if ($2 != prev_col) {
prev_col = $2;
if (cur_file != "") {
close(cur_file);
}
cur_file = sprintf("outfile-%d",$2);
printf "" > cur_file;
}
print $0 >> cur_file;
}
';
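For readers who prefer it to awk, the same switch-on-change logic can be sketched in Python. It assumes tab-separated, pre-sorted lines such as `psql -At` produces on stdout; the function name, `key_index` and filename prefix are illustrative:

```python
import sys

def split_sorted_stream(lines, key_index=1, prefix="outfile-"):
    """Split sorted, tab-separated lines into one file per key value.

    Assumes the input is ordered by the key column, so each output
    file is opened exactly once and closed when the value changes —
    only one file handle is ever open at a time.
    """
    prev_key = None
    out = None
    for line in lines:
        key = line.rstrip("\n").split("\t")[key_index]
        if key != prev_key:
            if out is not None:
                out.close()
            out = open(f"{prefix}{key}", "w")
            prev_key = key
        out.write(line)
    if out is not None:
        out.close()

# Typical use, mirroring the awk pipeline above:
#   psql -At -c '\copy (SELECT * FROM demo ORDER BY targetcol) TO stdout' \
#       | python split.py
# with split.py calling split_sorted_stream(sys.stdin).
```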
In fact, this doesn't even require sorted input, if you don't mind it being a bit slower and possibly hitting the limit on open files when the target column has many distinct values. Note that >> appends, so delete any stale outfile-* files before re-running:
psql -At -c '\copy demo TO stdout' | \
awk '{ print $0 >> sprintf("outfile-%d", $2); }'
There are certainly a few ways to do this. I can't think of a way to do it in a single command automatically. I don't know what your operating system is, or whether you might want to do this in a stored procedure. If I were going to do this, quick and dirty, from the command line, I'd:
$ # bash shell here.
$ for i in `psql -Upostgres -h HOSTIP -Atq DBNAME -c 'select distinct COLNAME from TABLENAME'`; do
>   echo "working on: $i"
>   cmd="select * from TABLENAME where COLNAME = '$i'"
>   psql -Upostgres -h HOSTIP -Atq DBNAME -c "copy ( $cmd ) to stdout with delimiter ','" > /tmp/$i
> done
You will need to supply:
- HOSTIP (or omit the -h HOSTIP if you connect correctly by default)
- DBNAME: the database with the data in it
- TABLENAME: the name of the table with 6M rows
- COLNAME: the name of the column which dictates the name of the file to copy data to
The result is a bunch of files in the /tmp directory with comma separated pieces of the table contents.
This should give you some ideas. I guess the answer to your question is no, there is no 'automatic' way. Good Luck!
-g