 

How to generate data to test Snowflake abilities to handle tables with thousands of columns?

Can Snowflake handle thousands of columns?

Is there a way to generate test data so I can check Snowflake's performance when handling, say, 2000 columns?

Asked Sep 20 '25 by Felipe Hoffa

1 Answer

With this script you can create a table with 2000 columns (or any other number), giving each one a random default value:

CREATE OR REPLACE PROCEDURE create_wide_table()
  RETURNS VARCHAR
  LANGUAGE javascript
  AS
  $$
  // Build a CREATE TABLE statement with an id column plus 2000 int columns,
  // each defaulting to a random number between 1 and 10000.
  var ss = 'create or replace table wide2000 (id int';
  for (const x of Array(2000).keys()) {
    ss += ', a' + x + ' int default uniform(1, 10000, random())';
  }
  ss += ');';
  // return ss;  // uncomment to preview the generated DDL instead of running it
  snowflake.execute({sqlText: ss});
  return 'Done.';
  $$;

Calling the procedure generates and executes a CREATE TABLE statement that looks like this:

create or replace table wide2000 
  (id int
   , a0 int default uniform(1, 10000, random())
   , a1 int default uniform(1, 10000, random())
   , ...
);

You can modify the script to generate columns and defaults that better resemble your own data types and sparsity.
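
For example, if your data mixes types, here is a minimal hand-written sketch of what the generated DDL could look like (the wide_mixed name and the specific defaults are illustrative assumptions, not part of the original script; RANDSTR is used on the guess that it is accepted in column defaults the same way UNIFORM is above). Sparsity is easier to handle at insert time, as shown further down.

create or replace table wide_mixed (
  id int
  , a0 int default uniform(1, 10000, random())
  , a1 string default randstr(8, random())
  , a2 timestamp_ltz default current_timestamp()
);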

What's nice about this table is that every column except id has a random default. That makes adding a million rows to it really easy, because you only need to supply the id values.

Create the table first:

call create_wide_table();
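
As a quick sanity check (not part of the original answer), you can count the columns in the information schema; with the script above you should see 2001 (id plus a0 through a1999):

select count(*) as column_count
from information_schema.columns
where table_name = 'WIDE2000';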

Generate a million rows:

insert into wide2000 (id) 
select seq8()
from table(generator(rowcount => 1000000));
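
To simulate sparsity, one option (my own variation, not from the original answer) is to list the sparse columns explicitly in the insert and override their random defaults with a mostly-NULL expression; here a0 and a1 come out roughly 90% NULL while the remaining columns still get dense random defaults:

insert into wide2000 (id, a0, a1)
select seq8()
  , iff(uniform(1, 10, random()) = 1, uniform(1, 10000, random()), null)
  , iff(uniform(1, 10, random()) = 1, uniform(1, 10000, random()), null)
from table(generator(rowcount => 1000000));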

Run your experiments:

select *
from wide2000
limit 10;
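
A slightly heavier follow-up experiment (my suggestion, not from the original answer) is to aggregate columns from both ends of the wide layout in a single query:

select count(*) as row_count
  , avg(a0) as avg_first
  , avg(a999) as avg_middle
  , max(a1999) as max_last
from wide2000;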


For more complex data generation needs, check https://github.com/RobertFehrmann/fehrminator.

Answered Sep 23 '25 by Felipe Hoffa