I want to generate a large data sample (almost 1 million records) for studying tuplesort.c's polyphase merge in PostgreSQL, with the following schema:
CREATE TABLE Departments (
    code VARCHAR(4),
    UNIQUE (code)
);

CREATE TABLE Towns (
    id SERIAL UNIQUE NOT NULL,
    code VARCHAR(10) NOT NULL, -- not unique
    article TEXT,
    name TEXT NOT NULL, -- not unique
    department VARCHAR(4) NOT NULL REFERENCES Departments (code),
    UNIQUE (code, department)
);
How can I use generate_series and random() to do this? Thanks a lot!
To insert one million rows into Towns:

insert into towns (
    code, article, name, department
)
select
    left(md5(i::text), 10),
    md5(random()::text),
    md5(random()::text),
    left(md5(random()::text), 4)
from generate_series(1, 1000000) s(i);
Since id is a serial column, it is not necessary to include it in the column list.
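Note that towns.department references Departments (code), so Departments has to be populated before the insert above will pass the foreign-key check. Since md5() output is lowercase hexadecimal, every 4-character department code the towns insert can produce is one of the 16^4 = 65536 possible hex strings; one way to satisfy the constraint is to pre-load all of them. A minimal sketch, to be run before the towns insert:

insert into departments (code)
select lpad(to_hex(i), 4, '0')  -- '0000' .. 'ffff': every possible md5 prefix
from generate_series(0, 65535) s(i);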
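Once the table is loaded, you can exercise tuplesort.c's external-merge code by sorting with a deliberately small work_mem; the trace_sort developer option logs tuplesort activity (run building, tape merging) to the server log. A sketch of one such session, assuming a standard build where trace_sort is available:

-- Force an external (on-disk) sort so tuplesort.c's merge code runs.
SET work_mem = '1MB';  -- small sort memory forces spilling to tapes
SET trace_sort = on;   -- logs tuplesort progress to the server log

EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM towns ORDER BY name;  -- plan shows "Sort Method: external merge"

One caveat: PostgreSQL 15 replaced the polyphase merge algorithm in tuplesort.c with a balanced k-way merge, so to observe the polyphase behavior specifically you will need a server built from an older release branch.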