
How can I generate big data sample for Postgresql using generate_series and random?

Tags:

postgresql

I want to generate a big data sample (almost 1 million records) for studying tuplesort.c's polyphase merge in PostgreSQL, and I would like the schema to be as follows:

CREATE TABLE Departments (
    code VARCHAR(4),
    UNIQUE (code)
);

CREATE TABLE Towns (
    id SERIAL UNIQUE NOT NULL,
    code VARCHAR(10) NOT NULL, -- not unique
    article TEXT,
    name TEXT NOT NULL, -- not unique
    department VARCHAR(4) NOT NULL REFERENCES Departments (code),
    UNIQUE (code, department)
);

How can I use generate_series and random to do this? Thanks a lot!

asked Jul 19 '14 by abelard2008

1 Answer

To insert one million rows into Towns:

insert into towns (
    code, article, name, department
)
select
    left(md5(i::text), 10),
    md5(random()::text),
    md5(random()::text),
    left(md5(random()::text), 4)
from generate_series(1, 1000000) s(i);

Since id is a serial column, it is not necessary to include it; PostgreSQL fills it in automatically.
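Note that towns.department references Departments (code), so the departments table must already contain the generated codes, or the insert above will fail the foreign-key check. A minimal sketch of populating Departments first and then deriving each town's department from the same series (the choice of 100 departments and the md5-prefix codes are arbitrary, for illustration only):

```sql
-- Populate Departments first so the FK on towns.department can be satisfied.
-- DISTINCT guards against md5-prefix collisions violating UNIQUE (code).
insert into departments (code)
select distinct left(md5(i::text), 4)
from generate_series(1, 100) s(i);

-- i % 100 + 1 always lands in 1..100, so every department value
-- generated here is guaranteed to exist in Departments.
insert into towns (code, article, name, department)
select
    left(md5(i::text), 10),
    md5(random()::text),
    md5(random()::text),
    left(md5((i % 100 + 1)::text), 4)
from generate_series(1, 1000000) s(i);
```

Deriving the department deterministically from i, rather than with random(), is what makes the FK hold by construction; a random pick would have to be joined against the departments table instead.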

answered Oct 02 '22 by Clodoaldo Neto