Taking N-samples from each group in PostgreSQL

Question

I have a table containing data that has a column named id that looks like below:

id	value 1	value 2	value 3
1	244	550	1000
1	251	551	700
1	540	60	1200
...	...	...	...
2	19	744	2000
2	10	903	100
2	44	231	600
2	120	910	1100
...	...	...	...

I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points.

For example I would like a maximum 50 data points randomly selected from id = 1, id = 2 etc...

I cannot find any previous questions similar to this but have tried taking a stab at at least logically working through the solution where I could iterate and union all queries by id and limit to 50:

SELECT * FROM (SELECT * FROM schema.table AS tbl WHERE tbl.id = X LIMIT 50) UNION ALL;

But it's obvious that you cannot use this type of solution because UNION ALL requires aggregating outputs from one id to the next and I do not have a list of id values to use in place of X in tbl.id = X.

Is there a way to accomplish this by gathering that list of unique id values and union all results or is there a more optimal way this could be done?

Pierre D · Accepted Answer

If you want to select a random sample for each id, then you need to randomize the rows somehow. Here is a way to do it:

select * from (
  select *, row_number() over (partition by id order by random()) as u
  from schema.table
) as a
where u <= 50;

Example (limiting to 3, and some row number for each id so you can see the selection randomness):

setup

DROP TABLE IF EXISTS foo;

CREATE TABLE foo
(
  id int,
  value1 int,
  idrow int
  );

INSERT INTO foo
select 1 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 2 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 3 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow;

Selection

select * from (
  select *, row_number() over (partition by id order by random()) as u
  from foo
) as a
where u <= 3;

Output:

id	value1	idrow	u
1	542	6	1
1	24	86	2
1	155	74	3
2	505	95	1
2	100	46	2
2	422	33	3
3	966	88	1
3	747	89	2
3	664	19	3

Taking N-samples from each group in PostgreSQL

Tags:

postgresql

grouping

sampling

HelloToEarth

1 Answers

Pierre D

Recent Activity

Donate For Us

id	value1	idrow	u
1	542	6	1
1	24	86	2
1	155	74	3
2	505	95	1
2	100	46	2
2	422	33	3
3	966	88	1
3	747	89	2
3	664	19	3

id	value1	idrow	u
1	542	6	1
1	24	86	2
1	155	74	3
2	505	95	1
2	100	46	2
2	422	33	3
3	966	88	1
3	747	89	2
3	664	19	3

Taking N-samples from each group in PostgreSQL

Tags:

postgresql

grouping

sampling

HelloToEarth

1 Answers

Pierre D

Related questions

Recent Activity

Donate For Us

id	value1	idrow	u
1	542	6	1
1	24	86	2
1	155	74	3
2	505	95	1
2	100	46	2
2	422	33	3
3	966	88	1
3	747	89	2
3	664	19	3