Unix - randomly select lines based on column values

Tags: bash, unix, random

I have a file with ~1000 lines that looks like this:

ABC C5A 1
CFD D5G 4
E1E FDF 3
CFF VBV 1
FGH F4R 2
K8K F9F 3
... etc

I would like to select 100 random lines, but with 10 of each third column value (so random 10 lines from all lines with value "1" in column 3, random 10 lines from all lines with value "2" in column 3, etc).

Is this possible using bash?

Abdel asked Dec 26 '22 08:12


2 Answers

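First grep all the lines whose third column has a given value, then pick 10 of them at random using shuf -n 10. This assumes the third column is the last field on the line, is separated by a single space, and takes the values 1 to 10: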

for i in {1..10}; do
    grep " ${i}$" file | shuf -n 10
done > randomFile

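If you don't have shuf, use sort -R to randomly sort the lines and keep the first 10 instead. Note that GNU sort -R orders lines by a random hash, so identical lines end up next to each other:

for i in {1..10}; do
    grep " ${i}$" file | sort -R | head -n 10
done > randomFile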
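
If the values in column 3 are not exactly 1 to 10, the same idea can be applied to whatever values the file actually contains. The following is only a sketch under that assumption: sample_per_group is a hypothetical helper name, and it matches the third field exactly with awk instead of using a grep pattern.

```shell
#!/usr/bin/env bash
# Sketch: print up to N random lines for each distinct value in column 3.
# sample_per_group is a hypothetical helper, not part of the answer above.
sample_per_group() {
    local file=$1 n=${2:-10} value
    # List the distinct third-column values present in the file.
    awk 'NF >= 3 { print $3 }' "$file" | sort -u | while read -r value; do
        # Select the lines whose third field equals this value, then sample.
        awk -v v="$value" '$3 == v' "$file" | shuf -n "$n"
    done
}
```

It could be called as: sample_per_group file 10 > randomFile. Groups with fewer than 10 lines simply contribute all of their lines.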
dogbane answered Jan 07 '23 14:01


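If you can use awk, you can do the same with a one-liner. Shuffle the whole file first, then print each line only while fewer than 10 lines with its third-column value have been printed: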

sort -R file | awk '{if (count[$3] < 10) {count[$3]++; print $0}}'
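
To check that the sample really contains 10 lines per value, the third column of the output can be tallied. Below is a rough sketch, assuming shuf is available: sample_one_pass and count_per_group are hypothetical names, and shuf is swapped in for sort -R so that duplicate lines are shuffled independently.

```shell
#!/usr/bin/env bash
# Sketch: single-pass sampling plus a per-value count of the result.
# Both function names are hypothetical, chosen for illustration only.
sample_one_pass() {
    local file=$1 n=${2:-10}
    # Shuffle all lines, then keep a line only while its group is below n.
    shuf "$file" | awk -v n="$n" 'count[$3] < n { count[$3]++; print }'
}

count_per_group() {
    # Print "value count" for each third-column value, sorted numerically.
    awk '{ count[$3]++ } END { for (v in count) print v, count[v] }' "$@" | sort -n
}
```

For example, sample_one_pass file 10 | count_per_group prints one line per column value along with how many lines were selected for it.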
user000001 answered Jan 07 '23 14:01
