Unix

Question

I have a file with ~1000 lines that looks like this:

ABC C5A 1
CFD D5G 4
E1E FDF 3
CFF VBV 1
FGH F4R 2
K8K F9F 3
... etc

I would like to select 100 random lines, but with 10 for each value in the third column (that is, 10 random lines from all lines with the value "1" in column 3, 10 random lines from all lines with the value "2" in column 3, and so on).

Is this possible using bash?

dogbane · Accepted Answer

First grep all the lines that end with a given number, then shuffle them and pick 10 at random using shuf -n 10:

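# Assumes column 3 holds the values 1 through 10
for i in {1..10}; do
    # Lines ending in " $i" (the third column), 10 of them picked at random
    grep " ${i}$" file | shuf -n 10
done > randomFile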

If you don't have shuf, use sort -R to randomly sort them instead:

for i in {1..10}; do
    # Shuffle the matching lines, then keep the first 10
    grep " ${i}$" file | sort -R | head -n 10
done > randomFile
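The loops above hardcode the values 1 through 10. A minimal sketch that instead loops over whatever distinct values actually occur in column 3, and compares the third field exactly instead of matching the end of the line, could look like this. It assumes GNU shuf is available and that fields are whitespace-separated; the function name `sample_per_value` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print up to n random lines (default 10) for every
# distinct value found in column 3 of the given file. Requires shuf.
sample_per_value() {
    local file=$1 n=${2:-10}
    # List each distinct column-3 value once, then handle them one by one
    awk '{ print $3 }' "$file" | sort -u | while IFS= read -r value; do
        # Keep only lines whose third field equals this value, pick n at random
        # (shuf -n prints all lines if there are fewer than n)
        awk -v v="$value" '$3 == v' "$file" | shuf -n "$n"
    done
}
```

For example, `sample_per_value file 10 > randomFile` produces the same kind of output as the loops above, without having to know the values in advance.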

user000001 · Answer

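If you can use awk, you can do the same with a one-liner. It shuffles the whole file once and then prints a line only while fewer than 10 lines with the same column-3 value have been printed, so you don't need to know the values in advance: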

sort -R file | awk '{if (count[$3] < 10) {count[$3]++; print $0}}'
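A variant of the same one-liner, sketched here as a hypothetical function `pick_per_value`, swaps in shuf (GNU coreutils) for sort -R. GNU sort -R sorts by a random hash of each key, so identical lines always end up next to each other, whereas shuf permutes lines individually. It also uses the shorter awk idiom `count[$3]++ < n`, which is true for the first n lines seen for each column-3 value:

```shell
#!/usr/bin/env bash
# Hypothetical helper: shuffle the file once, then keep at most n lines
# (default 10) per distinct column-3 value. Requires shuf.
pick_per_value() {
    local file=$1 n=${2:-10}
    # count[$3]++ returns the old count, so the first n lines per value pass
    shuf "$file" | awk -v n="$n" 'count[$3]++ < n'
}
```

For example, run `pick_per_value file > randomFile`, then `awk '{ print $3 }' randomFile | sort | uniq -c` to check that no value appears more than 10 times.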

Unix - randomly select lines based on column values

Tags: bash, random

Abdel
