Sampling without replacement using awk

Question

I have a lot of text files that look like this:

>ALGKAHOLAGGATACCATAGATGGCACGCCCT
>BLGKAHOLAGGATACCATAGATGGCACGCCCT
>HLGKAHOLAGGATACCATAGATGGCACGCCCT
>DLGKAHOLAGGATACCATAGATGGCACGCCCT
>ELGKAHOLAGGATACCATAGATGGCACGCCCT
>FLGKAHOLAGGATACCATAGATGGCACGCCCT
>JGGKAHOLAGGATACCATAGATGGCACGCCCT
>POGKAHOLAGGATACCATAGATGGCACGCCCT

Is there a way to sample without replacement using awk?

For example, I have these 8 lines, and I want to randomly pick only 4 of them, without replacement, and write them to a new file. The output should look something like this:

>FLGKAHOLAGGATACCATAGATGGCACGCCCT
>POGKAHOLAGGATACCATAGATGGCACGCCCT
>ALGKAHOLAGGATACCATAGATGGCACGCCCT
>BLGKAHOLAGGATACCATAGATGGCACGCCCT

Thanks in advance

Mark Setchell · Accepted Answer

How about this for a random sampling of 10% of your lines?

awk 'rand()>0.9' yourfile1 yourfile2 anotherfile

I am not sure what you mean by "replacement"... there is no replacement occurring here, just random selection.

It reads each line of each file exactly once and draws a random number in the interval [0, 1). If the number is greater than 0.9, the line is printed. In effect it rolls a 10-sided die for each line and prints the line only if the die comes up 10. No line can be printed twice, unless it occurs twice in your files, of course. The output is therefore about 10% of the input on average, not an exact number of lines.

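For added randomness (!) you can add an srand() at the start, as suggested by @klashxx. Without it, some awk implementations (gawk, for example) produce the same sequence of random numbers on every run.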

awk 'BEGIN{srand()} rand()>0.9' yourfile1 yourfile2 anotherfile
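
The rand() filter prints a varying number of lines from run to run. If exactly 4 of the 8 lines are wanted, one possible approach is selection sampling: keep each line with probability (lines still needed) / (lines still left to read). This is only a sketch and not part of the original answer; the file names seqs.txt and sample.txt and the variables k and n are made up for the example, and it assumes the line count can be obtained up front with wc -l.

```shell
# Recreate the 8-line example from the question (seqs.txt is a made-up name).
cat > seqs.txt <<'EOF'
>ALGKAHOLAGGATACCATAGATGGCACGCCCT
>BLGKAHOLAGGATACCATAGATGGCACGCCCT
>HLGKAHOLAGGATACCATAGATGGCACGCCCT
>DLGKAHOLAGGATACCATAGATGGCACGCCCT
>ELGKAHOLAGGATACCATAGATGGCACGCCCT
>FLGKAHOLAGGATACCATAGATGGCACGCCCT
>JGGKAHOLAGGATACCATAGATGGCACGCCCT
>POGKAHOLAGGATACCATAGATGGCACGCCCT
EOF

k=4                       # number of lines to sample (assumed value)
n=$(wc -l < seqs.txt)     # total number of lines in the input

# Selection sampling: at line NR there are (n - NR + 1) lines left and k
# lines still needed, so keep the current line with probability k / (n - NR + 1).
awk -v k="$k" -v n="$n" '
  BEGIN { srand() }                         # seed from the time of day
  rand() * (n - NR + 1) < k { print; k-- }  # keep the line, one fewer needed
' seqs.txt > sample.txt
```

The sampled lines come out in their original file order. Because srand() seeds from the clock in whole seconds, two runs within the same second give the same sample.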

kojiro · Answer

Yes, but I wouldn't. I would use shuf or sort -R (neither is POSIX) to shuffle the file and then take the first n lines with head.
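
As a rough sketch of that idea, assuming GNU coreutils is installed: shuf has an -n option that limits the output to n lines, so head is not even needed there. The file names seqs.txt and sample.txt are placeholders made up for the example.

```shell
# Recreate the 8-line example from the question (seqs.txt is a made-up name).
cat > seqs.txt <<'EOF'
>ALGKAHOLAGGATACCATAGATGGCACGCCCT
>BLGKAHOLAGGATACCATAGATGGCACGCCCT
>HLGKAHOLAGGATACCATAGATGGCACGCCCT
>DLGKAHOLAGGATACCATAGATGGCACGCCCT
>ELGKAHOLAGGATACCATAGATGGCACGCCCT
>FLGKAHOLAGGATACCATAGATGGCACGCCCT
>JGGKAHOLAGGATACCATAGATGGCACGCCCT
>POGKAHOLAGGATACCATAGATGGCACGCCCT
EOF

# Pick 4 distinct lines at random. Neither tool is POSIX, so check for shuf
# first and fall back to sort -R piped into head.
if command -v shuf >/dev/null 2>&1; then
  shuf -n 4 seqs.txt > sample.txt
else
  sort -R seqs.txt | head -n 4 > sample.txt
fi
```

One caveat with the fallback: GNU sort -R orders lines by a random hash of the key, so identical lines end up next to each other. That does not matter for input like the example, where every line is different.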

If you really want to use awk for this, you would need to use the rand function, as Mark Setchell points out.
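
For what it's worth, here is a minimal sketch of how an awk-only version could look, using reservoir sampling so the file is read once and the line count does not have to be known in advance. This is one possible technique, not something taken from either answer; the names seqs.txt, sample.txt, k and pool are made up for the example.

```shell
# Recreate the 8-line example from the question (seqs.txt is a made-up name).
cat > seqs.txt <<'EOF'
>ALGKAHOLAGGATACCATAGATGGCACGCCCT
>BLGKAHOLAGGATACCATAGATGGCACGCCCT
>HLGKAHOLAGGATACCATAGATGGCACGCCCT
>DLGKAHOLAGGATACCATAGATGGCACGCCCT
>ELGKAHOLAGGATACCATAGATGGCACGCCCT
>FLGKAHOLAGGATACCATAGATGGCACGCCCT
>JGGKAHOLAGGATACCATAGATGGCACGCCCT
>POGKAHOLAGGATACCATAGATGGCACGCCCT
EOF

# Reservoir sampling: keep a pool of k lines; line number NR (for NR > k)
# replaces a random pool slot with probability k / NR.
awk -v k=4 '
  BEGIN { srand() }                        # seed from the time of day
  NR <= k { pool[NR] = $0; next }          # the first k lines fill the pool
  {
    j = int(rand() * NR) + 1               # uniform integer in 1..NR
    if (j <= k) pool[j] = $0               # overwrite slot j if it is in the pool
  }
  END {
    m = (NR < k ? NR : k)                  # input may have fewer than k lines
    for (i = 1; i <= m; i++) print pool[i]
  }
' seqs.txt > sample.txt
```

Each line lands in the pool at most once, so the result is a sample without replacement. It holds only k lines in memory, which may matter for large files.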

Tags: bash, shell, awk

JM88
