
Random numbers generation with awk in BASH shell

Tags:

awk

I want to shuffle the lines (rows) of a file at random and write the result to five different files.

But the lines appear in exactly the same order in file1 through file5, so the random number generation is not working properly. I would be grateful for any advice.

#!/bin/bash
for i in $(seq 1 5)
do
  awk 'BEGIN{srand();}  {print rand()"\t"$0}' shuffling.txt  | sort -k2 -k1 -n | cut -f2-  > file$i.txt
done

Input shuffling.txt

111 1032192
111 2323476
111 1698881
111 2451712
111 2013780
111  888105
112 2331004
112 1886376
112 1189765
112 1877267
112 1772972
112  574631
Tony, asked Oct 29 '10 01:10



2 Answers

If you don't provide a seed to srand, it will either use the current date and time or a fixed starting seed (this may vary with the implementation). In the former case, the clock often has only one-second resolution, so if your processes all start within the same second, they'll all use the same seed and generate the same sequence.

In the latter case, it won't matter how long you wait: you'll get the same sequence on every run.

You can get around either problem by passing in a different seed from the shell.

awk -v seed=$RANDOM 'BEGIN{srand(seed);}{print rand()" "$0}' ...

The number provided by $RANDOM changes in each iteration so each run of the awk program gets a different seed.

You can see this in action in the following transcript:

pax> for i in $(seq 1 5) ; do
...> awk 'BEGIN{srand();print rand()}'
...> done
0.0435039
0.0435039
0.0435039
0.0435039
0.0435039

pax> for i in $(seq 1 5) ; do
...> awk -v seed=$RANDOM 'BEGIN{srand(seed);print rand()}'
...> done
0.283898
0.0895895
0.841535
0.249817
0.398753
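
Applied to the script in the question, the seeded approach might look like the sketch below. The helper name shuffle_lines is made up here for illustration, and two details are my own assumptions rather than part of the original script: the key is printed in fixed-point form so that sort -n never sees exponent notation, and the sort is restricted to the first field so that only the random key decides the order.

```shell
#!/bin/bash
# shuffle_lines FILE SEED: print the lines of FILE in a pseudo-random order.
# (shuffle_lines is an illustrative helper name, not from the original answer.)
shuffle_lines() {
    # Prefix each line with a fixed-point random key, sort on that key only,
    # then strip the key again.
    awk -v seed="$2" 'BEGIN { srand(seed) } { printf "%.10f\t%s\n", rand(), $0 }' "$1" |
        sort -n -k1,1 |
        cut -f2-
}

input=shuffling.txt
if [ -r "$input" ]; then
    for i in $(seq 1 5); do
        # $RANDOM is expanded afresh on every iteration, so each run gets its own seed.
        shuffle_lines "$input" "$RANDOM" > "file$i.txt"
    done
fi
```

Keep in mind that $RANDOM only ranges over 0 to 32767, so two iterations can occasionally draw the same seed and produce identical files.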
paxdiablo, answered Feb 12 '23 16:02


#!/bin/bash
for i in {1..5}
do
    shuf -o "file$i.txt" shuffling.txt
done
Dennis Williamson, answered Feb 12 '23 15:02