I have written the following random-number generator shell script:
#!/bin/bash
for i in $(seq 1 $1) # loop as many times as the first argument ($1) specifies
do
echo "$i $((RANDOM%$2))" # print the current iteration number and a random number in [0, $2)
done
I run it like this:
./generator.sh 1000000000 101 > data.txt
to generate 1B rows, each containing an id and a random number in [0,100], and store the data in data.txt.
My desired output is:
1 39
2 95
3 61
4 27
5 85
6 44
7 49
8 75
9 52
10 66
...
It works fine for a small number of rows, but with 1B rows I get the following OOM error:
./generator.sh: xrealloc: ../bash/subst.c:5179: cannot allocate 18446744071562067968 bytes (4299137024 bytes allocated)
Which part of my program causes the error?
How could I write the data.txt file line by line?
I have tried replacing the echo line with:
echo "$i $((RANDOM%$2))" >> $3
where $3 is data.txt, but I see no difference.
The problem is your for loop:
for i in $(seq 1 $1)
This first expands $(seq 1 $1), creating a very big list, which is then passed to for.
Using while, however, we can read the output of seq line by line, which takes only a small amount of memory:
seq 1 1000000000 | while read i; do
echo $i
done
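Applied to your script, a minimal sketch of generator.sh using this pattern (keeping your positional arguments: $1 for the row count, $2 for the modulus) might look like:
#!/bin/bash
# Stream ids from seq into the loop so the full list is never held in memory.
seq 1 "$1" | while read -r i
do
echo "$i $((RANDOM % $2))" # id and a random number in [0, $2)
done
You would run it the same way as before: ./generator.sh 1000000000 101 > data.txt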
$(seq 1 $1) computes the whole list before iterating over it, so it takes enough memory to store the entire list of 10^9 numbers, which is a lot.
I am not sure if you can make seq run lazily, i.e., get the next number only when needed. You can use a simple for loop instead:
for ((i = 1; i <= $1; i++))
do
echo "$i $((RANDOM%$2))"
done
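For completeness, here is a sketch of the whole generator.sh built around this arithmetic loop; the condition is re-evaluated on each iteration, so no list of 10^9 numbers is ever materialized:
#!/bin/bash
# Usage: ./generator.sh <rows> <modulus>, e.g. ./generator.sh 1000000000 101 > data.txt
for ((i = 1; i <= $1; i++))
do
echo "$i $((RANDOM % $2))" # id and a random number in [0, $2)
done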