I have written the following random-number generator shell script:
for i in $(seq 1 $1) # iterate as many times as the first argument ($1) specifies
do 
echo "$i $((RANDOM%$2))" #print the current iteration number and a random number in [0, $2)
done
I run it like this:
./generator.sh 1000000000 101 > data.txt
to generate 1B rows, each with an id and a random number in [0, 100], and store the data in the file data.txt.
My desired output is:
1 39
2 95
3 61
4 27
5 85
6 44
7 49
8 75
9 52
10 66
...
It works fine for a small number of rows, but with 1B I get the following OOM error:
./generator.sh: xrealloc: ../bash/subst.c:5179: cannot allocate 18446744071562067968 bytes (4299137024 bytes allocated)
Which part of my program causes the error?
How could I write the data.txt file line-by-line?
I have tried replacing the echo line with:
echo "$i $((RANDOM%$2))" >> $3
where $3 is data.txt, but I see no difference.
The problem is your for loop:
for i in $(seq 1 $1) 
This first expands $(seq 1 $1), building the entire list of one billion words in memory before the loop even starts, and only then passes it to for. That is also why redirecting with >> inside the loop made no difference: memory is exhausted during the expansion, before the first echo ever runs.
Using while instead, we can read the output of seq line by line, which takes only a small amount of memory:
seq 1 1000000000 | while read -r i; do
        echo "$i"
done
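
Applied to your generator, a minimal sketch of the whole script might look like this, assuming the same positional arguments as before ($1 = row count, $2 = exclusive upper bound of the random number):

#!/bin/bash
# generator.sh: pipe seq into while so the numbers are consumed one at a time
seq 1 "$1" | while read -r i; do
    echo "$i $((RANDOM % $2))"   # id and a random number in [0, $2)
done

You would invoke it exactly as before: ./generator.sh 1000000000 101 > data.txt. The pipe runs the while loop in a subshell, which is harmless here since nothing after the loop needs its variables.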
$(seq 1 $1) computes the whole list before iterating over it, so it needs enough memory to hold the entire list of 10^9 numbers, which is a lot.
I am not sure whether you can make seq run lazily, i.e., produce the next number only when needed. You can use a simple arithmetic for loop instead:
for ((i = 1; i <= $1; i++))
do
  echo "$i $((RANDOM%$2))"
done
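
If you also want the script itself to write the file (your $3 idea), here is a sketch under the same assumptions: redirect the whole loop once, instead of reopening the file with >> on every echo:

#!/bin/bash
# generator.sh: hypothetical variant that writes to the file named in $3
for ((i = 1; i <= $1; i++)); do
  echo "$i $((RANDOM % $2))"
done > "$3"   # one redirection for the entire loop, not one per line

Invoked as ./generator.sh 1000000000 101 data.txt.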