I am trying to generate a large CSV file with random content in bash. My machine has 6 cores and 12 GB of RAM, but my script (see below) takes 140 seconds for only 10k lines with 3 columns. Is there any way to optimize this script?
Are there considerably faster ways of generating random CSV files in other languages?
#!/bin/bash
csv="foo\tbar\tbaz"
start=$(date)
for i in $(seq 1 "$1"); do
    rand=$((i * RANDOM))
    str0="$$$i"                      # PID concatenated with the loop index
    str1=$(echo "$str0" | md5sum)    # one md5sum process spawned per iteration
    randstring1="${str1:2:8}"
    randstring2="${str1:0:2}"
    csv="$csv\n$randstring1\t$randstring2\t$rand"
done
end=$(date)
datediff=$(( $(date -d "$end" +%s) - $(date -d "$start" +%s) ))
echo -e "$csv" > my_csv.csv
echo "script took $datediff seconds for $(wc -l my_csv.csv) lines"
To replace this script fairly precisely (format-wise), you could use
hexdump -v -e '5/1 "%02x""\n"' /dev/urandom |
awk -v OFS='\t' '
NR == 1 { print "foo", "bar", "baz" }
{ print substr($0, 1, 8), substr($0, 9, 2), int(NR * 32768 * rand()) }' |
head -n "$1" > my_csv.csv
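The first few lines of the resulting my_csv.csv will look something like this (illustrative values only, since the data is random; the columns are tab-separated):

foo         bar    baz
3fb4a2c7    91     18204
77c01de9    5a     40981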
This falls into three parts:
hexdump -v -e '5/1 "%02x""\n"' /dev/urandom
extracts sequences of five bytes from /dev/urandom and formats them as hexadecimal strings,
awk -v OFS='\t' '
NR == 1 { print "foo", "bar", "baz" }
{ print substr($0, 1, 8), substr($0, 9, 2), int(NR * 32768 * rand()) }'
formats the lines appropriately, adding a header line and a field that is the equivalent of $(($i * $RANDOM)) (a worked sample follows this breakdown), and
head -n "$1"
takes the first $1 lines of this. When head quits, the pipe to awk is closed, awk quits, the pipe to hexdump is closed, and hexdump quits, so the whole pipeline ends at the right time.
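You can watch the same mechanism in a simpler pipeline: yes would run forever on its own, but it exits as soon as head closes the pipe, just as hexdump does above:

$ yes | head -n 3
y
y
y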
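As a worked sample of the middle stage, here is one hand-made input line pushed through the data-row rule of the awk program (the hex value is made up, and the last column depends on your awk's rand() seed, so it will vary):

$ echo 3fb4a2c791 | awk -v OFS='\t' '{ print substr($0, 1, 8), substr($0, 9, 2), int(NR * 32768 * rand()) }'
3fb4a2c7        91      829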
On my machine (a Haswell i5), running this takes 0.83 seconds for a million lines.
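To try it yourself, you could save the pipeline as a script, say fast_csv.sh (the name is arbitrary), and time it; head -n "$1" guarantees the output has exactly the requested number of lines, header included:

$ time bash fast_csv.sh 1000000
$ wc -l my_csv.csv
1000000 my_csv.csv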