I'm passingly familiar with the dd command, but I've rarely had the need to use it myself. Today I need to, but I'm encountering behavior that seems really weird.
I want to create a 100M text file, each line of which contains the single word "testing". This was my first try:
~$ perl -e 'print "testing\n" while 1' | dd of=X bs=1M count=100
0+100 records in
0+100 records out
561152 bytes (561 kB) copied, 0.00416429 s, 135 MB/s
Hmm, that's odd. What about other combinations?
~$ perl -e 'print "testing\n" while 1' | dd of=X bs=100K count=1K
0+1024 records in
0+1024 records out
4268032 bytes (4.3 MB) copied, 0.0353145 s, 121 MB/s
~$ perl -e 'print "testing\n" while 1' | dd of=X bs=10K count=10K
86+10154 records in
86+10154 records out
42524672 bytes (43 MB) copied, 0.35403 s, 120 MB/s
~$ perl -e 'print "testing\n" while 1' | dd of=X bs=1K count=100K
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 0.879549 s, 119 MB/s
So of these four apparently-equivalent commands, all produce files of different sizes, only one of which is the one I would expect. Why is that?
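For reference, this is the arithmetic behind "apparently equivalent", as a quick shell sketch. It assumes dd's K and M suffixes mean 1024 and 1024*1024 bytes, which is how GNU dd treats them; the variable names are only for illustration.

```shell
# bs * count for each of the four invocations above
a=$((1024 * 1024 * 100))     # bs=1M   count=100
b=$((100 * 1024 * 1024))     # bs=100K count=1K
c=$((10 * 1024 * 10 * 1024)) # bs=10K  count=10K
d=$((1024 * 100 * 1024))     # bs=1K   count=100K
echo "$a $b $c $d"
```

All four products come to 104857600 bytes, which is the size only the last run actually produced.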
EDIT: By the by, I'm a little embarrassed I didn't think of "yes testing" instead of that longer Perl command.
The dd command reads one block of input, processes it, and writes it to an output file. You can specify the block size for the input and the output separately; the "bs" parameter sets the block size for both at once. For example, bs=2048 makes dd use 2048 bytes as its block size.
'iflag=flag[,flag]…' Access the input file using the flags specified by the flag argument(s). (No spaces around any comma(s).) 'oflag=flag[,flag]…' Access the output file using the flags specified by the flag argument(s).
The dd seek option is similar to the UNIX lseek() system call. It moves the read/write pointer within the file. From the man page: seek=n Skip n blocks (using the specified output block size) from the beginning of the output file before copying.
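A minimal sketch of seek in action, assuming /dev/zero is available as an input and using a scratch file name (seek_demo.bin) chosen only for this example: skipping 3 output blocks of 1024 bytes and then writing one block should leave a file of 4 * 1024 bytes.

```shell
# Start from a clean slate so an older, larger file cannot skew the size.
rm -f seek_demo.bin

# Skip 3 output blocks of 1024 bytes, then copy a single 1024-byte block.
dd if=/dev/zero of=seek_demo.bin bs=1024 seek=3 count=1 2>/dev/null

# The file now extends to the end of the fourth block.
seek_size=$(wc -c < seek_demo.bin)
echo "$seek_size"
```

The skipped region is not written by dd; on most filesystems it reads back as zero bytes.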
To see what's going on, let's look at the output of strace for a similar invocation:
execve("/bin/dd", ["dd", "of=X", "bs=1M", "count=2"], [/* 72 vars */]) = 0
…
read(0, "testing\ntesting\ntesting\ntesting\n"..., 1048576) = 69632
write(1, "testing\ntesting\ntesting\ntesting\n"..., 69632) = 69632
read(0, "testing\ntesting\ntesting\ntesting\n"..., 1048576) = 8192
write(1, "testing\ntesting\ntesting\ntesting\n"..., 8192) = 8192
close(0)                                = 0
close(1)                                = 0
write(2, "0+2 records in\n0+2 records out\n", 31) = 31
write(2, "77824 bytes (78 kB) copied", 26) = 26
write(2, ", 0.000505796 s, 154 MB/s\n", 26) = 26
…
What happens is that dd makes a single read() call for each block. In the trace above, the first read() asked for 1048576 bytes and got only 69632; dd wrote those out and counted them as one partial record (the number after the + in "0+2 records in"). This is appropriate when reading from a tape, which is what dd was originally mainly used for: on tapes, read() really reads a block. When reading from a file, you have to be careful not to specify too large a block size, or else the read will be truncated. When reading from a pipe, it's worse: the size of the block that you read depends on how fast the command producing the data writes it.
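The short read can be reproduced on a small scale. This is only a sketch, assuming a shell with printf and sleep and a dd that issues one read() per block. The writer deliberately pauses between two 4-byte chunks, so the single 8-byte read usually comes back with just the first chunk. The exact count depends on timing, so treat the result as illustrative rather than guaranteed.

```shell
# Writer: 4 bytes, a pause, then 4 more bytes.
# Reader: dd asks for one 8-byte block and copies whatever that one read() returns.
short=$( (printf 'aaaa'; sleep 1; printf 'bbbb') | dd bs=8 count=1 2>/dev/null | wc -c )
echo "bytes copied: $short"
```

With count=1, dd stops after that first read, so the second chunk is never copied even though the writer would have produced it.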
The moral of the story is not to use dd to copy data, except with safe, small blocks. And never from a pipe except with bs=1.
(GNU dd has an iflag=fullblock flag to tell it to behave decently. But other implementations don't.)
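If the goal is just to take a fixed number of bytes from a pipe, one way to sidestep dd entirely is head -c. A sketch, assuming your head accepts -c with a byte count (the GNU and BSD versions do) and reusing the output name X from the question:

```shell
# 100 * 1024 * 1024 bytes; "testing\n" is 8 bytes, so the file ends on a line boundary.
size=$((100 * 1024 * 1024))

# head keeps reading until it has the requested number of bytes, then exits.
yes testing | head -c "$size" > X

# Report the resulting file size in bytes.
wc -c < X
```

Unlike dd without fullblock, head does not care how the producer chunks its writes, so the output size does not depend on timing.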
I'm not yet sure why, but when dd is used this way it does not fill up an entire block before writing it out. Try:
perl -e 'print "testing\n" while 1' | dd of=output.txt bs=10K count=10K iflag=fullblock
10240+0 records in
10240+0 records out
104857600 bytes (105 MB) copied, 2.79572 s, 37.5 MB/s
The iflag=fullblock seems to force dd to accumulate input until the block is full, although I'm not sure why this is not the default, or what it actually does by default.