I have a script that processes a number of small files from slow mass storage.
For performance reasons I read each file into a variable, and all processing then happens on that variable, so each file is read only once.
This works well, except when the last line of the file is empty: the variable then ends up one line shorter than the file (see the simplified example below).
Is there a way to read empty lines at the end of a file into a variable?
$ rm -f /tmp/a ; for i in $(seq 3) ; do echo $i >> /tmp/a ; done
$ cat /tmp/a
1
2
3
$ wc -l /tmp/a
3 /tmp/a
$ a="$(cat /tmp/a)"
$ echo "$a"
1
2
3
$ echo "$a" | wc -l
3
$ rm -f /tmp/b ; for i in $(seq 3) ; do echo $i >> /tmp/b ; done
$ echo >> /tmp/b # ADD EXTRA EMPTY LINE TO THE END
$ cat /tmp/b
1
2
3
$ wc -l /tmp/b
4 /tmp/b
$ b="$(cat /tmp/b)"
$ echo "$b"
1
2
3
$ echo "$b" | wc -l
3
$(...) strips all trailing newlines. From the bash man page:
Command substitution allows the output of a command to replace the command name. There are two forms:
$(command)
or
`command`
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted. Embedded newlines are not deleted, but they may be removed during word splitting. The command substitution
$(cat file)
can be replaced by the equivalent but faster $(< file).
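If command substitution has to stay, a common workaround is to print a sentinel character after the file contents and strip it again afterwards, so that the newlines are no longer trailing when the substitution ends. The following is only a minimal sketch: the scratch file and the names tmpfile, content and line_count are made up for illustration.

```shell
# Hypothetical scratch file that ends with an empty line (4 newlines in total).
tmpfile=$(mktemp)
printf '1\n2\n3\n\n' > "$tmpfile"

# Print a sentinel "x" after the file contents so the newlines are not
# trailing, then remove the sentinel with a suffix-stripping expansion.
content=$(cat "$tmpfile"; printf x)
content=${content%x}

# printf '%s' writes the variable without adding a newline of its own.
line_count=$(printf '%s' "$content" | wc -l)
echo "newlines in variable: $line_count"

rm -f "$tmpfile"
```

This variant uses only POSIX shell features, so it should also work in shells that lack mapfile.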
Use mapfile to read the entire file in while preserving newlines. It reads each line into an array.
$ mapfile b < /tmp/b
$ printf '%s' "${b[@]}"
1
2
3
$ printf '%s' "${b[@]}" | wc -l
4
Avoid echo, which adds an extra newline. printf '%s' doesn't do that, so you're getting exactly what's in the array.
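If the later processing is line-oriented, the array can also be used directly. As a rough sketch (the file and the names tmpfile, lines and empty_count are hypothetical), mapfile -t drops the newline from each element while still keeping the empty last line as an element of its own:

```shell
#!/usr/bin/env bash
# Requires bash: mapfile is a bash builtin.

# Hypothetical scratch file that ends with an empty line.
tmpfile=$(mktemp)
printf '1\n2\n3\n\n' > "$tmpfile"

# -t removes the trailing newline from every element; the empty last
# line is still stored, as an empty string.
mapfile -t lines < "$tmpfile"
echo "elements: ${#lines[@]}"

# Walk the array instead of re-reading the file.
empty_count=0
for line in "${lines[@]}"; do
    if [[ -z $line ]]; then
        empty_count=$((empty_count + 1))
    fi
done
echo "empty lines: $empty_count"

rm -f "$tmpfile"
```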
If you don't want an array, you can use printf -v to flatten it into a single string while preserving newlines.
$ mapfile b < /tmp/b
$ printf -v b '%s' "${b[@]}"
$ printf '%s' "$b"
1
2
3
$ printf '%s' "$b" | wc -l
4
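To check that the flattened string really matches the file byte for byte, it can be piped into cmp. This is just a sketch with made-up names (tmpfile, content_lines, content, result), and it assumes the file contains no NUL bytes, which a bash variable cannot hold.

```shell
#!/usr/bin/env bash
# Requires bash: mapfile and printf -v are bash features.

# Hypothetical scratch file that ends with an empty line.
tmpfile=$(mktemp)
printf '1\n2\n3\n\n' > "$tmpfile"

# Read every line (newlines included) and join the elements into one string.
mapfile content_lines < "$tmpfile"
printf -v content '%s' "${content_lines[@]}"

# cmp -s is silent and exits 0 only when both inputs are identical.
if printf '%s' "$content" | cmp -s - "$tmpfile"; then
    result="identical"
else
    result="different"
fi
echo "variable vs file: $result"

rm -f "$tmpfile"
```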
> For performance reasons I read the file to a variable, and all processing then happens using this variable. This allows me to read each file only once.
This may be premature optimization. Once a file has been read from disk, the OS will usually keep it in cache, and re-reading a file that is still cached is extremely fast.