Bash: reading a short text file to a variable loses last empty line

Tags:

bash

I have a script that processes a number of small files from a slow mass memory.

For performance reasons I read the file to a variable, and all processing then happens using this variable. This allows me to read each file only once.

This works well except when the last line is empty: in that case the variable ends up one line shorter than the file. See the simplified example below.

Is there a way to read empty lines at the end of the file into a variable?

$ rm -f /tmp/a ; for i in $(seq 3) ; do echo $i >> /tmp/a ; done
$ cat /tmp/a
1
2
3
$ wc -l /tmp/a
3 /tmp/a
$ a="$(cat /tmp/a)"
$ echo "$a"
1
2
3
$ echo "$a" | wc -l
3

$ rm -f /tmp/b ; for i in $(seq 3) ; do echo $i >> /tmp/b ; done
$ echo >> /tmp/b # ADD EXTRA EMPTY LINE TO THE END
$ cat /tmp/b
1
2
3

$ wc -l /tmp/b
4 /tmp/b
$ b="$(cat /tmp/b)"
$ echo "$b"
1
2
3
$ echo "$b" | wc -l
3

asked Apr 23 '18 11:04

Paavo Leinonen

1 Answer

$(...) strips all trailing newlines. From the bash man page:

Command substitution allows the output of a command to replace the command name. There are two forms:
$(command)
or
`command`
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted. Embedded newlines are not deleted, but they may be removed during word splitting. The command substitution $(cat file) can be replaced by the equivalent but faster $(< file).
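
The stripping described in the man page can be seen in isolation with a minimal sketch like the one below. The variable name `s` and the test string are illustrative assumptions, not part of the original question.

```shell
#!/usr/bin/env bash
# The command prints "a" followed by three newlines.
# Command substitution removes every trailing newline, not just one.
s="$(printf 'a\n\n\n')"

# Only the single character "a" survives in the variable.
echo "${#s}"   # prints 1
```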

Use mapfile to read the entire file in while preserving newlines. It reads each line into an array.

$ mapfile b < /tmp/b
$ printf '%s' "${b[@]}"
1
2
3

$ printf '%s' "${b[@]}" | wc -l
4

Avoid echo, which adds an extra newline. printf '%s' doesn't do that, so you're getting exactly what's in the array.
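
The difference between the two commands can be checked with a small sketch. The string below stands in for the contents of /tmp/b, and the variable names are assumptions made for this illustration.

```shell
#!/usr/bin/env bash
# Same content as /tmp/b in the question: three lines plus one empty line.
b=$'1\n2\n3\n\n'

# echo appends its own newline, so the count is one too high.
echo_lines=$(( $(echo "$b" | wc -l) ))

# printf '%s' emits the string unchanged.
printf_lines=$(( $(printf '%s' "$b" | wc -l) ))

echo "echo: $echo_lines, printf: $printf_lines"   # echo: 5, printf: 4
```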

If you don't want an array, you can use printf -v to flatten it into a single string while preserving newlines.

$ mapfile b < /tmp/b
$ printf -v b '%s' "${b[@]}"
$ printf '%s' "$b"
1
2
3

$ printf '%s' "$b" | wc -l
4
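
A different technique, not taken from the answer above, is to keep using command substitution but append a sentinel character so that the newlines are no longer trailing, then strip the sentinel afterwards. The following is a minimal sketch under that assumption; the temporary file, the sentinel `x`, and the variable names are all made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical demo file with the same content as /tmp/b in the question.
f="$(mktemp)"
printf '1\n2\n3\n\n' > "$f"

# Print a sentinel "x" after the file so its newlines are not at the end
# of the output, then remove the sentinel with parameter expansion.
content="$(cat "$f"; printf x)"
content="${content%x}"

# Count the newlines that survived; $(( )) drops any padding from wc.
lines="$(printf '%s' "$content" | wc -l)"
echo "$((lines))"   # prints 4

rm -f "$f"
```

This avoids the intermediate array, at the cost of spawning cat in a subshell; mapfile needs bash 4 or later, whereas this form only relies on standard parameter expansion.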

You wrote: "For performance reasons I read the file to a variable, and all processing then happens using this variable. This allows me to read each file only once."

This may be premature optimization. Once a file has been read from disk, the OS normally keeps it in its cache, and re-reading a file that is still cached is extremely fast.

answered Oct 26 '22 17:10

John Kugelman
