
Bash: reading a short text file to a variable loses last empty line

Tags:

bash

I have a script that processes a number of small files from a slow mass memory.

For performance reasons I read the file to a variable, and all processing then happens using this variable. This allows me to read each file only once.

This works well, except when the last line is empty: the variable is then one line shorter than the file. See the simplified example below.

Is there a way to read empty lines at the end of the file into a variable?

$ rm -f /tmp/a ; for i in $(seq 3) ; do echo $i >> /tmp/a ; done
$ cat /tmp/a
1
2
3
$ wc -l /tmp/a
3 /tmp/a
$ a="$(cat /tmp/a)"
$ echo "$a"
1
2
3
$ echo "$a" | wc -l
3

$ rm -f /tmp/b ; for i in $(seq 3) ; do echo $i >> /tmp/b ; done
$ echo >> /tmp/b # ADD EXTRA EMPTY LINE TO THE END
$ cat /tmp/b
1
2
3

$ wc -l /tmp/b
4 /tmp/b
$ b="$(cat /tmp/b)"
$ echo "$b"
1
2
3
$ echo "$b" | wc -l
3
Paavo Leinonen asked Apr 23 '18 11:04





1 Answer

$(...) strips all trailing newlines. From the bash man page:

Command substitution allows the output of a command to replace the command name. There are two forms:

$(command)

or

`command`

Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted. Embedded newlines are not deleted, but they may be removed during word splitting. The command substitution $(cat file) can be replaced by the equivalent but faster $(< file).
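
If you want to keep using command substitution, one common workaround (not part of the quoted man page text) is to append a sentinel character so the newlines are no longer trailing, then strip the sentinel afterwards. The sketch below is a minimal illustration; the temporary file and the variable names `stripped` and `exact` are assumptions made for the demo.

```shell
#!/usr/bin/env bash
# Demo file (hypothetical): lines 1, 2, 3 followed by one empty line.
tmpfile="$(mktemp)"
printf '1\n2\n3\n\n' > "$tmpfile"

# Plain command substitution: every trailing newline is removed.
stripped="$(cat "$tmpfile")"

# Sentinel trick: emit an extra "x" after the file contents so the
# newlines are no longer at the end of the output, then drop the "x".
exact="$(cat "$tmpfile"; printf x)"
exact="${exact%x}"

printf '%s' "$stripped" | wc -l   # counts 2 newlines
printf '%s' "$exact" | wc -l      # counts 4 newlines

rm -f "$tmpfile"
```

The sentinel can be any character; it only has to be something other than a newline.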

Use mapfile to read the entire file while preserving newlines. It reads each line into its own array element, and without the -t option each element keeps its trailing newline.

$ mapfile b < /tmp/b
$ printf '%s' "${b[@]}"
1
2
3

$ printf '%s' "${b[@]}" | wc -l
4

Avoid echo, which adds an extra newline. printf '%s' doesn't do that, so you're getting exactly what's in the array.
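
If the later processing is line-oriented, the array can also be used directly instead of being printed back out. The following is a sketch under that assumption: it uses `mapfile -t`, which drops the newline from each element, so the empty last line shows up as an empty array element. The names `lines`, `line_count` and `last_line` are illustrative only, and `mapfile` needs bash 4 or newer.

```shell
#!/usr/bin/env bash
# Demo file (hypothetical): lines 1, 2, 3 followed by one empty line.
tmpfile="$(mktemp)"
printf '1\n2\n3\n\n' > "$tmpfile"

# -t strips the trailing newline from each element; the empty last
# line is still stored, as an empty string.
mapfile -t lines < "$tmpfile"

line_count="${#lines[@]}"             # number of lines read
last_line="${lines[line_count-1]}"    # contents of the final line

if [ -z "$last_line" ]; then
    printf 'read %d lines; the last one is empty\n' "$line_count"
else
    printf 'read %d lines; the last one is: %s\n' "$line_count" "$last_line"
fi

rm -f "$tmpfile"
```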

If you don't want an array, you can use printf -v to flatten it into a single string while preserving newlines.

$ mapfile b < /tmp/b
$ printf -v b '%s' "${b[@]}"
$ printf '%s' "$b"
1
2
3

$ printf '%s' "$b" | wc -l
4
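
For a script that handles many files, the two steps above could be wrapped in a small function. This is only a sketch: `read_file_exact` is a hypothetical helper name, not a bash builtin, and it assumes bash 4 or newer for `mapfile` and text files without NUL bytes.

```shell
#!/usr/bin/env bash
# read_file_exact VARNAME FILE
# Hypothetical helper: stores the contents of FILE in the variable
# named VARNAME, keeping all trailing newlines.
read_file_exact() {
    local -a _lines
    # Without -t, each element keeps its own trailing newline.
    mapfile _lines < "$2" || return
    # Concatenate the elements into the caller's variable.
    printf -v "$1" '%s' "${_lines[@]}"
}

# Usage with a demo file: lines 1, 2, 3 followed by one empty line.
tmpfile="$(mktemp)"
printf '1\n2\n3\n\n' > "$tmpfile"
read_file_exact content "$tmpfile"
printf '%s' "$content" | wc -l   # counts 4 newlines
rm -f "$tmpfile"
```

The target variable must not be called `_lines`, since that name is used locally inside the function.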

"For performance reasons I read the file to a variable, and all processing then happens using this variable. This allows me to read each file only once."

This may be premature optimization. Once a file has been read from disk, the OS normally keeps it in its page cache, so re-reading a small file that is still cached is extremely fast.

John Kugelman answered Oct 26 '22 17:10
