
read -N and IFS

Tags: bash, ifs

According to the description of "read -N" in the manual page:

-N nchars return only after reading exactly NCHARS characters, unless EOF is encountered or read times out, ignoring any delimiter

However, in the output of the following command:

$ echo 'a b' | while read -N1 c; do echo ">>>$c<<<"; done
>>>a<<<
>>><<<
>>>b<<<
>>><<<

both the space and the newline have been turned into the empty string, while in the command:

$ echo 'a b' | while IFS= read -N1 c; do echo ">>>$c<<<"; done
>>>a<<<
>>> <<<
>>>b<<<
>>>
<<<

space and newline have been stored correctly in the variable.

So it seems that delimiters still get some processing in the "read" or "while" command that I do not understand.

We can compare these results with the ones obtained using "read -n", which the manual describes as:

-n nchars return after reading NCHARS characters rather than waiting for a newline, but honor a delimiter if fewer than NCHARS characters are read before the delimiter

$ echo 'a b' | while read -n1 c; do echo ">>>$c<<<"; done
>>>a<<<
>>><<<
>>>b<<<
>>><<<

$ echo 'a b' | while IFS= read -n1 c; do echo ">>>$c<<<"; done
>>>a<<<
>>> <<<
>>>b<<<
>>><<<

pasaba por aqui asked Aug 23 '15 11:08


4 Answers

This is POSIX behaviour. When read assigns to a variable, IFS characters are stripped: the specification says that the results shall be split into fields as in the shell for the results of parameter expansion (of course, -n and -N themselves are not POSIX).

This is borne out by the comments in the read source code:

/* This code implements the Posix.2 spec for splitting the words
     read and assigning them to variables. */
  orig_input_string = input_string;

  /* Remove IFS white space at the beginning of the input string.  If
     $IFS is null, no field splitting is performed. */
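
The last sentence of that comment can be checked with a plain read, without -n or -N. The following is only a minimal sketch; show_read is a hypothetical helper introduced here for illustration, not something from the bash sources.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one line from stdin, either with the inherited
# (default) IFS or with a null IFS, and print the stored value between markers.
show_read() {
    local line
    if [ "$1" = null ]; then
        IFS= read -r line    # null IFS: no field splitting, nothing stripped
    else
        read -r line         # default IFS: leading/trailing blanks are removed
    fi
    printf '>>>%s<<<\n' "$line"
}

printf '  a b  \n' | show_read default   # prints >>>a b<<<
printf '  a b  \n' | show_read null      # prints >>>  a b  <<<
```

The only difference between the two calls is the value of IFS while read runs, which is the same switch the question flips with "IFS= read -N1".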

cdarke answered Sep 19 '22 04:09


In my opinion, when the -N option is used, read behaves differently when

  • Reading a delimiter as input
  • Assigning that delimiter to a variable

While reading, read treats a delimiter the same as any other character and counts it towards NCHARS. When assigning, however, it checks whether the character it read is a delimiter, and if so it assigns the empty string to the corresponding variable.

So IFS= changes how whitespace is assigned: a space is stored in c rather than the empty string.

masoud answered Sep 20 '22 04:09


Using hexdump lets us see exactly which characters make up the output, so it may be helpful to change your commands slightly:

(1) With normal IFS and using -N option

$ (echo 'a b' | while read -N1 c; do c="$c<"; echo -n "$c"; done | hexdump -C)
00000000  61 3c 3c 62 3c 3c                                 |a<<b<<|
00000006 

In this first case, read returns the empty string for both the space and 0x0a (newline): both characters are in the default IFS, and IFS characters are stripped from the stored value for the reason explained in cdarke's answer.

(2) With empty IFS and -N option

$ (IFS=""; echo 'a b' | while read -N1 c; do c="$c<"; echo -n "$c"; done | hexdump -C)
00000000  61 3c 20 3c 62 3c 0a 3c                              |a< <b<.<|
00000008

In this case, read picks up each of the four characters that echo outputs, and both the space and 0x0a appear in the output, because with an empty IFS the characters read are assigned to the variable c unchanged.

(3) With normal IFS and -n option

$ (echo 'a b' | while read -n1 c; do c="$c<"; echo -n "$c"; done | hexdump -C)
00000000  61 3c 3c 62 3c 3c                                 |a<<b<<|
00000006 

This gives the same output as case (1), although the semantics are a bit different: read returns the empty string for both the space and 0x0a because (i) both characters are in the default IFS and (ii) with -n, read in any case treats the newline as the line delimiter and does not pass it on.

(4) With empty IFS and -n option

$ (IFS=""; echo 'a b' | while read -n1 c; do c="$c<"; echo -n "$c"; done | hexdump -C)
00000000  61 3c 20 3c 62 3c 3c                              |a< <b<<|
00000007

Here we observe a difference between the -n and -N options: with -n, read still treats the newline as the delimiter and drops it, so excluding 0x0a from IFS has no chance of letting it through to the variable c.
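
The two empty-IFS cases can also be compared without hexdump by printing each stored value with printf's %q format. This is just a sketch; dump_chars is a hypothetical helper name, and it always sets IFS= so that only the -n versus -N difference shows up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read stdin one character at a time with the given
# option (-n or -N) and a null IFS, printing each stored value shell-quoted.
dump_chars() {
    local opt=$1 c
    while IFS= read -r "$opt" 1 c; do
        printf '[%q]' "$c"
    done
    echo
}

printf 'a b\n' | dump_chars -N   # prints [a][\ ][b][$'\n']
printf 'a b\n' | dump_chars -n   # prints [a][\ ][b]['']
```

With -N the fourth value is the newline itself; with -n the newline ends the read and an empty string is stored instead.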

Charles Stewart answered Sep 22 '22 04:09


read cannot decide if a character is a delimiter (to ignore it) until it has already read the character, and read must assign some value to c, even if that value is the empty string. When a delimiter is read and subsequently discarded, the value of c must be set to something, so it is assigned the empty string.

This is consistent with read used without the -n/-N options; delimiters are only discarded after they are read and if they aren't necessary to set the value of the provided parameter(s). The simplest case is when you don't provide any arguments to read:

$ read <<< " a b c "
$ echo ">>>$REPLY<<<"
>>> a b c <<<

With a single explicit argument, leading and trailing delimiters are stripped:

$ read line <<< " a b c "
$ echo ">>>$line<<<"
>>>a b c<<<

With two arguments, the first delimiter is ignored once it has been read. The second is retained, because the string only needs to be split into two words to fill the provided parameters.

$ read field1 field2 <<< " a b c "
$ echo ">>>$field1<<<"
>>>a<<<
$ echo ">>>$field2<<<"
>>>b c<<<
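
The same "split only as far as needed" rule applies when IFS is set to something other than whitespace. A small sketch, assuming a colon-separated record; split_two is a hypothetical helper used only for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a colon-separated string into two fields.
# The first colon separates the fields; later colons stay in the last one.
split_two() {
    local first rest
    IFS=: read -r first rest <<< "$1"
    printf '%s|%s\n' "$first" "$rest"
}

split_two 'a:b:c'   # prints a|b:c
```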

chepner answered Sep 19 '22 04:09