Very strange behavior - printf & strcmp ignore my input string in only one line

Tags: c, unix

this is the code:

printf("   DEBUG:%s\n" ,array[7] );
printf("address of %s is %p  (again %d)\n",
    array[7],
    array[7],
    strcmp("N\\A", array[7]) );

printf("5DEBUG collection:%s\n" ,array[7] );

this is the output:

DEBUG:N\A

is 0x7c0600 (again -13)

5DEBUG collection:N\A

As you can see, in the second printf, array[7] (which should point to "N\A") disappeared.

I have no idea what's going on here...

avi.c asked Dec 07 '22 07:12


1 Answer

You are reading a Windows-formatted file on unix. Windows and unix use different line terminators. Unix uses 0x0A, whereas Windows uses 0x0D followed by 0x0A.

If you have a line in the file that ends in 0x0D 0x0A, unix will treat the 0x0A as the line terminator but include the 0x0D as part of the string.
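One way to cope with this, assuming the lines are read with something like fgets, is to strip both terminators before using the string. The helper below is only a sketch; the name chomp_line is made up for illustration and does not come from the question's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original code): remove a trailing
   "\n" or "\r\n" from a line in place and return the new length. */
size_t chomp_line(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);

    /* Drop the unix line terminator (0x0A), if present. */
    if (len > 0 && s[len - 1] == '\n')
        s[--len] = '\0';

    /* Drop the carriage return (0x0D) left over from a Windows file. */
    if (len > 0 && s[len - 1] == '\r')
        s[--len] = '\0';

    return len;
}
```

Calling chomp_line on each line right after reading it would leave array[7] holding just N\A, so the comparison with "N\\A" succeeds.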

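You can see this in your strcmp, which returns -13. Notice that it does not return zero, which means that the strings are not equal. The C standard only guarantees the sign of the result, but many implementations return the difference between the first pair of mismatched bytes. Here that difference is 13, the decimal value of 0x0D, which indicates that array[7] has a 0x0D at the end.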
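The arithmetic can be sketched as follows. The function first_byte_difference is a made-up illustration of what many strcmp implementations happen to return; portable code should rely only on the sign, as same_string does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: difference between the first pair of bytes that
   differ, compared as unsigned char. Returns 0 for equal strings. */
int first_byte_difference(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Portable equality check: only compare the strcmp result against zero. */
int same_string(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

For "N\\A" against "N\\A\r", the first mismatch is the terminating \0 (value 0) against the carriage return (value 13), so first_byte_difference yields 0 - 13 = -13.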

The other evidence of this is the odd printing behavior you're seeing. On unix, printing 0x0D causes the cursor to return to column 0 of the same line. Therefore, the second print instruction begins by printing

address of N\A

and then it encounters the 0x0D, which moves the cursor back to column 0. The remainder of the string therefore overprints the output, resulting in

 is 0x7c0600 (again -13)
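The overprinting can be modeled with a small sketch. render_line is a hypothetical, simplified model of a single terminal line (it handles only the carriage return, nothing else a real terminal does) and is not part of the original program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one terminal line: '\r' moves the cursor back to
   column 0, every other byte overwrites the cell under the cursor.
   Writes a NUL-terminated result into out (capacity cap) and returns the
   number of columns used. */
size_t render_line(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL && cap > 0);
    size_t col = 0;    /* current cursor column */
    size_t width = 0;  /* rightmost column written so far */

    for (; *in != '\0'; in++) {
        if (*in == '\r') {
            col = 0;           /* carriage return: back to column 0 */
            continue;
        }
        if (col + 1 >= cap)
            break;             /* keep room for the terminating NUL */
        out[col++] = *in;      /* overwrite whatever was there */
        if (col > width)
            width = col;
    }
    out[width] = '\0';
    return width;
}
```

Feeding it "address of N\\A\r is 0x7c0600 (again -13)" leaves only " is 0x7c0600 (again -13)" in the buffer, because the 24 characters after the carriage return cover the 14 characters printed before it.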

If you had run the program in a debugger with a breakpoint on this code, you would have seen that array[7] has a 0x0D at the end.
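If a debugger is not handy, dumping the bytes of the string in hex shows the same thing. The helper below is a sketch; hex_dump_string is a made-up name, and the caller is assumed to supply a large enough buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the bytes of s into out as space-separated
   two-digit hex values, e.g. "4E 5C 41 0D". Output is cut at a whole
   byte if the buffer (capacity cap) is too small. */
void hex_dump_string(const char *s, char *out, size_t cap)
{
    assert(s != NULL && out != NULL && cap > 0);
    size_t used = 0;
    out[0] = '\0';

    for (; *s != '\0'; s++) {
        int n = snprintf(out + used, cap - used, "%s%02X",
                         used ? " " : "", (unsigned)(unsigned char)*s);
        if (n < 0 || (size_t)n >= cap - used) {
            out[used] = '\0';  /* discard the partially written byte */
            break;
        }
        used += (size_t)n;
    }
}
```

For a string holding N\A followed by a carriage return, the result is 4E 5C 41 0D, which makes the invisible trailing byte obvious.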

Added

This was not psychic debugging. It was actually quite straightforward. Here's the step-by-step:

  1. When you noticed the odd behavior, you should have used the debugger to look at the string in array[7]. If you had done that, you would have seen the trailing 0x0D and the problem would have been solved in 5 seconds.
  2. The next huge clue was that the result of strcmp was not zero. This means that the string in array[7] is not equal to "N\\A", which should again have prompted you to use the debugger to look at the string in array[7] and see what it actually is.
  3. Without the benefit of debugging, I observed that the difference between array[7] and "N\\A" must be in something that is not readily visible, since the first line printed okay. The options here are control characters or whitespace.
  4. The fact that strcmp reported a difference of 13 suggests that the string in array[7] has a 0x0D at the end: "N\\A" ends with a \0 (numerical value zero), and a difference of 13 suggests that array[7] ends with 0x0D, since 0x0D hex = 13 decimal.
  5. If you didn't use the logic from step 4, you could have stopped to think, "What characters would mess up printing?" You run down the mental list. Space, tab (causes multiple spaces to be printed), carriage return (returns cursor to column 0), newline (advances to next line), form feed (clears the screen), and escape (introduces console control sequences). The one that matches the evidence is carriage return, whose ASCII code is (surprise) 13.
  6. If you didn't use the logic from steps 4 or 5, you could have studied the statement that the problem does not exist on Windows. Windows uses 0x0D 0x0A as its line terminator, whereas unix uses 0x0A. The extra 0x0D is a carriage return which returns the cursor to column 0, which again matches the evidence.

So there were four independent ways of coming to the same diagnosis. (Five if you count "Look at the string in the debugger.") Since they all agreed, this made for a rather confident conclusion. My actual analysis started with step 5, then used the other steps to confirm the diagnosis.

Raymond Chen answered Jan 26 '23 01:01