
Is it necessary to write a "portable" if (c == '\n') to process cross-platform files?

This question comes from a discussion of a practical problem, "Replacing multiple new lines in a file with just one", where something went wrong in a Cygwin terminal running on a Windows 8.1 machine.

Since the end-of-line terminator differs between platforms (\n, \r, or \r\n), is it necessary to write a "portable" if (c == '\n') to make it work on Linux, Windows and OS X? Or is the best practice simply to convert the file with commands/tools?

    #include <stdio.h>

    int main(void)
    {
        FILE *pFile;
        int c;
        int n = 0;

        pFile = fopen("myfile.txt", "r");   /* "r" opens the file in text mode */
        if (pFile == NULL) perror("Error opening file");
        else
        {
            do {
                c = fgetc(pFile);
                if (c == '\n') n++; // will it work fine on different platforms?
            } while (c != EOF);
            fclose(pFile);
            printf("The file contains %d lines.\n", n);
        }
        return 0;
    }

Update 1:

Will the CRT always convert line endings into '\n'?

Eric Tsui asked Jun 26 '15 09:06



1 Answer

If an input file is opened in binary mode (the character 'b' in the mode string) then it is necessary to worry about the possible presence of '\r' before '\n'.

If the file is not opened in binary mode (and is also not read using binary functions such as fread()), then it is not necessary to worry about the presence of '\r' before '\n', because that will be handled before the input is received by your code: either by a relevant system function (e.g. a device driver that reads input from disk or from stdin) or by the implementation of the functions you use to read input from the file. This holds when the file uses the native line-ending convention of the system the program runs on; a file written on one system and read unconverted on another is a different case.
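As a rough illustration of what binary mode exposes, the sketch below counts the "\r\n" pairs in a stream. The helper name count_crlf is hypothetical (it comes neither from the question nor from any library), and it assumes the stream was opened in binary mode, e.g. with fopen(path, "rb"), so that no translation has been applied to the bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts "\r\n" pairs in a stream that is assumed
   to have been opened in binary mode, where no translation is applied. */
long count_crlf(FILE *fp)
{
    long pairs = 0;
    int prev = EOF;   /* previous character read, EOF before the first one */
    int c;

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n' && prev == '\r')
            pairs++;
        prev = c;
    }
    return pairs;
}
```

A nonzero result suggests that the file was written with Windows-style line endings and that binary-mode reading code has to account for the extra '\r'.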

If you are transferring files between systems (e.g. writing the file under Linux and transferring it to a Windows system, where a program tries to read it in), then you have options:

  • Write and read the file in non-binary mode, and do a relevant translation of the file when transferring it between systems. If using FTP, this can be handled by transferring the file in text mode rather than binary mode. If the file is transferred in binary mode, then you will need to run the file through dos2unix (if transferring the file to Unix) or through unix2dos (going the other way).
  • Do all your I/O in binary mode, transfer the files between systems in binary mode, and never read them in non-binary mode. Among other things, this gives you explicit control over what data is in the file.
  • Write your file in text mode, transfer the file as you see fit. Then only read in binary mode and, when your reading code encounters a \r\n pair, drop the '\r' character.

The last is arguably the most robust: the writing code might include \r before \n characters, or it might not, but the reading code simply ignores any '\r' character that it encounters before a '\n' character. Such code will probably even cope if the files are edited by hand before being read (e.g. with a text editor, which might be separately configured to either insert or remove \r and \n).
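A minimal sketch of that last option, assuming the file is opened in binary mode. fgetc_normalized and read_normalized are hypothetical names introduced here for illustration only. A lone '\r' that is not followed by '\n' is passed through unchanged, which is a choice of this sketch rather than something the answer specifies.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: like fgetc(), but a "\r\n" pair is returned as a
   single '\n'. A '\r' not followed by '\n' is returned unchanged. */
int fgetc_normalized(FILE *fp)
{
    int c;

    assert(fp != NULL);
    c = fgetc(fp);
    if (c == '\r') {
        int next = fgetc(fp);
        if (next == '\n')
            return '\n';          /* drop the '\r' of the pair */
        if (next != EOF)
            ungetc(next, fp);     /* not a pair: push the lookahead back */
    }
    return c;
}

/* Hypothetical helper: reads normalized characters into buf (at most
   cap - 1 of them), NUL-terminates it, and returns the count stored. */
size_t read_normalized(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    if (cap == 0)
        return 0;
    while (n + 1 < cap && (c = fgetc_normalized(fp)) != EOF)
        buf[n++] = (char)c;
    buf[n] = '\0';
    return n;
}
```

In the line-counting loop from the question, calling fgetc_normalized in place of fgetc on a stream opened with "rb" would let the rest of the code treat '\n' as the only line terminator, whether or not the writer emitted '\r' first.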

Peter answered Sep 30 '22 20:09
