
Cross-platform newline confusion

Tags: c++, c, text, newline

For some reason, my write-to-textfile function stopped working all of a sudden.

void write_data(char* filename, char* writethis)
{
    ofstream myfile;
    myfile.open (filename, std::ios_base::app);
    myfile << endl << writethis;
    myfile.close();
}

The function was called from a loop, so the file started with an empty line and each subsequent "writethis" string was appended on its own new line.

Then all of a sudden, no more newlines. All text was appended on one single line. So I did some digging and I came across this:

  1. Windows = CR LF
  2. Linux = LF
  3. Mac (before OS X) = CR

So I changed the line to

myfile << "\r\n" << writethis;

And it worked again. But now I'm confused. I am coding on Linux, but I read the text files the program creates on Windows after transferring them with FileZilla. Which part of this caused the lines in the text file to appear as one line?

I was pretty sure "endl" worked just fine on Linux, so now I'm thinking Windows messed the files up after I transferred them with FileZilla? If the way the text file is written (and read back) gets messed up, my program is guaranteed to break, so if someone can explain this I'd appreciate it.

I also don't recall what I changed in my program to cause this to break, because it was working just fine earlier. The only thing I added was threading.

Edit: I have tried switching the transfer mode between ASCII and binary (and even removed the force-ASCII-for-txt-extension setting), but it makes no difference. The newlines appear on Linux, but not on Windows.

How odd.

natli asked Nov 07 '11 07:11


4 Answers

What happens is that you write Unix line endings ('\n'), transfer the file to a Windows machine as a bitwise identical copy, and then try to open it with a viewer that does not understand Unix line endings (likely Notepad).

From my experience writing portable code:

  • Standardize on ONE line-ending ('\n', LF) on ALL platforms.
  • Always open your files in binary, even if you write text.
  • Let the user who opens the file use a text viewer that understands any line ending. There are plenty for Windows (including Visual Studio, Notepad++, WordPad and your favorite browser).

Yes, I do think that there is more benefit to everybody in standardizing on one thing rather than supporting all of them everywhere. Also I deny the existence of "proper line endings on the proper platform". The fact that Microsoft decided that their native API does not speak UTF-8 or does not understand Unix line endings does not prevent everybody's code from doing that, on Windows. Just make sure not to pass this stuff to WinAPI. Many times you do text processing on your internal data that the system will never see, so why the hell do you need to complicate your life by meeting the expectations of that system's internals?
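A minimal sketch of the "one line ending, binary mode" advice, for illustration only. The names write_line_lf and read_raw are made up for this example, and unlike the function in the question it writes the newline after the text rather than before it.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: append one line using a fixed LF ending.
// std::ios::binary disables newline translation, so '\n' is written
// as the single byte 0x0A on Windows as well as on Linux.
void write_line_lf(const std::string& filename, const std::string& text)
{
    std::ofstream out(filename, std::ios::binary | std::ios::app);
    out << text << '\n';
}   // the ofstream destructor flushes and closes the file

// Hypothetical helper: read the whole file back as raw, untranslated bytes.
std::string read_raw(const std::string& filename)
{
    std::ifstream in(filename, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling write_line_lf twice and then read_raw on the same file should give the two strings separated by single LF bytes, regardless of which platform wrote the file.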

Yakov Galka answered Oct 14 '22 08:10


endl does "work just fine for Linux". Streaming endl streams a \n character and flushes the stream. Always.

However, a file stream in text mode will convert this \n to \r\n at the implementation layer on Windows, and you'll often find line endings being converted as you transfer the file between platforms, too.
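To make the first point concrete, here is a small sketch using a string stream, where no file layer is involved. The function names are invented for the example; it only shows that endl contributes exactly one '\n' and is equivalent to writing '\n' followed by a flush.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// What std::endl puts into a stream: one '\n', then a flush.
std::string with_endl()
{
    std::ostringstream os;
    os << "line" << std::endl;
    return os.str();
}

// The spelled-out equivalent of the function above.
std::string with_newline_and_flush()
{
    std::ostringstream os;
    os << "line" << '\n' << std::flush;
    return os.str();
}
```

Both functions return the same five characters, "line" plus a single '\n'. Any "\r\n" that ends up on disk on Windows comes from the text-mode file layer, not from endl itself.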

This is probably not a C++ problem, and nothing is "broken"; you should probably configure FileZilla to treat your file as text rather than "binary" (a mode in which line endings are not converted). If your file has no name extension like ".txt" then it probably doesn't do this by default.

Lightness Races in Orbit answered Oct 14 '22 08:10


FTP can mess up your files (that is, it converts newlines) if you transfer files as ASCII. Try transferring as BIN (binary).
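As a rough illustration of what that conversion amounts to for a Linux-to-Windows transfer: each bare LF ends up as CRLF. The sketch below is not FileZilla's code, and lf_to_crlf is a name invented for the example; it only mimics the net effect on the bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turn every bare LF into CRLF, leaving
// sequences that are already CRLF untouched.
std::string lf_to_crlf(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    char prev = '\0';
    for (char c : in) {
        if (c == '\n' && prev != '\r') {
            out += '\r';   // insert the missing carriage return
        }
        out += c;
        prev = c;
    }
    return out;
}
```

A binary transfer skips this step entirely, so the file arrives byte for byte as it was written.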

johndodo answered Oct 14 '22 08:10


Internally, C and C++ programs use '\n' to indicate line termination.

The problem is that the line termination sequence is platform specific for text files (as your research turned up). Note that text mode is the default when you open a file; if you explicitly select binary when opening a file, no translation happens when reading or writing.

What this actually means is that the '\n' character is transformed into a platform-specific sequence of characters when you write it to a file. But also note that this platform-specific sequence is converted back to '\n' when the file is read. The problem you are encountering is that you have written the files on one platform and read them on another.

On Linux the line termination sequence is LF ('\n'), so when you write the file every '\n' is written as an LF character. You transfer these files to a Windows system and read them there. On Windows the line termination sequence is CRLF, so the editor that reads the file looks for those two characters to convert back to '\n' but does not find them. Whether you get a single line or multiple lines then depends on how smart the editor is.
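For what a "smart" reader might look like, here is a sketch of a line splitter that accepts LF, CRLF and a lone CR as terminators. The name split_lines is made up for the example, and it assumes the data was read in binary mode so that no translation has happened yet.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split raw file contents into lines, treating
// "\r\n", "\n" and a lone "\r" each as one line terminator.
std::vector<std::string> split_lines(const std::string& data)
{
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const char c = data[i];
        if (c == '\r' || c == '\n') {
            // Consume the '\n' of a "\r\n" pair so it counts only once.
            if (c == '\r' && i + 1 < data.size() && data[i + 1] == '\n') {
                ++i;
            }
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        lines.push_back(current);   // final line without a terminator
    }
    return lines;
}
```

A reader written this way shows the same lines whether the file was produced on Linux or on Windows, which is roughly what editors such as Notepad++ do and what the Notepad of that era did not.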

Martin York answered Oct 14 '22 10:10