QString::split() and "\r", "\n" and "\r\n" convention

Tags:

c++

qt

I understand that QString::split should be used to get a QStringList from a multiline QString. But if I have a file and I don't know if it comes from Mac, Windows or Unix, I'm not sure if QString.split("\n") would work well in all the cases. What is the best way to handle this situation?

asked Apr 27 '12 09:04 by sashoalm



2 Answers

If it's acceptable to remove blank lines, you can try the following, where text is the QString to split:

QStringList lines = text.split(QRegExp("[\r\n]"), QString::SkipEmptyParts);

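This splits the string wherever either line-break character (line feed or carriage return) occurs. Consecutive line-break characters (e.g. \r\n, \r\n\r\n or \n\n) count as multiple delimiters with empty parts between them, and those empty parts are skipped. In Qt 6, where QRegExp and QString::SkipEmptyParts are no longer part of QtCore, the equivalent is text.split(QRegularExpression("[\r\n]"), Qt::SkipEmptyParts).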
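To see what skipping empty parts does without pulling in Qt, here is a plain C++ sketch that mimics splitting on the character class [\r\n] and discarding empty parts. The function name splitSkipEmpty is hypothetical (not a Qt API); it is only meant to illustrate the behavior described above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: mimics splitting on [\r\n] with empty parts skipped
// (the effect of QString::SkipEmptyParts in the Qt call above).
std::vector<std::string> splitSkipEmpty(const std::string& text) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : text) {
        if (c == '\r' || c == '\n') {
            // Every CR or LF ends the current part. Empty parts are dropped,
            // so the empty part inside "\r\n" and any blank line vanish.
            if (!current.empty()) {
                parts.push_back(current);
            }
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        parts.push_back(current);
    }
    return parts;
}
```

For example, splitSkipEmpty("one\r\ntwo\n\nthree\r") gives {"one", "two", "three"}: the line endings are handled regardless of convention, but the blank line between "two" and "three" is lost too.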

answered Sep 23 '22 22:09 by Emanuele Bezzi



Emanuele Bezzi's answer misses a couple of points.

In most cases, a string read from a text file will have been read using a text stream, which automatically translates the OS's end-of-line representation to a single '\n' character. So if you're dealing with native text files, '\n' should be the only delimiter you need to worry about. For example, if your program runs on Windows and reads input in text mode, each line ending is represented in memory by a single '\n' character; you'll never see the "\r\n" pairs that exist in the file.

But sometimes you do need to deal with "foreign" text files.

Ideally, you should probably translate any such files to the local format before reading them, which avoids the issue. Only the translation utility needs to be aware of variant line endings; everything else just deals with text.

But that's not always possible; sometimes you might want your program to handle Windows text files when running on a POSIX system (Linux, UNIX, etc.), or vice versa.

A Windows-format text file on a POSIX system will appear to have an extra '\r' character at the end of each line.
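A common way to cope with that stray '\r' is to strip it after reading each line. The sketch below uses std::getline in standard C++ rather than Qt; the function name readLinesTolerant is hypothetical. It only handles "\r\n" files: a file that uses lone '\r' line endings would still come back as a single line.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: std::getline splits on '\n'; if the line then ends
// in '\r' (left over from a Windows "\r\n"), that one '\r' is removed.
std::vector<std::string> readLinesTolerant(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // drop the CR of a "\r\n" ending
        }
        lines.push_back(line);
    }
    return lines;
}
```

Used on a std::istringstream holding "alpha\r\nbeta\r\n", it yields {"alpha", "beta"}, the same result as for "alpha\nbeta\n". Blank lines are kept as empty strings.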

A POSIX-format text file on a Windows system will appear to consist of one very long line with embedded '\n' characters.

The most general approach is to read the file in binary mode and deal with the line endings explicitly.
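A minimal sketch of that approach in standard C++, assuming the whole file fits in memory: read the bytes in binary mode so no platform translation takes place, then map "\r\n" and lone "\r" to "\n" yourself. Both function names are hypothetical. In Qt, the counterpart of binary mode is opening a QFile without the QIODevice::Text flag.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: converts "\r\n" (Windows) and lone "\r" (classic
// Mac OS) to "\n"; existing "\n" characters are left alone.
std::string normalizeLineEndings(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r') {
            out += '\n';
            // Skip the '\n' of a "\r\n" pair so it becomes one line break.
            if (i + 1 < raw.size() && raw[i + 1] == '\n') {
                ++i;
            }
        } else {
            out += raw[i];
        }
    }
    return out;
}

// Hypothetical helper: binary mode means the bytes arrive untranslated.
// (If the file cannot be opened this returns an empty string; real code
// should check file.is_open().)
std::string readTextFileAnyEndings(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::string raw((std::istreambuf_iterator<char>(file)),
                    std::istreambuf_iterator<char>());
    return normalizeLineEndings(raw);
}
```

After normalization, splitting on '\n' alone is enough, and empty lines survive.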

I'm not familiar with QString::split, but I suspect that this:

QStringList lines = text.split(QRegExp("[\r\n]"), QString::SkipEmptyParts);

will ignore empty lines, which will appear either as "\n\n" or as "\r\n\r\n", depending on the format. Empty lines are perfectly valid text data; you shouldn't ignore them unless you're certain that it makes sense to do so.

If you need to deal with text input delimited either by "\n", "\r\n", or "\r", then I think something like this:

QStringList lines = text.split(QRegExp("\n|\r\n|\r"));

would do the job. (Thanks to parsley72's comment for helping me with the regular expression syntax.)
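The same idea can be checked with std::regex in standard C++. This is an analogue of the Qt call, not Qt itself; the function name splitLines is hypothetical. It uses the same pattern and, like QString::split without SkipEmptyParts, keeps empty parts, so blank lines are preserved.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits on "\n|\r\n|\r", keeping empty parts.
std::vector<std::string> splitLines(const std::string& text) {
    // ECMAScript alternation is ordered and "\r\n" is listed before the lone
    // "\r", so a Windows line break is consumed as a single delimiter.
    static const std::regex lineBreak("\n|\r\n|\r");
    std::vector<std::string> parts;
    auto begin = text.cbegin();
    std::smatch match;
    while (std::regex_search(begin, text.cend(), match, lineBreak)) {
        parts.emplace_back(begin, match[0].first);  // text before the break
        begin = match[0].second;                    // resume after the break
    }
    parts.emplace_back(begin, text.cend());  // remainder after the last break
    return parts;
}
```

splitLines("a\r\n\r\nb") gives {"a", "", "b"}, keeping the blank line. A trailing line break produces a final empty string, e.g. splitLines("last\n") gives {"last", ""}, which you may want to drop.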

Another point: you're unlikely to encounter text files that use just '\r' to delimit lines. That's the format used by classic Mac OS up to version 9. Mac OS X is based on UNIX, and it uses standard UNIX-style '\n' line endings (though it probably tolerates '\r' line endings as well).

answered Sep 25 '22 22:09 by Keith Thompson
