string size is different on Windows than on Linux

I stumbled upon strange behavior of string::substr. Normally I code on Windows 7 in Eclipse+MinGW, but while working on my laptop, using Eclipse on Linux (Ubuntu 12.04), I noticed a difference in the result.

I was working with a vector< string > filled with lines of text. One of the steps was to remove the last character from each line.

On Windows 7 in Eclipse I did:

for( size_t i = 0; i < vectorOfLines.size(); i++ )
{
    vectorOfTrimmedLines.push_back( vectorOfLines.at(i).substr(0, vectorOfLines.at(i).size()-1) );
}

and it works as intended (removing the last character from each line).

But on Linux this code does not trim. Instead I needed to do it like this:

//  -2 instead of -1
vectorOfTrimmedLines.push_back( vectorOfLines.at(i).substr(0, vectorOfLines.at(i).size()-2) );

or using another method:

vectorOfTrimmedLines.push_back( ((string)vectorOfLines.at(i)).replace( (((string)vectorOfLines.at(i)).size()-2),1,"",0 ));

Of course, the Linux versions do the wrong thing on Windows (trimming the last 2 characters, or removing the one before last).

The problem seems to be that myString.size() returns the number of characters on Windows, but on Linux it returns the number of characters + 1. Could it be that the newline character is counted on Linux?

As a newbie in C++ and programming in general, I wonder why it is like that, and how this can be done in a platform-independent way.

Another thing I wonder: which method is preferable (faster), substr or replace?

Edit: The strings are filled by this function I wrote:

vector< string > ReadFile( string pathToFile )
{
    //  opening file
    ifstream myFile;
    myFile.open( pathToFile.c_str() );

    //  vector of strings that is returned by this function, contains file line by line
    vector< string > vectorOfLines;

    //  check if the file is open and then read file line by line to string element of vector
    if( myFile.is_open() )
    {
        string line;    //  this will contain the data read from the current line of the file

        while( getline( myFile, line ) )    //  until last line in file
        {
            vectorOfLines.push_back( line );    //  add current line to new string element in vector
        }

        myFile.close(); //  close the file
    }

    //  if the file could not be opened
    else
    {
        cerr << "Unable to open file." << endl; //  report the error
        //throw;
    }

    return vectorOfLines;   //  return vector of lines from file
}
RegEx asked Dec 12 '22 21:12


2 Answers

Text files are not identical on different operating systems. Windows uses a two-byte sequence to mark the end of a line: 0x0D, 0x0A (CR, LF). Linux uses one byte, 0x0A (LF). A stream opened in text mode knows the convention of the OS it was compiled for; when it reads the character(s) that the OS uses to represent the end of a line, it replaces them with '\n', and getline then stops at that '\n' and discards it. So if you write a text file under Windows, lines end with 0x0D, 0x0A; if you read that text file under Linux, getline sees 0x0D and treats it as a normal character, then sees 0x0A and treats it as the end of the line. The 0x0D therefore stays at the end of each string: size() is one larger than on Windows, and removing the last character only removes that invisible carriage return.
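The effect can be reproduced without any file, as a minimal sketch. The helper name firstLine is hypothetical; it uses a std::istringstream, which performs no line-ending translation, so it behaves like a Linux text stream on every platform.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads the first line of `data` the way getline
// does on Linux. It stops at 0x0A ('\n') and treats 0x0D ('\r') as an
// ordinary character that stays in the result.
std::string firstLine( const std::string& data )
{
    std::istringstream input( data );
    std::string line;
    std::getline( input, line );
    return line;
}
```

For a Windows-style line, firstLine( "abc\r\n" ).size() is 4 because the '\r' is kept; for a Linux-style line, firstLine( "abc\n" ).size() is 3.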

So the moral is that you must convert text files to the native representation when you move them from one system to another. ftp knows how to do this. If you're running in a virtual box, you have to do the conversion manually when you switch systems. It's simple enough with tr from a Unix command line.

Pete Becker answered Dec 20 '22 19:12


This is because on Windows a newline is represented by two characters, CR+LF, while on Linux it's only LF, and on Mac (prior to OS X) it's only CR.

As long as you only use files generated on Linux on Linux systems or files generated on Windows on Windows systems, you would have nothing to worry about. But as soon as you need to use a file generated on Linux on Windows or vice versa, you need to handle newline correctly.

As a first step, you need to open the file in binary mode: std::ifstream infile( "filename", std::ios_base::binary ); Then you have three options:

  1. You need to decide on a single newline convention for all platforms and use it consistently,
  2. You need to be able to detect the newline convention used in the current file (usually implemented by checking the newline used on the first line), save that in a variable, and pass it around to string functions that need to deal with newline,
  3. Tell the user to convert the file to the right newline, e.g. using dos2unix and unix2dos, or if the file transfer involves FTP, use ASCII mode
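Option 2 might look like the following minimal sketch. The enum Newline and the function detectNewline are hypothetical names, and the sketch assumes the file contents were already read in binary mode into a std::string; it inspects only the first line break it finds.

```cpp
#include <cstddef>
#include <string>

// Hypothetical labels for the three conventions mentioned above.
enum Newline { NEWLINE_UNKNOWN, NEWLINE_LF, NEWLINE_CRLF, NEWLINE_CR };

// Returns the convention of the first line break found in `data`,
// which is assumed to hold raw bytes read in binary mode.
Newline detectNewline( const std::string& data )
{
    const std::size_t pos = data.find_first_of( "\r\n" );

    if( pos == std::string::npos )
    {
        return NEWLINE_UNKNOWN;     // no line break at all
    }
    if( data[pos] == '\n' )
    {
        return NEWLINE_LF;          // Linux style
    }
    // data[pos] is '\r': CR+LF if a '\n' follows, otherwise a lone CR
    if( pos + 1 < data.size() && data[pos + 1] == '\n' )
    {
        return NEWLINE_CRLF;        // Windows style
    }
    return NEWLINE_CR;              // Mac style prior to OS X
}
```

The result can be stored once per file and passed to whatever code splits or trims lines, so that code never has to guess which bytes end a line.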

Or, as has been said, use Boost.

Lie Ryan answered Dec 20 '22 20:12