I stumbled upon strange behavior of string::substr. Normally I code on Windows 7 in Eclipse+MinGW, but when I was working on my laptop, using Eclipse on Linux (Ubuntu 12.04), I noticed a difference in the result.
I was working with a vector< string > filled with lines of text. One of the steps was to remove the last character from each line.
In Eclipse on Windows 7 I did:
for( int i = 0; i < (int)vectorOfLines.size(); i++ )
{
    vectorOfTrimmedLines.push_back( ((string)vectorOfLines.at(i)).substr(0, ((string)vectorOfLines.at(i)).size()-1) );
}
and it works as intended (removing the last character from each line).
But on Linux this code does not trim. Instead I needed to do it like this:
// -2 instead of -1 character
vectorOfTrimmedLines.push_back( ((string)vectorOfLines.at(i)).substr(0, ((string)vectorOfLines.at(i)).size()-2) );
or using another method:
vectorOfTrimmedLines.push_back( ((string)vectorOfLines.at(i)).replace( (((string)vectorOfLines.at(i)).size()-2),1,"",0 ));
Of course, the Linux versions work the wrong way on Windows (trimming the last two characters, or removing the one before last).
The problem seems to be that myString.size() returns the number of characters on Windows, but on Linux it returns the number of characters + 1. Could it be that the newline character is counted on Linux?
As a newbie in C++ and programming in general, I wonder why it is like that, and how this can be done in a platform-independent way.
Another thing I wonder: which method is preferable (faster), substr or replace?
Edit: The strings are filled by this function I wrote:
vector< string > ReadFile( string pathToFile )
{
    // opening file
    ifstream myFile;
    myFile.open( pathToFile.c_str() );
    // vector of strings that is returned by this function, contains file line by line
    vector< string > vectorOfLines;
    // check if the file is open and then read file line by line to string element of vector
    if( myFile.is_open() )
    {
        string line; // this will contain the data read from the current line of the file
        while( getline( myFile, line ) ) // until last line in file
        {
            vectorOfLines.push_back( line ); // add current line to new string element in vector
        }
        myFile.close(); // close the file
    }
    // if the file could not be opened
    else
    {
        cerr << "Unable to open file." << endl; // if the file is not open output
        //throw;
    }
    return vectorOfLines; // return vector of lines from file
}
Text files are not identical on different operating systems. Windows uses a two-byte sequence to mark the end of a line: 0x0D, 0x0A. Linux uses one byte, 0x0A. getline (and most other input functions) knows the convention for the OS that it was compiled for; when it reads the character(s) that the OS uses to represent the end of a line, it replaces the character(s) with '\n'. So if you write a text file under Windows, lines end with 0x0D, 0x0A; if you read that text file under Linux, getline sees 0x0D and treats it as a normal character, then it sees 0x0A and treats it as the end of the line. That leftover 0x0D is the extra character that size() reports on Linux.
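To make the effect visible, here is a minimal sketch that feeds Windows-style line endings to getline through an in-memory stream. The names readLinesRaw and readWindowsTextOnLinux are made up for this illustration, and the std::istringstream stands in for a Windows-written file opened on Linux, since a string stream performs no line-ending translation.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads lines the same way the question's ReadFile does,
// but from any input stream so it can be tried without a real file.
std::vector<std::string> readLinesRaw(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {   // getline stops at '\n' and discards it
        lines.push_back(line);         // any '\r' before the '\n' is kept
    }
    return lines;
}

// Simulates reading a Windows-written file where no CR+LF translation happens.
std::vector<std::string> readWindowsTextOnLinux() {
    std::istringstream in("abc\r\ndef\r\n");
    return readLinesRaw(in);
}
```

Each returned line here has size() == 4, not 3, because the last character is '\r'. That matches the "number of characters + 1" observed in the question.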
So the moral is that you must convert text files to the native representation when you move them from one system to another. ftp knows how to do this when transferring in ASCII mode. If you're running in a virtual box, you have to do the conversion manually when you switch systems. It's simple enough with tr from a Unix command line.
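If you would rather do the conversion inside the program, the same idea as deleting every carriage return with tr can be sketched in C++. This is only an illustration: removeCarriageReturns is a made-up name, and it assumes the text contains no lone '\r' characters that you want to keep.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of `text` with every '\r' removed,
// turning CR+LF line endings into plain LF.
// Assumption: a '\r' never appears as meaningful data inside a line.
std::string removeCarriageReturns(std::string text) {
    // erase-remove idiom: std::remove shifts the kept characters forward,
    // erase then drops the leftover tail.
    text.erase(std::remove(text.begin(), text.end(), '\r'), text.end());
    return text;
}
```

For example, removeCarriageReturns("a\r\nb\r\n") yields "a\nb\n".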
This is because on Windows a newline is represented by two characters, CR+LF, while on Linux it's only LF, and on Mac (prior to OS X) it's only CR.
As long as you only use files generated on Linux on Linux systems or files generated on Windows on Windows systems, you would have nothing to worry about. But as soon as you need to use a file generated on Linux on Windows or vice versa, you need to handle newline correctly.
As a first step, you need to open the file in binary mode: std::ifstream infile( "filename", std::ios_base::binary ); Then you have three options:
Or, as has been said, use Boost.