Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ reading from file puts three weird characters

When i read from a file string by string, >> operation gets first string but it starts with "i" . Assume that first string is "street", than it gets as "istreet".

Other strings are okay. I tried for different txt files. The result is same. First string starts with "i". What is the problem?

Here is my code :

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int cube(int x){ return (x*x*x);}

int main(){

int maxChar;
int lineLength=0;
int cost=0;

cout<<"Enter the max char per line... : ";
cin>>maxChar;
cout<<endl<<"Max char per line is : "<<maxChar<<endl;

fstream inFile("bla.txt",ios::in);

if (!inFile) {
    cerr << "Unable to open file datafile.txt";
    exit(1);   // call system to stop
}

while(!inFile.eof()) {
    string word;

    inFile >> word;
    cout<<word<<endl;
    cout<<word.length()<<endl;
    if(word.length()+lineLength<=maxChar){
        lineLength +=(word.length()+1);
    }
    else {
        cost+=cube(maxChar-(lineLength-1));
        lineLength=(word.length()+1);
    }   
}

}
like image 216
vkx Avatar asked Dec 20 '22 22:12

vkx


2 Answers

You're seeing a UTF-8 Byte Order Mark (BOM). It was added by the application that created the file.

To detect and ignore the marker you could try this (untested) function:

bool SkipBOM(std::istream & in)
{
    char test[4] = {0};
    in.read(test, 3);
    if (strcmp(test, "\xEF\xBB\xBF") == 0)
        return true;
    in.seekg(0);
    return false;
}
like image 197
Mark Ransom Avatar answered Dec 24 '22 01:12

Mark Ransom


With reference to the excellent answer by Mark Ransom above, adding this code skips the BOM (Byte Order Mark) on an existing stream. Call it after opening a file.

// Skips the Byte Order Mark (BOM) that defines UTF-8 in some text files.
void SkipBOM(std::ifstream &in)
{
    char test[3] = {0};
    in.read(test, 3);
    if ((unsigned char)test[0] == 0xEF && 
        (unsigned char)test[1] == 0xBB && 
        (unsigned char)test[2] == 0xBF)
    {
        return;
    }
    in.seekg(0);
}

To use:

ifstream in(path);
SkipBOM(in);
string line;
while (getline(in, line))
{
    // Process lines of input here.
}
like image 32
Contango Avatar answered Dec 24 '22 02:12

Contango