Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I remove the character "" from the beginning of a text file in C++?

Tags:

c++

byte

I'm trying to read a text file, and for each word, I will put them into a node of a binary search tree. However, the first character is always read as " + first word". For example, if my first word is "This", then the first word that is inserted into my node is "This". I've been searching the forum for a solution to fix it, there was one post asking the same problem in Java, but no one has addressed it in C++. Would anyone help me to fix it ? Thank you.

I came to the a simple solution. I opened the file in Notepad, and saved it as ANSI. After that, the file is reading and passing correctly into the binary search tree

like image 570
Hoang Minh Avatar asked Dec 26 '13 03:12

Hoang Minh


2 Answers

That's UTF-8's BOM

You need to read the file as UTF-8. If you don't need Unicode and just use the first 127 ASCII code points then save the file as ASCII or UTF-8 without BOM

like image 79
phuclv Avatar answered Oct 12 '22 08:10

phuclv


This is Byte Order Mark (BOM). It's the representation for the UTF-8 BOM in ISO-8859-1. You have to tell your editor to not use BOMs or use a different editor to strip them out.

In C++, you can use the following function to convert a UTF-8 BOM file to ANSI.

void change_encoding_from_UTF8BOM_to_ANSI(const char* filename)
{
    ifstream infile;
    string strLine="";
    string strResult="";
    infile.open(filename);
    if (infile)
    {
        // the first 3 bytes (ef bb bf) is UTF-8 header flags
        // all the others are single byte ASCII code.
        // should delete these 3 when output
        getline(infile, strLine);
        strResult += strLine.substr(3)+"\n";

        while(!infile.eof())
        {
            getline(infile, strLine);
            strResult += strLine+"\n";
        }
    }
    infile.close();

    char* changeTemp=new char[strResult.length()];
    strcpy(changeTemp, strResult.c_str());
    char* changeResult = change_encoding_from_UTF8_to_ANSI(changeTemp);
    strResult=changeResult;

    ofstream outfile;
    outfile.open(filename);
    outfile.write(strResult.c_str(),strResult.length());
    outfile.flush();
    outfile.close();
}
like image 39
herohuyongtao Avatar answered Oct 12 '22 08:10

herohuyongtao