
How do I convert the encoding of a large file (>1 GB) to Windows-1252 without an out-of-memory exception?

Consider:

public static void ConvertFileToUnicode1252(string filePath, Encoding srcEncoding)
{
    try
    {
        StreamReader fileStream = new StreamReader(filePath);
        Encoding targetEncoding = Encoding.GetEncoding(1252);

        string fileContent = fileStream.ReadToEnd();
        fileStream.Close();

        // Saving file as ANSI 1252
        Byte[] srcBytes = srcEncoding.GetBytes(fileContent);
        Byte[] ansiBytes = Encoding.Convert(srcEncoding, targetEncoding, srcBytes);
        string ansiContent = targetEncoding.GetString(ansiBytes);

        // Now writes contents to file again
        StreamWriter ansiWriter = new StreamWriter(filePath, false);
        ansiWriter.Write(ansiContent);
        ansiWriter.Close();
        //TODO -- log success  details
    }
    catch (Exception e)
    {
        throw e;
        // TODO -- log failure details
    }
}

The code above throws an OutOfMemoryException for large files; it only works for small files.

asked Mar 02 '17 09:03 by Tino Jose Thannippara


2 Answers

I think the most elegant solution is to keep using a StreamReader and a StreamWriter, but to read blocks of characters instead of reading everything at once or line by line. This doesn't arbitrarily assume that the file consists of lines of manageable length, and because the reader decodes characters rather than raw bytes, it also doesn't break with multi-byte character encodings.

public static void ConvertFileEncoding(string srcFile, Encoding srcEncoding, string destFile, Encoding destEncoding)
{
    using (var reader = new StreamReader(srcFile, srcEncoding))
    using (var writer = new StreamWriter(destFile, false, destEncoding))
    {
        char[] buf = new char[4096];
        while (true)
        {
            int count = reader.Read(buf, 0, buf.Length);
            if (count == 0)
                break;

            writer.Write(buf, 0, count);
        }
    }
}

(I wish StreamReader had a CopyTo method like Stream does; if it did, this would essentially be a one-liner!)
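The question overwrites the input file, whereas the method above writes to a separate destination. Here is a minimal sketch of how the two could be combined, assuming a temporary file next to the original is acceptable. The names EncodingConverter and ConvertFileInPlace and the ".tmp" suffix are hypothetical, chosen only for illustration; ConvertFileEncoding is repeated so the sketch compiles on its own.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingConverter
{
    // Same chunked copy as in the answer above, repeated so this sketch is self-contained.
    public static void ConvertFileEncoding(string srcFile, Encoding srcEncoding, string destFile, Encoding destEncoding)
    {
        using (var reader = new StreamReader(srcFile, srcEncoding))
        using (var writer = new StreamWriter(destFile, false, destEncoding))
        {
            char[] buf = new char[4096];
            int count;
            // Read returns 0 only at the end of the stream.
            while ((count = reader.Read(buf, 0, buf.Length)) > 0)
                writer.Write(buf, 0, count);
        }
    }

    // Hypothetical helper: converts filePath in place by writing to a temporary
    // file first and swapping it in only after the conversion has succeeded.
    public static void ConvertFileInPlace(string filePath, Encoding srcEncoding, Encoding destEncoding)
    {
        // Assumed naming scheme; the temp file lives in the same directory as the original.
        string tempFile = filePath + ".tmp";
        try
        {
            ConvertFileEncoding(filePath, srcEncoding, tempFile, destEncoding);
            // Replaces the original with the converted file; null means no backup copy is kept.
            File.Replace(tempFile, filePath, null);
        }
        finally
        {
            // Clean up the temp file if the conversion or the swap failed.
            if (File.Exists(tempFile))
                File.Delete(tempFile);
        }
    }
}
```

A call could look like `EncodingConverter.ConvertFileInPlace(path, Encoding.UTF8, Encoding.GetEncoding(1252));`. Note that on .NET Core and .NET 5+, `Encoding.GetEncoding(1252)` only works after `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` has been called; on .NET Framework the code page is available by default.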

answered Oct 04 '22 02:10 by Matti Virkkunen


Don't call ReadToEnd; read the file line by line, or X characters at a time, instead. ReadToEnd loads the whole file into memory at once.
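As a rough sketch of the line-by-line variant, assuming the file really is line-oriented text with lines of reasonable length (the names LineByLineConverter and ConvertLineByLine are hypothetical):

```csharp
using System;
using System.IO;
using System.Text;

public static class LineByLineConverter
{
    // Hypothetical helper: holds only one line in memory at a time.
    public static void ConvertLineByLine(string srcFile, Encoding srcEncoding, string destFile, Encoding destEncoding)
    {
        using (var reader = new StreamReader(srcFile, srcEncoding))
        using (var writer = new StreamWriter(destFile, false, destEncoding))
        {
            string line;
            // ReadLine returns null once the end of the file is reached.
            while ((line = reader.ReadLine()) != null)
            {
                // WriteLine appends writer.NewLine, so the original line endings
                // (and a missing final newline) are not preserved exactly.
                writer.WriteLine(line);
            }
        }
    }
}
```

This keeps memory use proportional to the longest line rather than to the file size, but it normalizes line endings along the way. If the output has to match the input character for character, reading fixed-size blocks of characters is the safer choice.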

answered Oct 04 '22 00:10 by Dimitri Bosteels