Change the encoding to UTF-8 on a stream (MemoryMappedViewStream)

I am using the code below to read a ~2.5 GB XML file as fast as I can (thanks to MemoryMappedFile). However, I am getting the following exception: "'.', hexadecimal value 0x00, is an invalid character. Line 9778, position 73249406." I believe it is due to some encoding problem. How do I make sure that the MemoryMappedViewStream reads the file using UTF-8?

static void Main(string[] args)
{
    using (var file = MemoryMappedFile.CreateFromFile(@"d:\temp\temp.xml", FileMode.Open, "MyMemMapFile"))
    {
        using (MemoryMappedViewStream stream = file.CreateViewStream())
        {
            Read(stream);
        }
    }
}

static void Read(Stream stream)
{
    using (XmlReader reader = XmlReader.Create(stream))
    {
        reader.MoveToContent();

        while (reader.Read())
        {
        }
    }
}
Martin asked Mar 15 '26 17:03



1 Answer

You could use the StreamReader class to set the encoding:

using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;
using System.Xml;

static void Main(string[] args)
{
    using (var file = MemoryMappedFile.CreateFromFile(@"d:\temp\temp.xml", FileMode.Open, "MyMemMapFile"))
    {
        using (MemoryMappedViewStream stream = file.CreateViewStream())
        {
            Read(stream);
        }
    }
}

static void Read(Stream stream)
{
    // Wrap the stream in a StreamReader so the bytes are decoded as UTF-8.
    using (XmlReader reader = XmlReader.Create(new StreamReader(stream, Encoding.UTF8)))
    {
        reader.MoveToContent();

        while (reader.Read())
        {
        }
    }
}

Hope this helps.
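
One more thing that may be worth checking, in case the encoding turns out not to be the cause: a view created with the parameterless CreateViewStream() is sized in units of system pages, so it can be larger than the file on disk, and the extra bytes at the end read as 0x00. If the file itself contains no NUL bytes, that padding could explain the exception, and wrapping the stream in a StreamReader would probably not change it, since a NUL byte decodes to U+0000 in UTF-8 too. Below is a minimal sketch that maps exactly as many bytes as the file holds. It reuses the path and map name from the question; the Program class and the length variable are only there for illustration, and it assumes the process can map a file of this size (for ~2.5 GB that means a 64-bit process).

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Xml;

static class Program
{
    static void Main(string[] args)
    {
        const string path = @"d:\temp\temp.xml";

        // Actual size of the file on disk, in bytes.
        long length = new FileInfo(path).Length;

        using (var file = MemoryMappedFile.CreateFromFile(path, FileMode.Open, "MyMemMapFile"))
        // Map exactly `length` bytes so the stream ends where the file ends,
        // instead of at the next page boundary.
        using (MemoryMappedViewStream stream = file.CreateViewStream(0, length))
        {
            Read(stream);
        }
    }

    static void Read(Stream stream)
    {
        // XmlReader detects the encoding from the BOM or the XML declaration.
        using (XmlReader reader = XmlReader.Create(stream))
        {
            reader.MoveToContent();

            while (reader.Read())
            {
            }
        }
    }
}
```

If the exception still occurs with the bounded view, the NUL byte is most likely in the file itself (or the file is not really UTF-8), and the reported line and position would be the place to look.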

Hans answered Mar 17 '26 09:03



