C# Image.FromStream(): Lost metadata when running in Windows 8 / 10

I have an application that retrieves an image from a web service. The web service embeds some metadata in the image before sending it to the C# client.

This is part of the method. It retrieves the Stream from the Response object and creates an Image from the stream. Note that I am using System.Drawing.Image, not System.Windows.Controls.Image, which means that I cannot use any ImageSource or BitmapSource.

System.Drawing.Image img = null;
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
    Stream stream = response.GetResponseStream();
    img = System.Drawing.Image.FromStream(stream);
    .......
}
return img;

The image looks perfectly fine, but it has metadata embedded in it. The image is in PNG format, and another method extracts that information from the Image. Six pieces of metadata are embedded in total. The PNG format (the PNG chunks) is described here. The data are stored in "tEXt" chunks.

public static Hashtable GetData(Image image)
{
    Hashtable metadata = new Hashtable();

    byte[] imageBytes;
    using (MemoryStream stream = new MemoryStream())
    {
        // Re-encode the Image into bytes using its original format
        image.Save(stream, image.RawFormat);
        imageBytes = stream.ToArray();
    }

    if (imageBytes.Length <= 8)
    {
        return null;
    }

    // Skipping 8 bytes of PNG header
    int pointer = 8;

    while (pointer < imageBytes.Length)
    {
        // read the next chunk
        uint chunkSize = GetChunkSize(imageBytes, pointer);
        pointer += 4;
        string chunkName = GetChunkName(imageBytes, pointer);
        pointer += 4;

        // chunk data -----
        if (chunkName.Equals("tEXt"))
        {
            byte[] data = new byte[chunkSize];
            Array.Copy(imageBytes, pointer, data, 0, chunkSize);
            StringBuilder stringBuilder = new StringBuilder();
            foreach (byte t in data)
            {
                stringBuilder.Append((char)t);
            }

            // keyword and text are separated by a single null byte
            string[] pair = stringBuilder.ToString().Split(new char[] { '\0' });
            metadata[pair[0]] = pair[1];
        }

        // skip the chunk data and the 4-byte CRC
        pointer += (int)chunkSize + 4;

        if (pointer > imageBytes.Length)
            break;
    }
    return metadata;
}

private static uint GetChunkSize(byte[] bytes, int pos)
{
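    // PNG stores the chunk length big-endian; reverse the bytes because
    // BitConverter uses the machine's byte order (little-endian on x86/x64).
    byte[] quad = new byte[4];
    for (int i = 0; i < 4; i++)
    {
        quad[3 - i] = bytes[pos + i];
    }

    return BitConverter.ToUInt32(quad, 0);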
}

private static string GetChunkName(byte[] bytes, int pos)
{
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < 4; i++)
    {
        builder.Append((char)bytes[pos + i]);
    }

    return builder.ToString();
}

In Windows 7, all six pieces of metadata are detected and extracted. In short, in a Windows 7 environment I get everything I need.

When I move this to a Windows 10 terminal (I also tried Windows 8), things are different: I can extract only 2 pieces of metadata from the Image.

Because my GetData() method converts the Image into a byte[], I tried extracting the data directly from the web service stream. I converted the stream into a byte[] and used the same technique to extract the metadata from it. With this method I got all 6 pieces of metadata back.

So the question is: what has changed? It works fine in Windows 7, but not in Windows 8 and 10. I can still get the data back, provided I don't turn the stream into an Image. Somewhere in the process the metadata is lost, either when I convert the stream to an Image or when I convert the Image back to a byte[]. As a side note, I have tried converting the byte[] into a string. The string representation of the byte[] from the stream looks different from that of the byte[] from the Image. Using the correct encoder, I could see that 4 pieces of metadata are missing from the latter's byte[].

Jai asked Aug 01 '16 07:08


1 Answer

The tEXt metadata is encoded in ISO/IEC 8859-1 (Latin-1).
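As a minimal sketch of what that encoding means when parsing: a tEXt payload is a Latin-1 keyword, a single zero byte, and then Latin-1 text. The helper below decodes one such payload with an explicit ISO-8859-1 encoding. The names `TextChunk` and `Decode` are illustrative and do not come from the original post, and returning null for a payload without a separator is an assumed policy.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TextChunk
{
    // ISO-8859-1 maps every byte value 0..255 to the Unicode code point of the same value.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Decodes one tEXt payload: keyword, a single 0x00 separator, then the text.
    // Returns null when the payload has no separator (assumed policy for malformed chunks).
    public static KeyValuePair<string, string>? Decode(byte[] payload)
    {
        int separator = Array.IndexOf(payload, (byte)0);
        if (separator < 0)
        {
            return null;
        }

        string keyword = Latin1.GetString(payload, 0, separator);
        string text = Latin1.GetString(payload, separator + 1, payload.Length - separator - 1);
        return new KeyValuePair<string, string>(keyword, text);
    }
}
```

For valid payloads this yields the same strings as casting each byte to char, as the question's code does, but it states the encoding explicitly and does not throw when the separator is missing.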

Try adding the following before you make your request:

 request.Headers.Add(HttpRequestHeader.AcceptCharset, "ISO-8859-1");

So, modify your code as follows:

System.Drawing.Image img = null;

// accept charset "ISO-8859-1"
request.Headers.Add(HttpRequestHeader.AcceptCharset, "ISO-8859-1");

using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
    Stream stream = response.GetResponseStream();
    img = System.Drawing.Image.FromStream(stream);
    .......
}
return img;

For reference, can you post the Windows default EncodingName on Windows 7, 8, and 10?

Use this PowerShell command to find it:

[System.Text.Encoding]::Default.EncodingName

Edit:

I reviewed the .NET source code of System.Drawing.Image.FromStream and found this:

// [Obsolete("Use Image.FromStream(stream, useEmbeddedColorManagement)")]
public static Image FromStream(Stream stream) {
    return Image.FromStream(stream, false);
}

Try using:

Image.FromStream(stream, true);

or

Image.FromStream(stream, true, true);

The parameters are:

public static Image FromStream(
    Stream stream,
    bool useEmbeddedColorManagement, // true to use color management information embedded in the data stream; otherwise, false
    bool validateImageData           // true to validate the image data; otherwise, false
)

Image.FromStream Method

Edit 2:

I ran an experiment on a PNG image file containing tEXt data.

I wrote a function that measures the size in bytes of the image as read by FromStream(), and ran it on both Windows 7 and Windows 10.

The following table shows the measured size in bytes in each environment. The file size on disk is 502,888 bytes.

Function used                           Windows 7    Windows 10
Image.FromStream(stream, true, true)    569674       597298
Image.FromStream(stream, true)          597343       597298
Image.FromStream(stream, false)         597343       597298

The size differs between the two environments, and both differ from the real size on disk.

So you would expect the position of the metadata to have changed (not lost, only relocated).

I used a hex editor to view the tEXt chunk.

The tEXt chunk is at position 66 (decimal) from the beginning of the file, and it is the same in both environments.

I used my own metadata reader function, and the result is the same and valid on both Windows 7 and Windows 10 (no loss of data).

The official site for PNG format is: https://www.w3.org/TR/PNG/

Conclusion

Image.FromStream is not suitable for reading metadata. The image file should be read as raw bytes, not as an Image, because FromStream rearranges the raw data so as to preserve the image and its data without distortion (that is an internal detail of the function in .NET).

To read the metadata as described by the PNG specification, read the stream as raw bytes from the beginning of the file, as the specification describes.

I advise you to use the MetadataExtractor class library to read metadata; its results are accurate on both Windows 7 and Windows 10.

You can install the library from NuGet: Install-Package MetadataExtractor
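A short sketch of how the library is typically called, based on its documented `ImageMetadataReader.ReadMetadata` entry point. The class name `MetadataDump` and the `path` parameter are illustrative, and how the tEXt entries are named in the output is not something the original answer specifies, so the loop simply prints every tag it finds.

```csharp
using System;
using System.Collections.Generic;
using MetadataExtractor;

public static class MetadataDump
{
    // Prints every metadata tag the library finds in the file at 'path'.
    public static void Print(string path)
    {
        IEnumerable<MetadataExtractor.Directory> directories = ImageMetadataReader.ReadMetadata(path);
        foreach (var directory in directories)
        {
            foreach (var tag in directory.Tags)
            {
                Console.WriteLine($"{directory.Name} - {tag.Name} = {tag.Description}");
            }
        }
    }
}
```

Because the library parses the file itself rather than going through System.Drawing, it reads the bytes exactly as they were stored.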

Edit 3: The Proposed Solution

The problem is now resolved, and the following class works on both Windows 7 and Windows 8.

The major change is reading the image file as raw bytes.

using System;
using System.Collections;
using System.IO;
using System.Text;

class MetaReader
{
    public static Hashtable GetData(string fname)
    {
        using (FileStream image = new FileStream(fname, FileMode.Open, FileAccess.Read))
        {
            Hashtable metadata = new Hashtable();
            byte[] imageBytes;

            using (var memoryStream = new MemoryStream())
            {
                image.CopyTo(memoryStream);
                imageBytes = memoryStream.ToArray();
                Console.WriteLine(imageBytes.Length);
            }

            if (imageBytes.Length <= 8)
            {
                return null;
            }

            // Skipping 8 bytes of PNG header
            int pointer = 8;

            while (pointer < imageBytes.Length)
            {
                // read the next chunk
                uint chunkSize = GetChunkSize(imageBytes, pointer);
                pointer += 4;
                string chunkName = GetChunkName(imageBytes, pointer);
                pointer += 4;

                // chunk data -----
                if (chunkName.Equals("tEXt"))
                {
                    byte[] data = new byte[chunkSize];
                    Array.Copy(imageBytes, pointer, data, 0, chunkSize);
                    StringBuilder stringBuilder = new StringBuilder();
                    foreach (byte t in data)
                    {
                        stringBuilder.Append((char)t);
                    }

                    string[] pair = stringBuilder.ToString().Split(new char[] { '\0' });
                    metadata[pair[0]] = pair[1];
                    Console.WriteLine(metadata[pair[0]]);
                }

                pointer += (int)chunkSize + 4;

                if (pointer > imageBytes.Length)
                    break;
            }
            return metadata;
        }
    }

    private static uint GetChunkSize(byte[] bytes, int pos)
    {
        byte[] quad = new byte[4];
        for (int i = 0; i < 4; i++)
        {
            quad[3 - i] = bytes[pos + i];
        }

        return BitConverter.ToUInt32(quad, 0);
    }

    private static string GetChunkName(byte[] bytes, int pos)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < 4; i++)
        {
            builder.Append((char)bytes[pos + i]);
        }

        return builder.ToString();
    }
}

Reading Metadata from a Web Service:

You can load the image file from a URL as a stream and read the metadata on the fly. You can also create an instance of System.Drawing.Image and do whatever processing you need on the image. A complete demo with source code is available at:

Reading Metadata from PNG loaded from Web Stream -TryIt
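A minimal sketch of that approach, assuming the whole response fits in memory: buffer the stream into a byte[] once, parse the tEXt chunks from those untouched bytes, and build the Image from the same bytes afterwards. The names `PngText`, `ReadAllBytes`, and `ReadTextChunks` are illustrative and are not taken from the linked demo.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class PngText
{
    // Copies the whole stream into memory so the untouched bytes can be parsed
    // and, separately, wrapped in a MemoryStream for Image.FromStream.
    public static byte[] ReadAllBytes(Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Walks the chunk list (length, type, data, CRC) and collects tEXt pairs.
    public static Dictionary<string, string> ReadTextChunks(byte[] png)
    {
        var result = new Dictionary<string, string>();
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        int pos = 8; // skip the 8-byte PNG signature

        // Each chunk needs at least 12 bytes: 4 length + 4 type + 4 CRC.
        while (pos + 12 <= png.Length)
        {
            // The chunk length is stored big-endian.
            long length = ((long)png[pos] << 24) | ((long)png[pos + 1] << 16)
                        | ((long)png[pos + 2] << 8) | png[pos + 3];
            string type = Encoding.ASCII.GetString(png, pos + 4, 4);
            long dataStart = pos + 8;
            if (dataStart + length + 4 > png.Length)
            {
                break; // truncated chunk: stop rather than read past the end
            }

            if (type == "tEXt")
            {
                // Keyword and text are separated by the first zero byte.
                int sep = Array.IndexOf(png, (byte)0, (int)dataStart, (int)length);
                if (sep >= 0)
                {
                    string key = latin1.GetString(png, (int)dataStart, sep - (int)dataStart);
                    int textStart = sep + 1;
                    int textLength = (int)(dataStart + length) - textStart;
                    result[key] = latin1.GetString(png, textStart, textLength);
                }
            }

            pos = (int)(dataStart + length + 4); // skip data and CRC
        }

        return result;
    }
}
```

With an HttpWebResponse, you would pass response.GetResponseStream() to ReadAllBytes, call ReadTextChunks on the result, and then create the image with Image.FromStream(new MemoryStream(bytes)). That MemoryStream should stay open for as long as the Image is in use, since GDI+ expects the source stream to remain available.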

M.Hassan answered Oct 04 '22 07:10