
Byte array to UTF8 string

Tags: c#

I need to convert a byte array to a UTF8 string, and preserve the characters in the array.

I'm uploading an image using a multipart POST. The image is sent along as a UTF-8 string. I've compared the headers from my app and from a web browser, and the data is the same apart from one thing.

When the browser sends it, the content contains lots of [] characters, whereas my app replaces [] with ?. That means it's not preserving the characters as it should. Everything else is the same.

Here's the code I have at the moment:

Byte[] fileOpen = File.ReadAllBytes("C:/pic.jpeg");
postData.AppendLine(System.Text.Encoding.UTF8.GetString(fileOpen));

Any advice?

James Jeffery asked Feb 21 '10 11:02




3 Answers

The image is sent along as a UTF8 string.

Why? UTF-8 is a text encoding. Raw binary data should not be encoded but rather sent directly as bytes.

If your transfer protocol doesn't allow byte transfer, then the usual way is to encode the bytes in Base64.
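For illustration, here is a minimal sketch of the Base64 route. It only helps if the receiving end expects Base64 and decodes it again, which is an assumption about your server. The class name Base64Sketch and the four sample bytes (the start of a typical JPEG file) are made up for the example.

```csharp
using System;
using System.Linq;

static class Base64Sketch
{
    // Encodes raw bytes as Base64 text that is safe to embed in a text-only protocol.
    public static string ToText(byte[] data) => Convert.ToBase64String(data);

    // Decodes the Base64 text back into the original bytes.
    public static byte[] FromText(string text) => Convert.FromBase64String(text);

    static void Main()
    {
        // Hypothetical sample data: the first four bytes of a typical JPEG file.
        byte[] original = { 0xFF, 0xD8, 0xFF, 0xE0 };

        string encoded = ToText(original);
        byte[] decoded = FromText(encoded);

        Console.WriteLine(encoded);                          // /9j/4A==
        Console.WriteLine(original.SequenceEqual(decoded));  // True
    }
}
```

Unlike a text decoding, the round trip is lossless for any byte values, at the cost of roughly a third more data on the wire.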

Konrad Rudolph answered Sep 30 '22 00:09


Don't try to send the data using anything approaching a text API. You haven't said what postData is, but try to find some part of its API which deals with streams of binary data instead of text data. Look for methods along the lines of AppendBytes, or GetStream to retrieve a stream you can write your data to.

Pretending that arbitrary binary data is text is a bad idea - you will lose data.

EDIT: One way which tends not to lose data (but is still a bad idea) is to treat the binary data as an ISO-8859-1-encoded document. IIRC there is some debate about exactly what ISO-8859-1 contains in positions 128-159, but most implementations at least map those positions to Unicode code points 128-159 as well.

Your "UTF-8 decoding" of the binary data may look like the correct data because for values 0-127, they're the same - it's only above that that you'll have problems. However, you should still avoid treating this binary data as text. It's not text, and treating it as text is simply a recipe for disaster.
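A small sketch of the difference, assuming nothing beyond the standard System.Text encodings. The class name RoundTripSketch and the sample bytes (the start of a typical JPEG file) are made up for the example. It decodes the bytes as text and encodes them back, once with UTF-8 and once with ISO-8859-1.

```csharp
using System;
using System.Linq;
using System.Text;

static class RoundTripSketch
{
    // Decodes bytes as text with the given encoding, then encodes the text back to bytes.
    public static byte[] RoundTrip(byte[] data, Encoding encoding) =>
        encoding.GetBytes(encoding.GetString(data));

    static void Main()
    {
        // Hypothetical sample data: the first four bytes of a typical JPEG file.
        byte[] jpegStart = { 0xFF, 0xD8, 0xFF, 0xE0 };

        // These bytes are not valid UTF-8, so the decoder substitutes replacement characters.
        bool utf8Intact = jpegStart.SequenceEqual(RoundTrip(jpegStart, Encoding.UTF8));

        // ISO-8859-1 maps every byte value to exactly one character, so nothing is lost here.
        bool latin1Intact = jpegStart.SequenceEqual(
            RoundTrip(jpegStart, Encoding.GetEncoding("ISO-8859-1")));

        Console.WriteLine("UTF-8 round trip intact: " + utf8Intact);        // False
        Console.WriteLine("ISO-8859-1 round trip intact: " + latin1Intact); // True
    }
}
```

Even though the ISO-8859-1 round trip survives in this sketch, any later step that re-encodes the string with a different encoding would still corrupt the data, which is why sending the raw bytes is the safer route.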

If you could post the headers sent by your browser (including the headers of the part of the multipart that corresponds to the image), we can hopefully help you slightly further - but the bottom line is that you should find a way of handing whatever API you're using (that would be useful information too) the raw binary data without going via text.

Jon Skeet answered Sep 29 '22 23:09


To Jon and the others who said they didn't believe me: I've solved it. Converting the image to a string caused problems, but writing it directly to the request stream worked.

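// Needs: using System; using System.IO; using System.Net; using System.Text;
// and using System.Windows.Forms; (for MessageBox).
public string solveCaptcha(String username, String password)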
    {
        String boundry = "---------------------------" + DateTime.Now.Ticks.ToString("x");

        StringBuilder postData = new StringBuilder();
        postData.AppendLine("--" + boundry);
        postData.AppendLine("Content-Disposition: form-data; name=\"function\"");
        postData.AppendLine("");
        postData.AppendLine("picture2");
        postData.AppendLine("--" + boundry);
        postData.AppendLine("Content-Disposition: form-data; name=\"username\"");
        postData.AppendLine("");
        postData.AppendLine(username);
        postData.AppendLine("--" + boundry);
        postData.AppendLine("Content-Disposition: form-data; name=\"password\"");
        postData.AppendLine("");
        postData.AppendLine(password);
        postData.AppendLine("--" + boundry);
        postData.AppendLine("Content-Disposition: form-data; name=\"pict\"; filename=\"pic.jpeg\"");
        postData.AppendLine("Content-Type: image/pjpeg");
        postData.AppendLine("");

        StringBuilder postData2 = new StringBuilder();
        // The boundary after the file data has to be preceded by CRLF, not a bare LF.
        postData2.AppendLine("\r\n--" + boundry);
        postData2.AppendLine("Content-Disposition: form-data; name=\"pict_to\"");
        postData2.AppendLine("");
        postData2.AppendLine("0");
        postData2.AppendLine("--" + boundry);
        postData2.AppendLine("Content-Disposition: form-data; name=\"pict_type\"");
        postData2.AppendLine("");
        postData2.AppendLine("0");
        postData2.AppendLine("--" + boundry + "--");

        Byte[] fileOpen = File.ReadAllBytes("C:/pic.jpeg");
        byte[] buffer = Encoding.ASCII.GetBytes(postData.ToString());
        byte[] buffer2 = Encoding.ASCII.GetBytes(postData2.ToString());

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://poster.decaptcher.com/");

        request.ContentType = "multipart/form-data; boundary=" + boundry;
        request.ContentLength = buffer.Length + buffer2.Length + fileOpen.Length;
        request.Method = "POST";

        String source = "";

        // Write the text parts and the raw image bytes straight to the request stream;
        // the image is never converted to a string.
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(buffer, 0, buffer.Length);
            requestStream.Write(fileOpen, 0, fileOpen.Length);
            requestStream.Write(buffer2, 0, buffer2.Length);
        }

        // The request stream is closed above, before the response is requested.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(resStream, Encoding.ASCII))
        {
            source = reader.ReadToEnd();
        }
        MessageBox.Show(source);
        // Do something with the source
        return source;
    }

If you have a deCaptcher account, test it yourself. If need be I will post a video of it working, just to prove my point.
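For anyone on a newer framework, the same kind of body can probably be built without hand-writing boundaries. This is only a sketch, assuming System.Net.Http is available. The class name MultipartSketch, the BuildForm helper and the stand-in image bytes are made up for the example, and I have not tested it against deCaptcher. One known difference from the hand-built version: StringContent adds its own Content-Type header to each text part, which a picky server might not accept.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

static class MultipartSketch
{
    // Builds a multipart/form-data body; the image part carries the raw bytes untouched.
    public static MultipartFormDataContent BuildForm(string username, string password, byte[] imageBytes)
    {
        var form = new MultipartFormDataContent();
        form.Add(new StringContent("picture2"), "function");
        form.Add(new StringContent(username), "username");
        form.Add(new StringContent(password), "password");

        // The image goes in as bytes, never as a string.
        var image = new ByteArrayContent(imageBytes);
        image.Headers.ContentType = new MediaTypeHeaderValue("image/jpeg");
        form.Add(image, "pict", "pic.jpeg");

        form.Add(new StringContent("0"), "pict_to");
        form.Add(new StringContent("0"), "pict_type");
        return form;
    }

    static void Main()
    {
        // Stand-in bytes; a real call would use File.ReadAllBytes on the image file.
        byte[] imageBytes = { 0xFF, 0xD8, 0xFF, 0xE0 };

        using (var form = BuildForm("user", "pass", imageBytes))
        {
            // Serialize the body locally to inspect it; nothing is sent over the network here.
            byte[] body = form.ReadAsByteArrayAsync().GetAwaiter().GetResult();
            Console.WriteLine(form.Headers.ContentType.MediaType);
            Console.WriteLine(body.Length > imageBytes.Length);
        }
    }
}
```

To actually send it, the form would be passed to HttpClient.PostAsync together with the target URL.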

James Jeffery answered Sep 30 '22 01:09