
Download a web page and save it as a UTF-8 text file

Tags:

c#

utf-8

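I download a web page as follows. I want to save it as UTF-8 text, whatever character set the server reports, without reading the whole response into memory. How can I do that?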

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
{
    Encoding enc = Encoding.GetEncoding(resp.CharacterSet);
    Encoding utf8 = Encoding.UTF8;
    using (StreamWriter w = new StreamWriter(new FileStream(pathname, FileMode.Create), utf8))
    {
        using (StreamReader r = new StreamReader(resp.GetResponseStream()))
        {
            // This works, but it's bad because you read the whole response into memory:
            string s = r.ReadToEnd();
            w.Write(s);

            // This doesn't work :(
            char[] buffer = new char[1024];
            int n;
            while (!r.EndOfStream)
            {
                n = r.ReadBlock(buffer, 0, 1024);
                w.Write(utf8.GetChars(Encoding.Convert(enc, utf8, enc.GetBytes(buffer))));
            }

            // This means that r.ReadToEnd() is doing the transcoding to UTF-8 differently.
            // But how?!
        }
    }
    return resp.StatusCode;
}
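
A few things in the block loop above are worth noting. The StreamReader is created without enc, so it decodes the bytes as UTF-8 by default rather than with the charset the server reported. The chars in buffer are already decoded text, so passing them through enc.GetBytes and Encoding.Convert is unnecessary and can replace characters that enc cannot represent. Finally, enc.GetBytes(buffer) encodes all 1024 chars rather than the n that were just read, so stale characters from an earlier block get written again. The following is a minimal sketch of a block-wise copy that avoids these problems. The class and method names (Utf8Transcoder, ResolveEncoding, TranscodeToUtf8) are hypothetical, and falling back to UTF-8 for a missing or unknown charset is an assumption, not something the question specifies.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8Transcoder
{
    // Hypothetical helper: map a charset name (e.g. resp.CharacterSet) to an Encoding.
    // Assumption: fall back to UTF-8 when the name is missing or not recognised.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;
        try
        {
            // Some servers quote the charset value, so strip quotes first.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8;
        }
    }

    // Hypothetical helper: decode input with sourceEncoding and write it to
    // output as UTF-8, one block at a time. Both streams are closed on return.
    public static void TranscodeToUtf8(Stream input, Encoding sourceEncoding, Stream output)
    {
        // No byte order mark here; pass Encoding.UTF8 instead to emit one.
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

        using (var reader = new StreamReader(input, sourceEncoding))
        using (var writer = new StreamWriter(output, utf8NoBom))
        {
            char[] buffer = new char[1024];
            int n;
            // Read returns the number of chars actually read, 0 at end of stream.
            while ((n = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                // The reader has already decoded the bytes; the writer encodes
                // to UTF-8. Write only the n chars that were just read.
                writer.Write(buffer, 0, n);
            }
        }
    }
}
```

Inside the question's using block this could be called as Utf8Transcoder.TranscodeToUtf8(resp.GetResponseStream(), Utf8Transcoder.ResolveEncoding(resp.CharacterSet), new FileStream(pathname, FileMode.Create)). The only conversion work is choosing the right Encoding for the reader; the writer's encoding takes care of the UTF-8 output.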


Richard Barraclough asked Dec 31 '25 01:12


1 Answer

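You could simply use the WebClient class, which is easier to use. Note that its Encoding property applies to the string methods such as DownloadString; DownloadFile saves the response bytes as received: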

WebClient webClient = new WebClient();
webClient.Encoding = System.Text.Encoding.UTF8;
webClient.DownloadFile(url, "file.txt");
bytecode77 answered Jan 03 '26 12:01


