I download a web page as shown below, and I want to save it to disk as UTF-8 text. How can I do that?
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
{
    Encoding enc = Encoding.GetEncoding(resp.CharacterSet);
    Encoding utf8 = Encoding.UTF8;
    using (StreamWriter w = new StreamWriter(new FileStream(pathname, FileMode.Create), utf8))
    {
        using (StreamReader r = new StreamReader(resp.GetResponseStream()))
        {
            // This works, but it's bad because you read the whole response into memory:
            string s = r.ReadToEnd();
            w.Write(s);

            // This doesn't work :(
            char[] buffer = new char[1024];
            int n;
            while (!r.EndOfStream)
            {
                n = r.ReadBlock(buffer, 0, 1024);
                w.Write(utf8.GetChars(Encoding.Convert(enc, utf8, enc.GetBytes(buffer))));
            }
            // This means that r.ReadToEnd() is doing the transcoding to UTF-8 differently.
            // But how?!
        }
    }
    return resp.StatusCode;
}
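You could simply use the WebClient class, which is easier to use. Note, though, that its Encoding property only applies to the string-based methods such as DownloadString: DownloadFile saves the response bytes to disk unchanged, so it will not convert the page to UTF-8. Download the page as a string and write it back out as UTF-8 instead (this still holds the whole page in memory):
WebClient webClient = new WebClient();
webClient.Encoding = System.Text.Encoding.UTF8; // used by string methods such as DownloadString, not by DownloadFile
string text = webClient.DownloadString(url);
System.IO.File.WriteAllText("file.txt", text, System.Text.Encoding.UTF8);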
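If you want to keep streaming the response as in the question, you should not need Encoding.Convert at all. StreamReader already decodes bytes into .NET's UTF-16 chars, and a StreamWriter created with a UTF-8 encoding re-encodes them, which is essentially all that ReadToEnd followed by Write was doing. The loop in the question most likely fails because it writes the whole 1024-char buffer on every pass, even when ReadBlock filled only n chars, so stale characters from earlier reads end up in the file. Also, the question's StreamReader is created without enc, so it decodes as UTF-8 unless a byte order mark says otherwise. Below is a minimal sketch, assuming the caller passes in the response stream and the encoding obtained from resp.CharacterSet. The names Transcoder and CopyAsUtf8 are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: streams text from one encoding to UTF-8 in fixed-size chunks.
static class Transcoder
{
    // Copies text from `input` (encoded as `source`) to `output` as UTF-8,
    // one buffer at a time, so the whole document is never held in memory.
    public static void CopyAsUtf8(Stream input, Encoding source, Stream output)
    {
        // StreamReader decodes bytes -> UTF-16 chars using `source`;
        // false = don't let a byte order mark override the declared encoding.
        using (var reader = new StreamReader(input, source, false))
        // UTF8Encoding(false) writes UTF-8 without a byte order mark.
        using (var writer = new StreamWriter(output, new UTF8Encoding(false)))
        {
            char[] buffer = new char[1024];
            int n;
            // Read returns 0 once the end of the stream is reached.
            while ((n = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Write only the n chars actually read, not the whole buffer.
                writer.Write(buffer, 0, n);
            }
        } // Disposing the writer flushes it and closes `output`; the reader closes `input`.
    }
}
```

For the question's case the call would be something like Transcoder.CopyAsUtf8(resp.GetResponseStream(), enc, new FileStream(pathname, FileMode.Create)). Both streams are closed when the method returns.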