
Downloading pdf file using WebRequests

I'm trying to download a number of PDF files automagically, given a list of URLs.

Here's the code I have:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

request.Method = "GET";

var encoding = new UTF8Encoding();

request.Headers.Add(HttpRequestHeader.AcceptLanguage, "en-gb,en;q=0.5");
request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate");

request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0";

HttpWebResponse resp = (HttpWebResponse)request.GetResponse();

BinaryReader reader = new BinaryReader(resp.GetResponseStream());

FileStream stream = new FileStream("output/" + date.ToString("yyyy-MM-dd") + ".pdf",FileMode.Create);

BinaryWriter writer = new BinaryWriter(stream);

while (reader.PeekChar() != -1)
{
    writer.Write(reader.Read());
}
writer.Flush();
writer.Close();

I know the first part works: I was originally reading the response with a TextReader, but that gave me corrupted PDF files (since PDFs are binary files).

Right now, if I run it, reader.PeekChar() is always -1 and nothing happens, so I get an empty file.

While debugging, I noticed that reader.Read() actually returned different numbers each time I invoked it, so maybe PeekChar is broken.

So I tried something very dirty:

try
{
    while (true)
    {
        writer.Write(reader.Read());
    }
}
catch
{
}
writer.Flush();
writer.Close();

Now I'm getting a very tiny file with some garbage in it, but it's still not what I'm looking for.

Can anyone point me in the right direction?

Additional Information:

The response headers don't suggest it's compressed or anything else.

HTTP/1.1 200 OK
Content-Type: application/pdf
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Fri, 10 Aug 2012 11:15:48 GMT
Content-Length: 109809
Aabela asked Aug 10 '12 12:08


3 Answers

Skip the BinaryReader and BinaryWriter and just copy the input stream to the output FileStream. In brief:

var fileName = "output/" + date.ToString("yyyy-MM-dd") + ".pdf";
using (var stream = File.Create(fileName))
  resp.GetResponseStream().CopyTo(stream);
Martin Liversage answered Nov 12 '22 03:11
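
For what it's worth, here is a slightly fuller sketch of the stream-copy approach. It also hints at why the loop in the question misbehaves: BinaryReader.PeekChar() returns -1 whenever the underlying stream cannot seek, which is the case for a network response stream, and writer.Write(reader.Read()) writes each decoded character as a four-byte int rather than as a raw byte. The names PdfDownloader, SaveStreamToFile and Download are made up for illustration, and the code assumes .NET 4 or later for Stream.CopyTo.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class; not part of the original question or answers.
public static class PdfDownloader
{
    // Copies every byte from input into a new file at path, untouched,
    // and returns the number of bytes written.
    public static long SaveStreamToFile(Stream input, string path)
    {
        using (var output = File.Create(path))
        {
            // Stream.CopyTo moves raw bytes; no text decoding is involved.
            input.CopyTo(output);
            return output.Length;
        }
    }

    // Downloads url to path using the same HttpWebRequest API as the question.
    public static long Download(string url, string path)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var body = response.GetResponseStream())
        {
            return SaveStreamToFile(body, path);
        }
    }
}
```

Calling PdfDownloader.Download(url, "output/" + date.ToString("yyyy-MM-dd") + ".pdf") would mirror the file naming from the question. On recent .NET versions WebRequest is marked obsolete in favor of HttpClient, so treat this as a sketch for the API the question already uses.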


Why not use the WebClient class?

using (WebClient webClient = new WebClient())
{
    webClient.DownloadFile("url", "filePath");
}
Sergey Vyacheslavovich Brunov answered Nov 12 '22 02:11
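
Since the question mentions a list of URLs, here is a rough sketch of how the WebClient approach might be looped over several files. BatchDownloader, FileNameFor and DownloadAll are hypothetical names, and the naming rule (last path segment of the URL, with a numbered fallback) is an assumption; the question itself names files by date.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

// Hypothetical helper class; not part of the original question or answers.
public static class BatchDownloader
{
    // Assumed naming rule: use the last path segment of the URL,
    // or "file-<index>.pdf" when the URL has no file name.
    public static string FileNameFor(Uri uri, int index)
    {
        string name = Path.GetFileName(uri.LocalPath);
        return string.IsNullOrEmpty(name) ? "file-" + index + ".pdf" : name;
    }

    // Downloads each URL into outputDir, reusing one WebClient instance.
    public static void DownloadAll(IEnumerable<string> urls, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        using (var client = new WebClient())
        {
            int index = 0;
            foreach (string url in urls)
            {
                var uri = new Uri(url);
                string target = Path.Combine(outputDir, FileNameFor(uri, index++));
                // DownloadFile writes the response body to disk as raw bytes.
                client.DownloadFile(uri, target);
            }
        }
    }
}
```

DownloadFile throws a WebException when a request fails, so a real batch job would probably want a try/catch around each download to keep one bad URL from stopping the rest.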


Your question asks about WebClient, but your code shows you using raw HTTP requests and responses.

Why don't you actually use System.Net.WebClient?

using (System.Net.WebClient wc = new System.Net.WebClient())
{
    wc.DownloadFile("http://www.site.com/file.pdf",  "C:\\Temp\\File.pdf");
}
Eoin Campbell answered Nov 12 '22 03:11