I have a WebBrowser control on a form, but for the most part it remains hidden from the user. It is there to handle a series of login and other tasks. I have to use this control because there is a ton of JavaScript that handles the login (i.e., I can't just switch to a WebClient object).
After hopping around a bit, we end up wanting to download a PDF file. But instead of downloading, the file is displayed within the WebBrowser control, which the user cannot see.
How can I download the PDF instead of having it load in the browser control?
The WebBrowser control in VB.NET lets you host web pages and other browser-enabled documents in your Windows Forms applications. You can add the control to your VB.NET project, and it displays web pages much like a regular web browser.
Add a SaveFileDialog control to your form, then add the following code to the handler for your WebBrowser's Navigating event:
private void webBrowser1_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
    // Intercept navigation when the last URL segment looks like a PDF file.
    if (e.Url.Segments[e.Url.Segments.Length - 1].EndsWith(".pdf"))
    {
        // Stop the control from rendering the PDF itself.
        e.Cancel = true;
        string filepath = null;

        // Suggest the file name from the URL and let the user pick a location.
        saveFileDialog1.FileName = e.Url.Segments[e.Url.Segments.Length - 1];
        if (saveFileDialog1.ShowDialog() == DialogResult.OK)
        {
            filepath = saveFileDialog1.FileName;
            WebClient client = new WebClient();
            client.DownloadFileCompleted += new AsyncCompletedEventHandler(client_DownloadFileCompleted);
            client.DownloadFileAsync(e.Url, filepath);
        }
    }
}

// Callback function, raised when the asynchronous download finishes
void client_DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
{
    MessageBox.Show("File downloaded");
}
Source: http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/d338a2c8-96df-4cb0-b8be-c5fbdd7c9202
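One caveat with the extension check in the handler above: `EndsWith(".pdf")` is case-sensitive, so a URL ending in `.PDF` would not be intercepted and would still render inside the control. A minimal sketch of a more tolerant check follows. The `PdfUrlCheck` class and `LooksLikePdf` method are hypothetical names used for illustration; they are not part of the original answer.

```csharp
using System;

public static class PdfUrlCheck
{
    // Returns true when the path portion of an absolute URL ends in ".pdf",
    // ignoring case. Uri.AbsolutePath excludes the query string and fragment,
    // so "report.PDF?x=1" still matches.
    public static bool LooksLikePdf(Uri url)
    {
        if (url == null || !url.IsAbsoluteUri)
        {
            return false;
        }

        return url.AbsolutePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase);
    }
}
```

In the Navigating handler, the condition would then read `if (PdfUrlCheck.LooksLikePdf(e.Url))`. Note that any URL-based test only catches PDFs whose path carries the extension; a PDF served from a URL without it (for example, one that selects the format through a query parameter) would still load in the control.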
The solution I ended up using:
I did everything else as needed to get the browser to the URL where it needed to go. Knowing that all of the login information, required settings, viewstates, etc. were stored in the cookies, I was finally able to grab the file using a hybrid approach: the WebBrowser control to navigate, then a WebClient object to actually snag the file bytes.
public byte[] GetPDF(string keyValue)
{
    DoLogin();

    // Ask the source to generate the PDF. The PDF doesn't
    // exist on the server until you have visited this page
    // at least ONCE. The PDF exists for five minutes after
    // the visit, so you have to snag it pretty quick.
    LoadUrl(string.Format(
        "https://www.theMagicSource.com/getimage.do?&key={0}&imageoutputformat=PDF",
        keyValue));

    // Now that we're logged in (not shown here), and
    // (hopefully) at the right location, snag the cookies.
    // We can use them to download the PDF directly.
    string cookies = GetCookies();
    byte[] fileBytes = null;
    try
    {
        // We are fully logged in, and by now, the PDF should
        // be generated. GO GET IT!
        using (WebClient wc = new WebClient())
        {
            wc.Headers.Add("Cookie: " + cookies);
            string tmpFile = Path.GetTempFileName();
            wc.DownloadFile(string.Format(
                "https://www.theMagicSource.com/document?id={0}_final.PDF",
                keyValue), tmpFile);
            fileBytes = File.ReadAllBytes(tmpFile);
            File.Delete(tmpFile);
        }
    }
    catch (Exception ex)
    {
        // If we can't get the PDF here, wrap the error and rethrow it
        // so the caller can decide what to do.
        throw new WebScrapePDFException(
            "Could not find the specified file.", ex);
    }
    return fileBytes;
}
private void LoadUrl(string url)
{
    InternalBrowser.Navigate(url);

    // Let the browser control do what it needs to do to start
    // processing the page.
    Thread.Sleep(100);

    // If EITHER we can't continue OR
    // the web browser has not been idle for 10 consecutive seconds yet,
    // then wait some more.
    // ...
    // ... Some stuff here to make sure the page is fully loaded and ready.
    // ... Removed to reduce complexity, but you get the idea.
    // ...
}

private string GetCookies()
{
    if (InternalBrowser.InvokeRequired)
    {
        return (string)InternalBrowser.Invoke(new Func<string>(() => GetCookies()));
    }
    else
    {
        return InternalBrowser.Document.Cookie;
    }
}
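The wait logic in `LoadUrl` is elided above and is not reconstructed here. Purely as a generic illustration of the "wait until ready, but give up eventually" idea, here is a minimal polling sketch. The `Polling` class, the `WaitUntil` method, and the timeout values are assumptions made for this example, not the author's code.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Polling
{
    // Calls `condition` repeatedly until it returns true or `timeout` elapses.
    // Returns true if the condition was met, false if the wait timed out.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        if (condition == null)
        {
            throw new ArgumentNullException(nameof(condition));
        }

        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            if (condition())
            {
                return true;
            }
            if (elapsed.Elapsed >= timeout)
            {
                return false;
            }
            // Sleep between checks so the loop does not spin the CPU.
            Thread.Sleep(pollInterval);
        }
    }
}
```

Assuming `GetPDF` runs on a background thread (as the `InvokeRequired` check in `GetCookies` suggests), the condition could marshal a check such as `InternalBrowser.ReadyState == WebBrowserReadyState.Complete` to the UI thread through `InternalBrowser.Invoke`. Calling this helper on the UI thread itself would block the message pump, so the page would never finish loading.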