
Generate PDF Behind Authentication Wall

I'm trying to use WKHTMLTOPDF to generate a PDF of a page that requires me to log in first. There's some material on this on the internet already, but I can't seem to get mine working. I'm in Terminal - nothing fancy.

I've tried (among a whole lot of other stuff):

/usr/bin/wkhtmltopdf --post username=myusername --post password=mypassword "URL to Generate" test.pdf

/usr/bin/wkhtmltopdf --username myusername --password mypassword "URL to Generate" test.pdf

/usr/bin/wkhtmltopdf --cookie-jar my.jar --post username=myusername --post password=mypassword "URL to Generate Cookie For"

username and password are both the id and the name of the input fields on the form. I am getting the my.jar file to show up, but nothing is written to it.

Specific questions:

  1. Should I be specifying the login page and/or form action anywhere?
  2. The --cookie-jar parameter has been mentioned in various places (both as being needed and otherwise). If it is necessary, how does it work? I've created the my.jar file, but how do I use it again? Referencing:

http://code.google.com/p/wkhtmltopdf/issues/detail?id=356


EDIT:

Surely someone has done this successfully? A good way to showcase an example might be to get it working on some popular website that requires login credentials, to eliminate a potential variable.

Chords asked Apr 23 '12 19:04


2 Answers

Every site's login form is different. You need to work out exactly what has to be passed to the login form's target by reading the HTML of the page (which you're probably aware of). On top of the username/password fields, the form may require an additional hidden field that protects against cross-site request forgery.
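As a rough illustration of reading the form's HTML, the sketch below pulls the form action and any hidden fields out of a sample page with sed. The sample HTML, the field name csrf_token, and the helper names hidden_fields and form_action are all made up for this example, and the sed patterns assume one tag per line with attributes in a fixed order, so a real page may need a proper HTML parser.

```shell
#!/bin/sh
# Sketch only: the HTML below is an invented login form, not a real site's.
login_html='<form action="/session" method="post">
<input type="text" name="username" id="username">
<input type="password" name="password" id="password">
<input type="hidden" name="csrf_token" value="abc123">
</form>'

# Print "name=value" for every hidden <input>.
# Assumes the attributes appear in the order type, name, value on one line.
hidden_fields() {
    printf '%s\n' "$1" |
        sed -n 's/.*<input[^>]*type="hidden"[^>]*name="\([^"]*\)"[^>]*value="\([^"]*\)".*/\1=\2/p'
}

# Print the form's action attribute, i.e. the URL the login POST should target.
form_action() {
    printf '%s\n' "$1" |
        sed -n 's/.*<form[^>]*action="\([^"]*\)".*/\1/p'
}

hidden_fields "$login_html"   # prints: csrf_token=abc123
form_action "$login_html"     # prints: /session
```

Each name=value pair found this way would be sent along with the username and password in the login request.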

The --cookie-jar parameter names a file in which wkhtmltopdf stores the cookies it gets back from the web server. You need to specify it in the first request (to the login form) and again in subsequent requests, so that they keep using the cookie/session information the web server gave you after logging in.

So to sum it up:

  1. Look and see if there are any additional parameters on the page required.
  2. Make sure the URL you are submitting to is the same as the ACTION attribute of the form element on that page.
  3. Use the --cookie-jar parameter in both the login request and the second content request.
  4. The syntax for the --post parameter is --post username user_name_value --post password password_value (name and value as separate arguments, not name=value).
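
The steps above can be sketched as a two-request flow. This is a minimal sketch under assumptions, not a tested recipe: the URLs, credentials, and output file names are placeholders, and run is a hypothetical dry-run helper that only prints each command unless RUN=1 is set. Only the --cookie-jar and --post options come from the discussion above.

```shell
#!/bin/sh
# Placeholders: replace with the form's ACTION URL and the page to render.
LOGIN_ACTION_URL="https://example.com/login-action"
REPORT_URL="https://example.com/report"
COOKIE_JAR="my.jar"

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Request 1: POST the credentials to the form's ACTION URL and store the
# session cookies in the jar. login.pdf is a throwaway output file.
login_request() {
    run wkhtmltopdf --cookie-jar "$COOKIE_JAR" \
        --post username myusername \
        --post password mypassword \
        "$LOGIN_ACTION_URL" login.pdf
}

# Request 2: reuse the same jar so the protected page sees the session.
content_request() {
    run wkhtmltopdf --cookie-jar "$COOKIE_JAR" "$REPORT_URL" test.pdf
}

login_request
content_request
```

Saved as a script, it prints the two commands for review; running it with RUN=1 in the environment executes them. Any extra hidden fields the form needs would be added as further --post name value pairs in the first request.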
hsanders answered Nov 09 '22 04:11


I think the form I'm trying to log in to is too complex. It's secure, sets three cookies, redirects twice, and posts a number of other variables outside of the username and password, one of which requires a cookie value (I even tried concatenating the value into the post variable, but no luck). This is probably a pretty rare issue - by no means the fault of WKHTMLTOPDF.

I wound up using CURL to log in and write the page to a local file, then ran WKHTMLTOPDF against that file. Definitely a solid workaround for anyone else having a similar issue.
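
For anyone who prefers the command line, the same workaround might look roughly like the sketch below, which uses the curl command-line tool rather than PHP. It rests on several assumptions: the URLs, credentials, and file names are placeholders, the form is taken to need only username and password fields, and run is a hypothetical dry-run helper that prints each command unless RUN=1 is set.

```shell
#!/bin/sh
# Placeholders: replace with the real login target and protected page.
LOGIN_URL="https://example.com/login-action"
PAGE_URL="https://example.com/report"

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

fetch_and_render() {
    # 1. Log in: POST the form fields, follow redirects (-L), and both read
    #    (-b) and write (-c) cookies in cookie.txt. The response is discarded.
    run curl -s -L -c cookie.txt -b cookie.txt \
        --data-urlencode "username=myusername" \
        --data-urlencode "password=mypassword" \
        -o /dev/null "$LOGIN_URL"

    # 2. Fetch the protected page with the session cookies and save it locally.
    run curl -s -L -b cookie.txt -o page.html "$PAGE_URL"

    # 3. Render the saved HTML file to PDF.
    run wkhtmltopdf page.html test.pdf
}

fetch_and_render
```

A form that needs extra hidden fields or a cookie value in the POST body, like the one described above, would need those added as further --data-urlencode arguments in step 1.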


Edit: CURL, if interested:

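$ch = curl_init(); # $loginUrl and $postFields are assumed to be defined earlier
curl_setopt($ch, CURLOPT_HEADER, 0); # Change to 1 to include response headers when debugging
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); # Skips certificate verification
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); # Follow the login redirects
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); # Only set when run under a web server
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt'); # Cookies are written here when the handle is closed
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt'); # Cookies are read from here on each request
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); # Return the response instead of printing it
$response = curl_exec($ch);
curl_close($ch);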
Chords answered Nov 09 '22 04:11