Using cookies.txt file with Python Requests


I'm trying to access an authenticated site using a cookies.txt file (generated with a Chrome extension) with Python Requests:

```python
import cookielib
import requests

cj = cookielib.MozillaCookieJar('cookies.txt')
cj.load()
r = requests.get(url, cookies=cj)
```

It doesn't raise any error or exception, but it returns the login screen instead of my content. However, I know that my cookie file is valid, because I can successfully retrieve my content using it with wget. Any idea what I'm doing wrong?

Edit:

I'm tracing cookielib.MozillaCookieJar._really_load and can verify that the cookies are parsed correctly (i.e. they have the correct values for the domain, path, secure and other fields). But since the request still returns the login form, it seems that wget must be doing something more (the exact same cookies.txt file works with it).


asked Feb 07 '13 03:02

cjauvin


MozillaCookieJar inherits from FileCookieJar, whose constructor has the following docstring:

Cookies are NOT loaded from the named file until either the .load() or .revert() method is called.

So you need to call the .load() method.
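A minimal sketch of this point in Python 3, where cookielib became http.cookiejar. It writes a throwaway cookie file (the domain, cookie name and value are made-up placeholders) and shows that the jar stays empty until load() is called.

```python
import os
import tempfile
import time
from http.cookiejar import MozillaCookieJar  # Python 3 name of cookielib

# Placeholder cookie that expires one day from now, so the default
# load() settings keep it.
expires = str(int(time.time()) + 24 * 3600)
line = "\t".join([".example.com", "TRUE", "/", "FALSE", expires, "name", "value"])

path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
with open(path, "w") as f:
    f.write("# Netscape HTTP Cookie File\n" + line + "\n")

cj = MozillaCookieJar(path)
before = len(cj)  # the constructor only remembers the filename
cj.load()         # the file is actually parsed here
after = len(cj)
print(before, after)  # 0 1
```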

Also, as Jermaine Xu noted, the first line of the file must contain either the string # Netscape HTTP Cookie File or # HTTP Cookie File. Files generated by the plugin you use do not contain such a line, so you have to insert it yourself. I raised a bug about this at http://code.google.com/p/cookie-txt-export/issues/detail?id=5
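If you can't fix the exporting extension, a small pre-processing step can add the header before loading. The helper ensure_netscape_header below is hypothetical (not part of any library); it is a sketch that prepends # Netscape HTTP Cookie File when the first line contains neither accepted header, and it also shows the LoadError that Python 3's http.cookiejar raises when the header is missing.

```python
import os
import re
import tempfile
import time
from http.cookiejar import LoadError, MozillaCookieJar

# The first line must contain "# Netscape HTTP Cookie File" or "# HTTP Cookie File".
HEADER_RE = re.compile(r"#( Netscape)? HTTP Cookie File")

def ensure_netscape_header(path):
    """Hypothetical helper: prepend the header if the file lacks it."""
    with open(path) as f:
        content = f.read()
    first_line = content.split("\n", 1)[0]
    if not HEADER_RE.search(first_line):
        with open(path, "w") as f:
            f.write("# Netscape HTTP Cookie File\n" + content)

# A file as exported by the extension: cookie lines only, no header.
expires = str(int(time.time()) + 24 * 3600)
path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
with open(path, "w") as f:
    f.write("\t".join([".example.com", "TRUE", "/", "FALSE", expires, "name", "value"]) + "\n")

try:
    MozillaCookieJar(path).load()
    raised = False
except LoadError:
    raised = True  # "... does not look like a Netscape format cookies file"

ensure_netscape_header(path)
cj = MozillaCookieJar(path)
cj.load()
print(raised, len(cj))  # True 1
```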

EDIT

Session cookies are saved with 0 in the 5th (expiry) column. Unless you pass ignore_expires=True to the load() method, all such cookies are discarded when the file is loaded.

File session_cookie.txt (columns separated by tabs):

```
# Netscape HTTP Cookie File
.domain.com	TRUE	/	FALSE	0	name	value
```

Python script:

```python
import cookielib

cj = cookielib.MozillaCookieJar('session_cookie.txt')
cj.load()
print len(cj)
```

Output: 0
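The same experiment can be sketched in Python 3 (http.cookiejar), using a temporary copy of session_cookie.txt with the same placeholder domain, name and value. It compares a default load() with load(ignore_expires=True); an expiry of 0 is the Unix epoch, so the cookie counts as expired.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar

# session_cookie.txt from above; the columns must be tab-separated.
path = os.path.join(tempfile.mkdtemp(), "session_cookie.txt")
with open(path, "w") as f:
    f.write("# Netscape HTTP Cookie File\n")
    f.write("\t".join([".domain.com", "TRUE", "/", "FALSE", "0", "name", "value"]) + "\n")

default = MozillaCookieJar(path)
default.load()  # expiry 0 looks expired, so the cookie is skipped

lenient = MozillaCookieJar(path)
lenient.load(ignore_expires=True)  # keep cookies even if they look expired
cookie = next(iter(lenient))

print(len(default), len(lenient), cookie.expires)  # 0 1 0
```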

EDIT 2

Even once the cookies are in the jar (by loading with ignore_expires=True), cookielib subsequently discards them, because their expires attribute is still 0. To prevent this, set the expiry time to some point in the future, like so:

```python
import time

for cookie in cj:
    # set cookie expire date to 14 days from now
    cookie.expires = time.time() + 14 * 24 * 3600
```
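A fuller sketch of this workaround in Python 3: load with ignore_expires=True, move the expiry of the 0-expiry cookies 14 days ahead, then ask the jar which Cookie header it would send. Instead of a live request it uses urllib.request.Request with CookieJar.add_cookie_header(), which is also how requests builds its Cookie header from a jar; the URL is a made-up placeholder. Unlike the loop above, it only touches cookies whose expiry is 0, so cookies with a real expiry date keep it.

```python
import os
import tempfile
import time
import urllib.request
from http.cookiejar import MozillaCookieJar

path = os.path.join(tempfile.mkdtemp(), "session_cookie.txt")
with open(path, "w") as f:
    f.write("# Netscape HTTP Cookie File\n")
    f.write("\t".join([".domain.com", "TRUE", "/", "FALSE", "0", "name", "value"]) + "\n")

cj = MozillaCookieJar(path)
cj.load(ignore_expires=True)  # keep the 0-expiry (session) cookies

# The default cookie policy refuses to send expired cookies, so push
# the session cookies' expiry 14 days into the future.
for cookie in cj:
    if cookie.expires == 0:
        cookie.expires = int(time.time()) + 14 * 24 * 3600

# Ask the jar which Cookie header it would attach (placeholder URL).
req = urllib.request.Request("http://www.domain.com/")
cj.add_cookie_header(req)
print(req.get_header("Cookie"))  # name=value
```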

EDIT 3

I checked both wget and curl, and both use an expiry time of 0 to denote session cookies, which makes it the de facto standard. Python's implementation, however, uses an empty string for the same purpose, hence the problem raised in the question. I think Python's behavior in this regard should be in line with what wget and curl do, which is why I raised the bug at http://bugs.python.org/issue17164

I'll note that replacing the 0s with empty strings in the 5th column of the input file and passing ignore_discard=True to load() is an alternative way of solving the problem (there is no need to change the expiry time in this case).
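That alternative can be sketched in Python 3 as well. The helper blank_zero_expiry is hypothetical: it copies the file, turning a 0 in the 5th (expiry) column into an empty string, and the copy is then loaded with ignore_discard=True so the resulting session cookies are kept. The file contents and URL are the same placeholders as above.

```python
import os
import tempfile
import urllib.request
from http.cookiejar import MozillaCookieJar

def blank_zero_expiry(src, dst):
    """Hypothetical helper: copy src to dst, replacing a "0" expiry
    (5th tab-separated column) with "" so Python sees a session cookie."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 7 and fields[4] == "0":
                fields[4] = ""
            fout.write("\t".join(fields) + "\n")

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "session_cookie.txt")
dst = os.path.join(tmp, "session_cookie_fixed.txt")
with open(src, "w") as f:
    f.write("# Netscape HTTP Cookie File\n")
    f.write("\t".join([".domain.com", "TRUE", "/", "FALSE", "0", "name", "value"]) + "\n")

blank_zero_expiry(src, dst)
cj = MozillaCookieJar(dst)
cj.load(ignore_discard=True)  # keep session (discard) cookies
cookie = next(iter(cj))

req = urllib.request.Request("http://www.domain.com/")
cj.add_cookie_header(req)
print(cookie.expires, cookie.discard, req.get_header("Cookie"))  # None True name=value
```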

answered Nov 13 '22 02:11

Piotr Dobrogost
