With PHP cURL, when we need to store and read cookies for web scraping, many resources out there recommend using a file to handle cookies, with these options:
curl_setopt($ch, CURLOPT_COOKIEJAR, $CookieJarFilename);
curl_setopt($ch, CURLOPT_COOKIEFILE, $CookieJarFilename);
The bottom line is that they use a single file as the cookie jar (usually a .txt file).
But in a real scenario, our website is not accessed by only one computer. Most likely many computers access it at the same time, and there are also bots such as Googlebot, Yahoo Slurp, etc.
So with a single .txt file, isn't it obvious that every request will overwrite the same cookie jar and make a real mess of the cookies?
Or am I mistaken here?
What's the 'right' method for handling cookies?
If multiple people access your page and you need to perform cURL requests with unique cookies for each of them, there are several things you can do to handle this scenario.
1) If your user is authenticated and has a $_SESSION started on your end, then you can use session_id() as the cookie file name.
2) If your user doesn't have a session (a Google bot, for example), you can build the cookie file name from a timestamp plus an extra random component. For example:
$cookieName = time()."_".substr(md5(microtime()),0,5).".txt";
// Would output something like:
// `1388788940_91ab4.txt`
But in this case, you cannot reuse the cookie file if the user comes back 5 minutes later (unless you store your cookie file name in a cookie on the user's browser).
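Putting both options together, here is a minimal sketch, not a definitive implementation. The function names cookieJarPath() and newScraperHandle() are made up for illustration. Hashing the session ID (so the raw ID never shows up in a file name) and using random_bytes() instead of md5(microtime()) are my own assumptions, not something the options above require.

```php
<?php
// Hypothetical helper: returns a cookie jar path that is unique per visitor.
// With a session ID the path is stable across requests; without one it is random.
function cookieJarPath(string $dir, string $sessionId = ''): string
{
    if ($sessionId !== '') {
        // Assumption: hash the session ID so the raw value is not exposed on disk.
        $name = 'sess_' . hash('sha256', $sessionId);
    } else {
        // Timestamp plus 8 random bytes (16 hex characters).
        $name = time() . '_' . bin2hex(random_bytes(8));
    }
    return rtrim($dir, '/') . '/' . $name . '.txt';
}

// Hypothetical helper: creates a cURL handle that reads and writes the given jar.
function newScraperHandle(string $url, string $jar)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);   // cookies are written here when the handle is closed
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jar);  // cookies are read from here
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return $ch;
}
```

You would call cookieJarPath($dir, session_id() ?: '') once per visitor and pass the result to newScraperHandle(). Keep $dir outside the web root so the cookie files cannot be downloaded.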
In either case, make sure you clean up these files periodically. Otherwise you'll end up with tons of cookie files in your directory.
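One way to do the cleanup is a small function run from cron or at the start of a request. This is only a sketch: purgeOldCookieJars() is a hypothetical name, and it assumes the directory holds nothing but cookie jar .txt files and that "old" means "not modified within the given number of seconds".

```php
<?php
// Hypothetical helper: deletes *.txt cookie jars older than $maxAgeSeconds.
// Returns the number of files removed.
function purgeOldCookieJars(string $dir, int $maxAgeSeconds): int
{
    $removed = 0;
    $cutoff = time() - $maxAgeSeconds;

    // glob() can return false on error, so fall back to an empty list.
    foreach (glob(rtrim($dir, '/') . '/*.txt') ?: [] as $file) {
        $mtime = filemtime($file);
        if (is_file($file) && $mtime !== false && $mtime < $cutoff && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}
```

For example, purgeOldCookieJars($dir, 3600) would remove jars that have not been written to for an hour.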