I have tried to write the following code, I am trying to write a code in Python 3.7
that just opens a web browser and the website fed to it in the Command Line
:
Example.py
import sys
from mechanize import Browser
browser = Browser()
browser.set_handle_equiv(True)
browser.set_handle_gzip(True)
browser.set_handle_redirect(True)
browser.set_handle_referer(True)
browser.set_handle_robots(False)
# pretend you are a real browser
browser.addheaders = [('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36')]
listOfSites = sys.argv[1:]
for i in listOfSites:
browser.open(i)
I have entered the following command in the cmd
:
python Example.py https://www.google.com
And I have the following traceback:
Traceback (most recent call last):
File "Example.py", line 19, in <module>
browser.open(i)
File "C:\Python37\lib\site-packages\mechanize\_mechanize.py", line 253, in open
return self._mech_open(url_or_request, data, timeout=timeout)
File "C:\Python37\lib\site-packages\mechanize\_mechanize.py", line 283, in _mech_open
response = UserAgentBase.open(self, request, data)
File "C:\Python37\lib\site-packages\mechanize\_opener.py", line 188, in open
req = meth(req)
File "C:\Python37\lib\site-packages\mechanize\_urllib2_fork.py", line 1104, in do_request_
for name, value in self.parent.addheaders:
ValueError: too many values to unpack (expected 2)
I am very new to Python
. This is my first code here. I am stuck with the above traceback but haven't found the solution yet. I have searched for a lot of questions on SO community as well but they didn't seem to help. What should I do next?
UPDATE:
As suggested by @Jean-François-Fabre, in his answer, I have added 'User-agent'
to the header, now there is no traceback, but still there is an issue where my link cannot be opened in the browser.
Here is how the addheader
looks like now:
browser.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36')]
The ValueError: too many values to unpack (expected 2)” occurs when you do not unpack all of the items in a list. A common mistake is trying to unpack too many values into variables. We can solve this by ensuring the number of variables equals the number of items in the list to unpack.
The “valueerror: too many values to unpack (expected 2)” error occurs when you do not unpack all the items in a list. This error is often caused by trying to iterate over the items in a dictionary. To solve this problem, use the items() method to iterate over a dictionary.
The valueerror: too many values to unpack occurs during a multiple-assignment where you either don't have enough objects to assign to the variables or you have more objects to assign than variables.
Verify the assignment variables. If the number of assignment variables is greater than the total number of variables, delete the excess variable from the assignment operator. The number of objects returned, as well as the number of variables available are the same. This will resolve the value error.
I have just found out a way around to this issue even if the above issue still exists. I am posting this only to let the readers know that we can do it this way too:
Instead of using the mechanize
package, we can use the webbrowser
package and write the following python code in the Example.py:
import webbrowser
import sys
#This is an upgrade suggested by @Jean-François Fabre
listOfSites = sys.argv[1:]
for i in listOfSites:
webbrowser.open_new_tab(i)
Then we can run this python code by executing the following command in the terminal/command prompt:
python Example.py https://www.google.com https://www.bing.com
This command mentioned above in the example will open two sites at a time. One is Google and the other is Bing
Here you go :)
import sys
from mechanize import Browser, Request
browser = Browser()
browser.set_handle_equiv(True)
browser.set_handle_gzip(True)
browser.set_handle_redirect(True)
browser.set_handle_referer(True)
browser.set_handle_robots(False)
# setup your header, add anything you want
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1', 'Referer': 'http://whateveritis.com'}
url_list = sys.argv[1:]
for url in url_list:
request = Request(url=url, data=None, headers=header)
response = browser.open(request)
print(response.read())
response.close()
I don't know mechanize
at all, but the traceback and variable names (and some googling) can help.
You're initializing addheaders
with a list of strings. Some other examples (ex: Mechanize Python and addheader method - how do I know the newest headers?) show a list of tuple
s, which seem to match the traceback. Ex:
browser.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
so it unpacks properly into name
and value
in loop
for name, value in whatever.addheaders:
You have to add the 'User-agent'
property name (you can pass other less common parameters than the browser name)
Let me try to answer your question in parts:
You're correct in adding "Browser Headers". Many servers might outright drop your connection as it's a definite sign of being crawled by a bot.
mechanize
as stated by the docs "is a tool for programmatic web browsing".
This means it is primarily used to crawl webpages, parse their contents, fill forms, click on things, send requests, but not using a "real" web browser, with parts such as CSS rendering. For example you cannot open a page and take a screenshot, as there's not something "rendered", and to achieve this you would need to save the page, and render it using another solution.
If this suits you needs, check headless browsers as a technology, there are a lot of them. In the Python ecosystem, other than mechanize
, I'd check "headless chromium", as "phantomjs" is unfortunately discontinued.
But if I understand correctly, you need the actual web browser to open up with the webpage, right? For this reason, you actually need, well, a browser in your system to take care of that!
Find out where your browser's executable lies in your system. For example, my Firefox executable lies in "C:\Program Files\Mozilla Firefox\firefox.exe"
and add it to your PATH.
As you're using Windows, use the start menu to navigate to Advanced System Settings --> Advanced --> Environment Variables
, and add the path above to your PATH
variable.
If you're using Linux export PATH=$PATH:"/path/to/your/browser"
will take care of things.
Then, your code can run as simply as
import subprocess
import sys
listOfSites = sys.argv[1:]
links = ""
for i in listOfSites:
links += "-new-tab " + i
print(links)
subprocess.run(["firefox", links])
Firefox will open new windows, one for each of the links you have provided.
Then comes Selenium, which in my opinion is the most mature solution to browser-related problems, and what most people use. I've used it in a production setting with very good results. It provides both the UI/frontend of a browser that renders the webpages, but also allows you to programmatically work with these webpages.
It needs some setup, (for example, if you're using Firefox, you'll need to download the geckodriver
executable from their releases page, and then add it your PATH
variable again.
Then you define your webdriver, spawn one for each of the websites you need to visit, and get
the webpage. You can also take screenshot, as a proof that the page has been rendered correctly.
from selenium import webdriver
import sys
listOfSites = sys.argv[1:]
for i in listOfSites:
driver = webdriver.Firefox()
driver.get('http://'+i)
driver.save_screenshot(i+'-screenshot.png')
# When you're finished
# driver.quit()
I've tested both of these code snippets, and they work as expected. Please let me know how all these sound, and if you need any more additional information..! ^^
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With