 

Build query string using urlencode python

I am trying to build a URL so that I can send a GET request to it using the urllib module.

Let's suppose my final_url should be

url = "www.example.com/find.php?data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value"

Now to achieve this I tried the following way:

>>> initial_url = "http://www.stackoverflow.com"
>>> search = "Generate+value"
>>> params = {"data":initial_url,"search":search}
>>> query_string = urllib.urlencode(params)
>>> query_string
'search=Generate%2Bvalue&data=http%3A%2F%2Fwww.stackoverflow.com'

Now if you compare my query_string with the format of final_url, you can observe two things:

1) The order of the params is reversed: instead of data=()&search= it is search=()&data=

2) urlencode also encoded the + in Generate+value

I believe the first change is due to the arbitrary ordering of dictionaries. So I thought of using OrderedDict to keep the params in a fixed order. As I am using Python 2.6.5, I did

pip install ordereddict

But I am not able to use it in my code. When I try it, I get:

>>> od = OrderedDict((('a', 'first'), ('b', 'second')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'OrderedDict' is not defined

So, my question is: what is the correct way to use OrderedDict in Python 2.6.5, and how do I make urlencode ignore the + in Generate+value?

Also, is this the correct approach to building a URL?

RanRag asked May 26 '12 11:05




3 Answers

You shouldn't worry about the + being encoded: it will be restored on the server when the URL is unescaped. The order of named parameters shouldn't matter either.

As for OrderedDict, it is not a built-in name. You have to import it from collections:

from urllib import urlencode, quote
# from urllib.parse import urlencode, quote # python3
from collections import OrderedDict

initial_url = "http://www.stackoverflow.com"
search = "Generate+value"
# build from a list of pairs: keyword arguments do not keep their order before Python 3.6
query_string = urlencode(OrderedDict([("data", initial_url), ("search", search)]))
url = 'www.example.com/find.php?' + query_string

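If your Python is too old and does not have OrderedDict in the collections module (it was added in Python 2.7; with the ordereddict backport from pip, the import is from ordereddict import OrderedDict), use:

parameters = {"data": initial_url, "search": search}
# sorted() gives a deterministic (alphabetical) key order
encoded = "&".join("%s=%s" % (key, quote(parameters[key], safe="+"))
    for key in sorted(parameters.keys()))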

Anyway, the order of parameters should not matter.

Note the safe parameter of quote. It prevents + from being escaped, but it means the server will interpret Generate+value as Generate value. You can manually escape + by writing %2B and marking % as a safe character.
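The effect of the safe parameter can be checked with a short sketch. It assumes Python 3, where quote and parse_qs live in urllib.parse (on Python 2 they are in urllib and urlparse), and it uses parse_qs only as a stand-in for whatever form decoding the real server performs.

```python
# Python 3 imports; on Python 2 use urllib.quote and urlparse.parse_qs.
from urllib.parse import quote, parse_qs

search = "Generate+value"

kept = quote(search, safe="+")                      # "+" is left alone
escaped = quote(search, safe="")                    # "+" becomes %2B
pre_escaped = quote("Generate%2Bvalue", safe="%")   # hand-written %2B survives

print(kept)         # Generate+value
print(escaped)      # Generate%2Bvalue
print(pre_escaped)  # Generate%2Bvalue

# What a form-decoding server would see (parse_qs stands in for the server here)
print(parse_qs("search=" + kept))     # {'search': ['Generate value']}
print(parse_qs("search=" + escaped))  # {'search': ['Generate+value']}
```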

Aleš Kotnik answered Oct 18 '22 22:10


First, the order of parameters in an HTTP request should be completely irrelevant. If it isn't, then the parsing library on the other side is doing something wrong.

Second, of course the + is encoded. + is used as a placeholder for a space in an encoded URL, so if your raw string contains a +, it has to be escaped. urlencode expects an unencoded string; you can't pass it a string that is already encoded.
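Both points can be tried out locally. The following is a minimal sketch assuming Python 3 (urllib.parse); parse_qsl plays the role of the receiving side, which is an assumption about how the server decodes the query string.

```python
# Python 3 imports; on Python 2 use urllib.urlencode and urlparse.parse_qsl.
from urllib.parse import urlencode, parse_qsl

# A raw "+" is escaped on the way out and restored on the way in.
raw = [("search", "Generate+value")]
encoded = urlencode(raw)
print(encoded)             # search=Generate%2Bvalue
print(parse_qsl(encoded))  # [('search', 'Generate+value')]

# Passing an already-encoded value escapes the "%" signs a second time.
already_encoded = [("data", "http%3A%2F%2Fwww.stackoverflow.com")]
double_encoded = urlencode(already_encoded)
print(double_encoded)      # data=http%253A%252F%252Fwww.stackoverflow.com
```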

mata answered Oct 18 '22 23:10


Some comments on the question and other answers:

  1. If you want to preserve order with urllib.urlencode, submit an ordered sequence of k/v pairs instead of a mapping (dict). When you pass in a dict, urlencode just calls foo.items() to grab an iterable sequence.

# urllib.urlencode accepts a mapping or sequence
# the output of this can vary, because `items()` is called on the dict
urllib.urlencode({"data": initial_url, "search": search})
# the output of this will not vary
urllib.urlencode((("data", initial_url), ("search", search)))

You can also pass in a second argument, doseq, to adjust how iterable values are handled.
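As a rough illustration of doseq, assuming Python 3 (urllib.parse.urlencode; the Python 2 urllib.urlencode takes the same argument): with doseq=True each element of a sequence value becomes its own key=value pair, while the default encodes the string form of the whole sequence.

```python
# Python 3 import; on Python 2 use urllib.urlencode.
from urllib.parse import urlencode

params = [("foo", [3, 2, 1])]

# Default: the list is converted with str() and encoded as a single value.
single = urlencode(params)
print(single)    # foo=%5B3%2C+2%2C+1%5D

# doseq=True: one pair per element, in the order of the sequence.
repeated = urlencode(params, doseq=True)
print(repeated)  # foo=3&foo=2&foo=1
```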

  2. The order of parameters is not irrelevant. Take these two URLs for example:

    https://example.com?foo=bar&bar=foo
    https://example.com?bar=foo&foo=bar

    An HTTP server should consider the order of these parameters irrelevant, but a function designed to compare URLs would not. In order to safely compare URLs, these params would need to be sorted.

    However, consider duplicate keys:

    https://example.com?foo=3&foo=2&foo=1

The URI specs support duplicate keys, but don't address precedence or ordering.

In a given application, these could each trigger different results and be valid as well:

https://example.com?foo=1&foo=2&foo=3
https://example.com?foo=1&foo=3&foo=2
https://example.com?foo=2&foo=3&foo=1
https://example.com?foo=2&foo=1&foo=3
https://example.com?foo=3&foo=1&foo=2
https://example.com?foo=3&foo=2&foo=1
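One way to sketch such a comparison, assuming Python 3 and assuming the application treats the relative order of duplicate keys as significant: sort the pairs by key only, relying on the stability of sorted() to keep repeated keys in their original order. The helper name normalized_query is hypothetical and not part of any library.

```python
# Python 3 imports; on Python 2 these functions live in the urlparse module.
from urllib.parse import urlsplit, parse_qsl


def normalized_query(url):
    """Return the query pairs of url sorted by key (hypothetical helper).

    sorted() is stable, so values of a repeated key keep their original order.
    """
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return sorted(pairs, key=lambda pair: pair[0])


a = "https://example.com?foo=bar&bar=foo"
b = "https://example.com?bar=foo&foo=bar"
print(normalized_query(a) == normalized_query(b))  # True

c = "https://example.com?foo=1&foo=2&foo=3"
d = "https://example.com?foo=3&foo=2&foo=1"
print(normalized_query(c) == normalized_query(d))  # False
```

If the application does not care about the order of duplicate keys, dropping the key argument and sorting whole pairs would treat c and d as equal.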
  3. The + is a reserved character that represents a space in a urlencoded form (as opposed to %20 in the path part of a URL). urllib.urlencode escapes using urllib.quote_plus(), not urllib.quote(). The OP most likely wanted to just do this:

initial_url = "http://www.stackoverflow.com"
search = "Generate value"
urllib.urlencode((("data", initial_url), ("search", search)))

Which produces:

data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value

as the output.
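The difference between the two quoting functions, and the full URL from the question, can be reproduced with a short sketch. It assumes Python 3, where the functions are in urllib.parse instead of urllib.

```python
# Python 3 imports; on Python 2 use urllib.quote, urllib.quote_plus, urllib.urlencode.
from urllib.parse import quote, quote_plus, urlencode

initial_url = "http://www.stackoverflow.com"
search = "Generate value"

print(quote(search))       # Generate%20value
print(quote_plus(search))  # Generate+value

# A sequence of pairs keeps the parameter order fixed.
query_string = urlencode((("data", initial_url), ("search", search)))
url = "www.example.com/find.php?" + query_string
print(url)
# www.example.com/find.php?data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value
```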

Jonathan Vanasco answered Oct 18 '22 23:10