Let's say I have two objects of the same class: objA and objB. Their relationship is the following:
(objA == objB) #true
(objA is objB) #false
If I use both objects as keys in a Python dict, they are considered the same key and overwrite each other. Is there a way to override the dict comparator to use the is comparison instead of ==, so that the two objects are seen as different keys in the dict?
Maybe I can override the equals method in the class or something? To be more specific, I am talking about two Tag objects from the BeautifulSoup4 library.
Here's a more specific example of what I am talking about:
from bs4 import BeautifulSoup
HTML_string = "<html><h1>some_header</h1><h1>some_header</h1></html>"
HTML_soup = BeautifulSoup(HTML_string, 'lxml')
first_h1 = HTML_soup.find_all('h1')[0] #first_h1 = <h1>some_header</h1>
second_h1 = HTML_soup.find_all('h1')[1] #second_h1 = <h1>some_header</h1>
print(first_h1 == second_h1) # this prints True
print(first_h1 is second_h1) # this prints False
my_dict = {}
my_dict[first_h1] = 1
my_dict[second_h1] = 1
print(len(my_dict)) # my dict has only 1 entry!
# I want to have 2 entries in my_dict: one for key 'first_h1', one for key 'second_h1'.
According to the Python docs, you can indeed use the == operator on dictionaries. For simple dictionaries, comparing them is usually straightforward: two dictionaries are equal when they hold the same key/value pairs.
The cmp() function could be used in Python 2 to compare two dictionaries. It returned 0 if both dictionaries were equal, 1 if dict1 > dict2 and -1 if dict1 < dict2. cmp() was removed in Python 3, so it is not available there.
Dictionaries have some of the same operators and built-in functions that can be used with strings, lists, and tuples. For example, the in and not in operators return True or False according to whether the specified operand occurs as a key in the dictionary.
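As a quick illustration of the operators mentioned above, here is a minimal sketch with two made-up dictionaries (the names a and b are only for illustration):

```python
a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

print(a == b)        # True: same key/value pairs, insertion order is ignored
print("x" in a)      # True: membership tests look at keys
print(1 in a)        # False: values are not searched
print("z" not in a)  # True
```

Note that all of this relies on the keys' own == and hash, which is exactly why two equal Tag objects end up as one key.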
first_h1 and second_h1 are Tag class instances. When you do my_dict[first_h1] or my_dict[second_h1], the string representations of the tags are used for hashing. The problem is that both of these Tag instances have the same string representation:
<h1>some_header</h1>
Since they hash alike and also compare equal with ==, the dict treats them as a single key. The hash comes from the Tag class's __hash__() magic method, which is defined as follows:
def __hash__(self):
    return str(self).__hash__()
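To see the effect without installing bs4, here is a minimal sketch using a hypothetical stand-in class, FakeTag. It mimics only the two behaviours that matter here (equality by content and a hash derived from str()) and is not the real Tag implementation:

```python
class FakeTag:
    """Hypothetical stand-in for bs4's Tag: equal content means equal objects."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup

    def __eq__(self, other):
        # Assumption: compare by rendered markup, roughly like Tag does.
        return isinstance(other, FakeTag) and str(self) == str(other)

    def __hash__(self):
        # Same approach as the __hash__ quoted above.
        return str(self).__hash__()


first = FakeTag("<h1>some_header</h1>")
second = FakeTag("<h1>some_header</h1>")

d = {}
d[first] = 1
d[second] = 2  # same hash and first == second, so this overwrites the entry

print(first is second)  # False
print(len(d))           # 1
```

A dict only treats two keys as distinct if their hashes differ or == returns False, so changing the hash alone is not enough when the objects still compare equal.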
One workaround could be to use the id() values as hashes, but then there is the problem of redefining the Tag class inside BeautifulSoup itself. You can work around that problem by making your own custom "tag wrapper":
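class TagWrapper:
    def __init__(self, tag):
        self.tag = tag

    def __hash__(self):
        # Hash on the identity of the wrapped tag, not on its markup.
        return id(self.tag)

    def __eq__(self, other):
        # Two wrappers are equal only if they wrap the very same tag object,
        # so a fresh TagWrapper(tag) can still look up an existing key.
        return isinstance(other, TagWrapper) and self.tag is other.tag

    def __str__(self):
        return str(self.tag)

    def __repr__(self):
        return str(self.tag)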
Then, you'll be able to do:
In [1]: from bs4 import BeautifulSoup
...:
In [2]: class TagWrapper:
...:     def __init__(self, tag):
...:         self.tag = tag
...:
...:     def __hash__(self):
...:         return id(self.tag)
...:
...:     def __eq__(self, other):
...:         return isinstance(other, TagWrapper) and self.tag is other.tag
...:
...:     def __str__(self):
...:         return str(self.tag)
...:
...:     def __repr__(self):
...:         return str(self.tag)
...:
In [3]: HTML_string = "<html><h1>some_header</h1><h1>some_header</h1></html>"
...:
...: HTML_soup = BeautifulSoup(HTML_string, 'lxml')
...:
In [4]: first_h1 = HTML_soup.find_all('h1')[0] #first_h1 = <h1>some_header</h1>
...: second_h1 = HTML_soup.find_all('h1')[1] #second_h1 = <h1>some_header</h1>
...:
In [5]: my_dict = {}
...: my_dict[TagWrapper(first_h1)] = 1
...: my_dict[TagWrapper(second_h1)] = 1
...:
...: print(my_dict)
...:
{<h1>some_header</h1>: 1, <h1>some_header</h1>: 1}
It is, though, not pretty and not very convenient to use. I would revisit your initial problem and check whether you actually need to put tags into a dictionary.
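If the wrapper feels heavy, one lighter alternative (not part of the answer above, just a common pattern) is to key the dictionary on id(tag) and store the tag alongside the value. A minimal sketch, using a hypothetical Node class as a stand-in for Tag so that it runs without bs4:

```python
class Node:
    """Hypothetical stand-in for a bs4 Tag (equal by content)."""

    def __init__(self, markup):
        self.markup = markup

    def __eq__(self, other):
        return isinstance(other, Node) and self.markup == other.markup

    def __hash__(self):
        return hash(self.markup)


first = Node("<h1>some_header</h1>")
second = Node("<h1>some_header</h1>")

# Key on identity; keep the object in the value so it stays alive
# (an id can be reused once the object is garbage-collected).
by_identity = {}
for node, value in ((first, 1), (second, 2)):
    by_identity[id(node)] = (node, value)

print(len(by_identity))            # 2
print(by_identity[id(second)][1])  # 2
```

The trade-off is that the keys are plain integers, so the original object has to be read back from the stored tuple.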
You can also monkey-patch bs4 using Python's introspection powers, like it was done here, but this is entering rather dangerous territory.
It seems you want to override the == operator. One option is to build a new class and implement == on it:
def __eq__(self, obj):
    return self is obj
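One caveat with this approach: in Python 3, a class that defines __eq__ without also defining __hash__ becomes unhashable, so it cannot be used as a dict key. A minimal sketch of pairing the two, using a hypothetical Base class in place of the real class being customised:

```python
class Base:
    """Hypothetical class whose instances compare equal by content."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return isinstance(other, Base) and self.text == other.text

    def __hash__(self):
        return hash(self.text)


class IdentityKeyed(Base):
    """Subclass that compares and hashes by identity instead."""

    def __eq__(self, other):
        return self is other

    # Defining __eq__ resets __hash__ to None, so restore an identity hash.
    def __hash__(self):
        return id(self)


a = IdentityKeyed("same")
b = IdentityKeyed("same")

d = {a: 1, b: 2}
print(a == b)  # False
print(len(d))  # 2
```

This only helps for objects you construct yourself as instances of the subclass.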