hash function in Python 3.3 returns different results between sessions

Tags:

python

security

hash

hash-collision

python-3.3

I've implemented a Bloom filter in Python 3.3 and got different results in every session. Drilling down into this weird behavior led me to the built-in hash() function: it returns different hash values for the same string in every session.

Example:


>>> hash("235")
-310569535015251310

----- opening a new python console -----


>>> hash("235")
-1900164331622581997

Why is this happening? Why is this useful?

361

asked Sep 25 '22 14:09

redlus

2 Answers

Python uses a random hash seed to prevent attackers from tar-pitting your application by sending you keys designed to collide. See the original vulnerability disclosure (http://www.ocert.org/advisories/ocert-2011-003.html). By offsetting the hash with a random seed (set once at startup), Python ensures that attackers can no longer predict which keys will collide.

You can set a fixed seed or disable the feature by setting the PYTHONHASHSEED environment variable; the default is random but you can set it to a fixed positive integer value, with 0 disabling the feature altogether.
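As a minimal sketch of the effect of PYTHONHASHSEED, the following starts fresh interpreters with different settings and reports the hash() each one computes. The helper name hash_in_fresh_interpreter and the seed value 42 are illustrative choices, not part of any Python API.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed=None):
    """Return hash(text) as computed by a newly started Python process.

    `seed` is passed through the PYTHONHASHSEED environment variable;
    None leaves the variable unset so the interpreter uses its default.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return int(result.stdout.decode().strip())


if __name__ == "__main__":
    # Same fixed seed: the value is reproducible across processes.
    print(hash_in_fresh_interpreter("235", seed=42))
    print(hash_in_fresh_interpreter("235", seed=42))
    # Seed 0 disables randomization, which is also reproducible.
    print(hash_in_fresh_interpreter("235", seed=0))
    # Default (random) seed on Python 3.3+: expect a different value per process.
    print(hash_in_fresh_interpreter("235"))
    print(hash_in_fresh_interpreter("235"))
```

A fixed seed is mainly useful for reproducing a test run; leaving the default in place keeps the collision protection.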

Python versions 2.7 and 3.2 have the feature disabled by default (use the -R switch or set PYTHONHASHSEED=random to enable it); it is enabled by default in Python 3.3 and up.

If you were relying on the order of keys in a Python set, then don't. Python uses a hash table to implement these types and their order depends on the insertion and deletion history as well as the random hash seed. Note that in Python 3.5 and older, this applies to dictionaries, too.
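To illustrate the point about ordering, this sketch reports the iteration order of the same set under two arbitrary fixed seeds. The seeds (1 and 2), the sample words and the helper name set_order_with_seed are assumptions made for illustration; the two orders will usually, though not necessarily, differ. Sorting the elements gives an order that does not depend on the seed.

```python
import json
import os
import subprocess
import sys

WORDS = ["apple", "banana", "cherry", "date", "elderberry", "fig"]

# Code run in the child process: build a set and report its iteration order.
CHILD_CODE = (
    "import json, sys; "
    "print(json.dumps(list(set(json.loads(sys.argv[1])))))"
)


def set_order_with_seed(words, seed):
    """Return the iteration order of set(words) in a fresh interpreter
    started with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, json.dumps(words)],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return json.loads(result.stdout.decode())


if __name__ == "__main__":
    first = set_order_with_seed(WORDS, 1)
    second = set_order_with_seed(WORDS, 2)
    print(first)
    print(second)
    # Sorting removes the dependence on the hash seed.
    print(sorted(first) == sorted(second))
```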

Also see the object.__hash__() special method documentation:

Note: By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.

Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).

See also PYTHONHASHSEED.

If you need a stable hash implementation, you probably want to look at the hashlib module; this implements cryptographic hash functions. The pybloom project uses this approach.
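A minimal sketch of a session-independent hash built on hashlib, used here to derive Bloom filter bit positions. The function names, the choice of SHA-256 and the salting scheme are assumptions made for illustration; this is not pybloom's actual implementation.

```python
import hashlib


def stable_hash(key, salt=0):
    """Return a 64-bit integer that is the same in every Python session."""
    data = str(salt).encode("utf-8") + b":" + key.encode("utf-8")
    digest = hashlib.sha256(data).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big")


def bloom_positions(key, num_hashes, num_bits):
    """Derive `num_hashes` bit positions for `key` in a filter of `num_bits` bits."""
    return [stable_hash(key, salt=i) % num_bits for i in range(num_hashes)]


if __name__ == "__main__":
    # Unlike hash("235"), these positions do not change between sessions.
    print(bloom_positions("235", num_hashes=3, num_bits=1024))
```

Because the result depends only on the input bytes, a filter saved to disk in one session can be queried in another.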

Since the offset consists of a prefix and a suffix (start value and final XORed value, respectively) you cannot just store the offset, unfortunately. On the plus side, this does mean that attackers cannot easily determine the offset with timing attacks either.

207

answered Oct 20 '22 04:10

Martijn Pieters

Hash randomisation is turned on by default in Python 3.3 and later. This is a security feature:

Hash randomization is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict construction

In earlier versions, starting with 2.6.8, it was off by default, but you could switch it on with the -R command-line option or the PYTHONHASHSEED environment variable.

You can switch it off by setting PYTHONHASHSEED to zero.

answered Oct 20 '22 03:10

Peter Wood
