
Hashcode bucket distribution in Java

Suppose I need to store 1000 objects in a HashSet. Is it better to have 1000 buckets, each containing one object (by generating a unique hash code for every object), or 10 buckets each containing roughly 100 objects?

One advantage of having one object per bucket is that I can save execution cycles on calls to the equals() method?

Why is it important to have a set number of buckets and to distribute the objects among them as evenly as possible?

What should be the ideal object-to-bucket ratio?
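
For concreteness, the two strategies being compared might be sketched like this (UniqueHashItem and CoarseHashItem are hypothetical classes made up purely for illustration):

    // Strategy 1: a distinct hash code per object, roughly one object per bucket.
    class UniqueHashItem {
        final int id; // assume the 1000 ids are all different and non-negative

        UniqueHashItem(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            return o instanceof UniqueHashItem && ((UniqueHashItem) o).id == id;
        }

        @Override public int hashCode() { return id; }
    }

    // Strategy 2: only 10 distinct hash codes, so ~100 objects share each bucket.
    class CoarseHashItem {
        final int id;

        CoarseHashItem(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            return o instanceof CoarseHashItem && ((CoarseHashItem) o).id == id;
        }

        @Override public int hashCode() { return id % 10; }
    }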

Asked Jul 13 '12 by Jyotirup


People also ask

What is the hashCode() method in Java?

hashCode() in Java is a method that returns the hash code value of the object it is called on. It returns an integer (a 4-byte value) generated by a hashing algorithm. Hashing is the process of mapping an object or attribute to a fixed-size value using an algorithm, which enables quicker lookup.

How are hashCode() and equals() used in HashMap?

If you override the equals() method, then you must also override the hashCode() method. hashCode() is defined on the Object class; its default implementation returns an integer typically derived from the object's identity (often described as its memory address). HashMap uses the value returned by this method to compute the bucket index.

Why does Java use 31 in hashCode() for String?

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.

What is the relationship between hashCode() and equals() in Java?

If two objects are equal (according to the equals() method), then the hashCode() method must return the same integer value for both objects. However, hashCode() is not required to return distinct results for objects that are not equal (according to equals()).
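
A minimal sketch of a class that keeps equals() and hashCode() consistent (Point is a made-up example class, not taken from the question):

    import java.util.Objects;

    final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            // Equal points produce equal hash codes, as the contract requires;
            // unequal points may still collide, which is allowed.
            return Objects.hash(x, y);
        }
    }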


2 Answers

Why is it important to have a set number of buckets and to distribute the objects among them as evenly as possible?

A HashSet should be able to determine membership in O(1) time on average. From the documentation:

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.

To achieve this, a HashSet retrieves the hash code of the object and uses it to find the correct bucket. It then iterates over the items in that bucket until it finds one that is equal. If the number of items per bucket is not bounded by a constant, the lookup will take longer than O(1) time.

In the worst case - if all items hash to the same bucket - it will take O(n) time to determine if an object is in the set.
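
A simplified sketch of that lookup, not the actual JDK code: compute the bucket index from the hash code, then compare with equals() only within that bucket.

    import java.util.List;

    final class BucketLookupSketch {
        // buckets is assumed to have a power-of-two length, as in HashMap.
        static <T> boolean containsSketch(List<T>[] buckets, T target) {
            int index = target.hashCode() & (buckets.length - 1); // pick the bucket
            List<T> bucket = buckets[index];
            if (bucket == null) {
                return false;
            }
            // Linear scan within the bucket: equals() only runs for elements
            // whose hash codes landed in the same bucket. If every element
            // hashed to this one bucket, this loop degenerates to O(n).
            for (T element : bucket) {
                if (target.equals(element)) {
                    return true;
                }
            }
            return false;
        }
    }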

What should be the ideal object-to-bucket ratio?

There is a space-time tradeoff here. Increasing the number of buckets decreases the chance of collisions; however, it also increases memory requirements. HashSet has two parameters, initialCapacity and loadFactor, that let you adjust how many buckets it creates. The default load factor is 0.75, which is fine for most purposes, but if you have special requirements you can choose another value.

More information about these parameters can be found in the documentation for HashMap:

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by calling the rehash method.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
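
As a concrete sketch of that last point, sizing a HashSet up front for an expected number of elements so that no rehash occurs (the element count of 1000 and the element type are assumed purely for illustration):

    import java.util.HashSet;
    import java.util.Set;

    public class HashSetSizing {
        public static void main(String[] args) {
            int expectedSize = 1000;   // assumed expected element count
            float loadFactor = 0.75f;

            // Capacity >= expectedSize / loadFactor means the resize threshold
            // is never exceeded while adding expectedSize elements.
            int initialCapacity = (int) Math.ceil(expectedSize / loadFactor);

            Set<Integer> set = new HashSet<>(initialCapacity, loadFactor);
            for (int i = 0; i < expectedSize; i++) {
                set.add(i);
            }
            System.out.println("size = " + set.size());
        }
    }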

Answered Sep 18 '22 by Mark Byers


Roughly one bucket per element is better for the processor; too many buckets is bad for memory. Java will start with a small number of buckets and automatically increase the capacity of your HashSet once it starts filling up, so you don't really need to care unless your application has performance issues and you've identified a HashSet as the cause.

If you have several elements in each bucket, lookups start taking longer. If you have lots of empty buckets, you're using more memory than you need and iterating over the elements takes longer.

This seems like a premature optimization waiting to happen though - the default constructor is fine in most cases.

Answered Sep 19 '22 by Jacob Raihle