Which data structure would you use: TreeMap or HashMap? (Java) [duplicate]

Tags: java, hashmap, data-structures, map, treemap

Description | A Java program to read a text file and print each of the unique words in alphabetical order together with the number of times the word occurs in the text.

The program should declare a variable of type Map<String, Integer> to store the words and their corresponding frequency of occurrence. Which concrete type, though? TreeMap<String, Number> or HashMap<String, Number>?

The input should be converted to lower case.

A word does not contain any of these characters: \t\t\n]f.,!?:;\"()'

Example output |

    Word            Frequency
    a               1
    and             5
    appearances     1
    as              1
    .
    .
    .

Remark | I know, I've seen elegant solutions to this in Perl with roughly two lines of code. However, I want to see it in Java.

Edit: Oh yeah, it would be helpful to show an implementation using one of these structures (in Java).

478

asked Nov 19 '08 15:11

JohnZaj

2 Answers

TreeMap seems a no-brainer to me - simply because of the "in alphabetical order" requirement. HashMap has no ordering when you iterate through it; TreeMap iterates in the natural key order.
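As a rough illustration of the TreeMap approach, here is a minimal sketch. It counts words in a string rather than a file to stay self-contained, and the class name, method name and delimiter pattern are assumptions made for this example (the pattern only approximates the character list in the question).

```java
import java.util.Map;
import java.util.TreeMap;

class TreeMapWordCount {
    // Assumed delimiter set: whitespace plus common punctuation.
    private static final String DELIMITERS = "[\\s.,!?:;\"()']+";

    // Counts lower-cased words; the TreeMap keeps keys in natural (alphabetical) order.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase().split(DELIMITERS)) {
            if (!word.isEmpty()) {          // split can yield a leading empty token
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat and the hat.");
        System.out.printf("%-16s %s%n", "Word", "Frequency");
        // Iteration order is already alphabetical, so no explicit sort is needed.
        counts.forEach((word, n) -> System.out.printf("%-16s %d%n", word, n));
    }
}
```

Declaring the variable as Map<String, Integer> means the concrete type can be swapped later without touching the rest of the code.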

EDIT: I think Konrad's comment may have been suggesting "use HashMap, then sort." This is good because although we'll have N iterations initially, we'll have K <= N keys by the end due to duplicates. We might as well save the expensive bit (sorting) until the end, when we've got fewer keys, rather than take the small-but-non-constant hit of keeping the map sorted as we go.
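The "use HashMap, then sort" idea could look something like the sketch below. The class and method names and the delimiter pattern are assumptions for illustration. Whether it is actually faster than a TreeMap depends on the input and would need measuring.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashMapThenSort {
    // Counts lower-cased words in an unordered HashMap (N insertions/updates).
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[\\s.,!?:;\"()']+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sorts only the K unique keys once, after all counting is done.
    static List<String> sortedWords(Map<String, Integer> counts) {
        List<String> words = new ArrayList<>(counts.keySet());
        Collections.sort(words);
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat and the hat.");
        for (String word : sortedWords(counts)) {
            System.out.printf("%-16s %d%n", word, counts.get(word));
        }
    }
}
```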

Having said that, I'm sticking to my answer for the moment: because it's the simplest way of achieving the goal. We don't really know that the OP is particularly worried about performance, but the question implies that he's concerned about the elegance and brevity. Using a TreeMap makes this incredibly brief, which appeals to me. I suspect that if performance is really an issue, there may be a better way of attacking it than either TreeMap or HashMap :)

178

answered Sep 24 '22 16:09

Jon Skeet

TreeMap beats HashMap because TreeMap is already sorted for you.

However, you might want to consider a more appropriate data structure: a bag. See Commons Collections and its TreeBag class.

This has a nice optimised internal structure and API:

    bag.add("big");
    bag.add("small");
    bag.add("big");
    int count = bag.getCount("big");

EDIT: The question of HashMap vs TreeMap performance was answered by Jon - HashMap and sort may be quicker (try it!), but TreeBag is easier. The same is true for bags. There is a HashBag as well as a TreeBag. Based on the implementation (uses a mutable integer) a bag should outperform the equivalent plain map of Integer. The only way to know for sure is to test, as with any performance question.
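To make the "mutable integer" point concrete, here is a small hand-rolled sketch of a sorted bag. It is not the Commons Collections implementation. SimpleBag, Counter and uniqueItems are hypothetical names invented for this example, which only mirrors the add/getCount calls shown above.

```java
import java.util.Map;
import java.util.TreeMap;

class SimpleBag {
    // Hypothetical mutable counter: incremented in place, so no new Integer per update.
    private static final class Counter {
        int value;
    }

    // TreeMap keeps the unique items sorted, in the spirit of a tree-backed bag.
    private final Map<String, Counter> counts = new TreeMap<>();

    void add(String item) {
        counts.computeIfAbsent(item, k -> new Counter()).value++;
    }

    int getCount(String item) {
        Counter c = counts.get(item);
        return c == null ? 0 : c.value;
    }

    // Unique items in alphabetical order.
    Iterable<String> uniqueItems() {
        return counts.keySet();
    }

    public static void main(String[] args) {
        SimpleBag bag = new SimpleBag();
        bag.add("big");
        bag.add("small");
        bag.add("big");
        for (String item : bag.uniqueItems()) {
            System.out.printf("%-16s %d%n", item, bag.getCount(item));
        }
    }
}
```

Whether the in-place counter actually outperforms a plain Map<String, Integer> is, as the answer says, something only a measurement can settle.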

answered Sep 24 '22 16:09

JodaStephen
