How to count words in Java

Tags:

algorithm

I am looking for an algorithm, a hint, or any source code that can solve the following problem.

I have a folder that contains many text files. I read them and store all of the text in a String. Now I want to determine, for each word, whether it also appears in any of the other files. (I know this isn't clear, so let me give an example.)

For example, I have three documents:

Doc A => "brown fox jump"
Doc B => "dog not jump"
Doc C => "fox jump dog"

Let's say my program reads the first document, and the first word is "brown". The program checks whether this word also appears in any other document; here the answer is 0 (no other document contains it). Then it checks the second word, "fox", and reports that it also appears in Doc C, and so on. Next it reads Doc B and checks whether "dog" appears in any other document; the answer is Doc C, and so on.
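The brute-force procedure described above can be sketched in Java roughly as follows. This is only an illustration: the class name `OtherDocCheck` is hypothetical, and it assumes words are separated by whitespace and compared case-insensitively. Each document is turned into a `Set` of words, and every word is checked against every other document's set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: brute-force "does this word appear in any other document?"
class OtherDocCheck {
    // Split each document on whitespace into a set of lower-cased words (assumed tokenization).
    static Map<String, Set<String>> tokenize(Map<String, String> docs) {
        Map<String, Set<String>> wordsByDoc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            String[] words = e.getValue().toLowerCase().trim().split("\\s+");
            wordsByDoc.put(e.getKey(), new HashSet<>(Arrays.asList(words)));
        }
        return wordsByDoc;
    }

    // Names of every document other than `self` whose word set contains `word`.
    static List<String> otherDocsContaining(String word, String self, Map<String, Set<String>> wordsByDoc) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : wordsByDoc.entrySet()) {
            if (!e.getKey().equals(self) && e.getValue().contains(word)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The three example documents from the question; LinkedHashMap keeps their order.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("Doc A", "brown fox jump");
        docs.put("Doc B", "dog not jump");
        docs.put("Doc C", "fox jump dog");
        Map<String, Set<String>> wordsByDoc = tokenize(docs);

        for (Map.Entry<String, String> e : docs.entrySet()) {
            for (String word : e.getValue().toLowerCase().trim().split("\\s+")) {
                System.out.println(e.getKey() + " / " + word + " -> "
                        + otherDocsContaining(word, e.getKey(), wordsByDoc));
            }
        }
    }
}
```

Among its output are the lines `Doc A / brown -> []`, `Doc A / fox -> [Doc C]` and `Doc B / dog -> [Doc C]`, matching the example. Every lookup scans all documents, so the cost grows with (number of words) × (number of documents); that is the motivation for building a word-to-documents map once instead.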

Any advice or pseudocode?

Hint: this is the document frequency that inverse document frequency (IDF) is computed from. I know what IDF is.
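For reference, a common textbook form is idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t; the per-word counts described in the question are the df(t) part. The sketch below computes both for the three example documents. The class name `IdfSketch` is hypothetical, whitespace tokenization and the natural logarithm are assumptions, and many IDF variants add smoothing terms.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: document frequency and plain (unsmoothed) IDF.
class IdfSketch {
    // df(t): number of documents in which word t appears at least once.
    static Map<String, Integer> documentFrequency(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            // A Set, so a word repeated inside one document counts only once for that document.
            Set<String> unique = new HashSet<>(Arrays.asList(doc.toLowerCase().trim().split("\\s+")));
            for (String word : unique) {
                df.merge(word, 1, Integer::sum);
            }
        }
        return df;
    }

    // idf(t) = ln(N / df(t)); assumes docFreq > 0, i.e. the word occurs somewhere.
    static double idf(int totalDocs, int docFreq) {
        return Math.log((double) totalDocs / docFreq);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("brown fox jump", "dog not jump", "fox jump dog");
        Map<String, Integer> df = documentFrequency(docs);
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            System.out.printf("%s df=%d idf=%.4f%n",
                    e.getKey(), e.getValue(), idf(docs.size(), e.getValue()));
        }
    }
}
```

With these documents, "jump" occurs in all three, so its IDF is ln(3/3) = 0, while "brown" occurs only in Doc A and gets ln 3 ≈ 1.0986.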


asked Dec 31 '09 01:12

user238384

2 Answers

As GregS said, use a HashMap. I'm not posting any code because I think this is homework and I want to give you the opportunity to write it on your own, but the outline is:

Open a new document.
For every word, check whether it is already a key in your HashMap. If it isn't, add the word as a new key and store the new document (its filename) as the value. If it is, just add the document's filename to that key's value.
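A minimal sketch of that outline in Java, assuming whitespace-separated, case-insensitive words. The class `InvertedIndex` and its methods are hypothetical names, and the value type `List<String>` of filenames is one possible choice, in line with the `ArrayList` suggestion below.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the outline: word -> names of the documents containing it.
class InvertedIndex {
    private final Map<String, List<String>> index = new LinkedHashMap<>();

    // Open one document and record each of its words.
    // Assumes each document is added in a single call and document names are unique.
    void addDocument(String docName, String text) {
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (word.isEmpty()) continue; // happens only for a blank document
            // First time the word is seen: create a new key with an empty list.
            List<String> docs = index.computeIfAbsent(word, k -> new ArrayList<>());
            // Add this document, but only once even if the word repeats inside it.
            if (docs.isEmpty() || !docs.get(docs.size() - 1).equals(docName)) {
                docs.add(docName);
            }
        }
    }

    // Documents containing the word (empty list if none); returns the live internal list.
    List<String> documentsFor(String word) {
        return index.getOrDefault(word.toLowerCase(), new ArrayList<>());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument("DocA", "Brown fox jump");
        idx.addDocument("DocB", "Fox jump dog");
        System.out.println(idx.documentsFor("fox"));   // [DocA, DocB]
        System.out.println(idx.documentsFor("brown")); // [DocA]
    }
}
```

Comparing only with the last list entry works because all of a document's words are recorded before the next document is opened; if that does not hold, a `LinkedHashSet` as the value type avoids duplicates instead.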

For example, if you have:

DocA: "Brown fox jump"
DocB: "Fox jump dog"

You would open DocA and traverse its contents. 'brown' is not in your HashMap, so you would add a new entry with key 'brown' and value 'DocA'. The same goes for 'fox' and 'jump'. Then you would open DocB. 'fox' is already in your HashMap, so you would add DocB to its value (the value would then be 'DocA DocB'). Using an ArrayList (in Java) as the value would probably help.
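Once the map is built, answering the original question ("which other documents contain this word?") is a single lookup plus removing the current document. The sketch below writes out, by hand, the map the walkthrough above ends up with and then queries it. `OtherDocsLookup` and `otherDocuments` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: querying a finished word -> documents map.
class OtherDocsLookup {
    // Documents other than `current` that contain `word`; empty if the word is unknown.
    static List<String> otherDocuments(Map<String, List<String>> index, String word, String current) {
        // Copy so the index itself is not modified.
        List<String> others = new ArrayList<>(index.getOrDefault(word, Collections.emptyList()));
        others.remove(current); // remove(Object): drops the current document's name if present
        return others;
    }

    public static void main(String[] args) {
        // The map after reading DocA ("Brown fox jump") and DocB ("Fox jump dog").
        Map<String, List<String>> index = new LinkedHashMap<>();
        index.put("brown", Arrays.asList("DocA"));
        index.put("fox", Arrays.asList("DocA", "DocB"));
        index.put("jump", Arrays.asList("DocA", "DocB"));
        index.put("dog", Arrays.asList("DocB"));

        for (String word : Arrays.asList("brown", "fox", "jump")) {
            System.out.println("DocA / " + word + " -> " + otherDocuments(index, word, "DocA"));
        }
        // Prints: DocA / brown -> [], DocA / fox -> [DocB], DocA / jump -> [DocB]
    }
}
```

The size of each full list is also the word's document frequency, which is the count needed for IDF.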

answered Oct 17 '22 10:10

Alex Ntousias

Hint: HashMap mapping Strings to Lists of files.
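Applied to the original setup (a folder of text files), that hint might look like the sketch below. It assumes the files end in `.txt`, are UTF-8 encoded, and that a "word" is a run of Unicode letters or digits; all of these are assumptions, and `FolderIndex` is a hypothetical name. Only standard `java.nio.file` calls are used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: map each word to the list of files (in one folder) that contain it.
class FolderIndex {
    static Map<String, List<Path>> build(Path folder) throws IOException {
        Map<String, List<Path>> index = new HashMap<>();
        // Only *.txt files directly inside the folder; subfolders are not visited.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*.txt")) {
            for (Path file : files) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                // Assumed tokenization: lower-case, split on anything that is not a letter or digit.
                Set<String> words = new HashSet<>();
                for (String w : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                    if (!w.isEmpty()) words.add(w);
                }
                // The Set ensures each file is added at most once per word.
                for (String w : words) {
                    index.computeIfAbsent(w, k -> new ArrayList<>()).add(file);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, List<Path>> e : build(folder).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The order in which a directory stream returns files is not specified, so the order inside each list can vary between runs.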

answered Oct 17 '22 12:10

President James K. Polk
