
How to count words in Java

Tags:

java

algorithm

I am looking for an algorithm, a hint, or any source code that can solve the following problem.

I have a folder that contains many text files. I read them and store all of the text in a String. Now I want to work out, for each word, whether it also appears in any of the other files. (I know that's not clear, so let me give an example.)

For example, I have three documents:

Doc A => "brown fox jump"
Doc B => "dog not jump"
Doc C => "fox jump dog"

Let's say my program reads the first document, and the first word is "brown". The program checks whether this word also appears in any other document; here the answer would be 0. Then it checks the second word, "fox", and reports that it appears in Doc C, and so on. Next it reads Doc B and checks whether "dog" appears in any other document; the answer would be Doc C, and so on.

Any advice or pseudocode?

Hint: It is also called inverse document frequency (IDF). I know what IDF is.

user238384 asked Dec 31 '09 01:12



2 Answers

Like GregS said, use a HashMap. I'm not posting any code, because I think this is homework and I want to give you the opportunity to write it on your own, but the outline is:

  1. Open a new document.
  2. For every word, check whether it is already a key in your HashMap. If it isn't, create a new key for this word and store the document's filename as its value. If it is, just add the document's filename to the existing value.

For example, if you have:

DocA: Brown fox jump
DocB: Fox jump dog

You would open DocA and traverse its contents. 'brown' is not in your HashMap, so you would add a new entry with key 'brown' and value 'DocA'. The same goes for 'fox' and 'jump'. Then you would open DocB. 'fox' is already in your HashMap, so you would add DocB to its value (the value would then be 'DocA DocB'). Using an ArrayList as the value type (in Java) may help.
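The outline above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name `InvertedIndex` and the method `build` are made up for the example, and it assumes the documents have already been read into strings, that words are separated by whitespace, and that case should be ignored (as the 'Brown'/'brown' example suggests).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class InvertedIndex {
    // Hypothetical helper: maps each word to the names of the documents containing it.
    // `docs` maps a document name to its full text (assumed to be already loaded).
    static Map<String, List<String>> build(Map<String, String> docs) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // Assumption: lower-case everything and split on runs of whitespace.
            for (String word : doc.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // leading whitespace yields an empty first token
                }
                // Create the list on first sight of the word, then reuse it.
                List<String> names = index.computeIfAbsent(word, k -> new ArrayList<>());
                // Record each document at most once per word.
                if (!names.contains(doc.getKey())) {
                    names.add(doc.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("DocA", "Brown fox jump");
        docs.put("DocB", "Fox jump dog");
        // prints {brown=[DocA], fox=[DocA, DocB], jump=[DocA, DocB], dog=[DocB]}
        System.out.println(build(docs));
    }
}
```

A `LinkedHashMap` is used here only so the printed order is predictable; a plain `HashMap` works the same way for lookups.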

Alex Ntousias answered Oct 17 '22 10:10


Hint: HashMap mapping Strings to Lists of files.
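One way to read this hint, applied to the three documents from the question: once the map is filled, asking "which other files contain this word?" is a single lookup. The sketch below is an assumption-laden illustration; the class `WordLookup` and the helpers `add` and `otherFiles` are hypothetical names, and the document texts are hard-coded rather than read from a folder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordLookup {
    // Hypothetical helper: record that `word` occurs in `file`, at most once per file.
    static void add(Map<String, List<String>> index, String word, String file) {
        List<String> files = index.computeIfAbsent(word, k -> new ArrayList<>());
        if (!files.contains(file)) {
            files.add(file);
        }
    }

    // Hypothetical helper: files other than `current` that contain `word`.
    static List<String> otherFiles(Map<String, List<String>> index, String word, String current) {
        // Copy the stored list so the index itself is not modified.
        List<String> result =
                new ArrayList<>(index.getOrDefault(word, Collections.<String>emptyList()));
        result.remove(current);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        // The three documents from the question, hard-coded for the example.
        String[][] docs = {
            {"DocA", "brown fox jump"},
            {"DocB", "dog not jump"},
            {"DocC", "fox jump dog"}
        };
        for (String[] doc : docs) {
            for (String word : doc[1].split(" ")) {
                add(index, word, doc[0]);
            }
        }
        System.out.println(otherFiles(index, "brown", "DocA")); // prints []
        System.out.println(otherFiles(index, "fox", "DocA"));   // prints [DocC]
        System.out.println(otherFiles(index, "dog", "DocB"));   // prints [DocC]
    }
}
```

The size of each stored list is the number of documents containing that word, which is the document count an IDF calculation would start from.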

President James K. Polk answered Oct 17 '22 12:10