I am looking for an algorithm, hint or any source code that can solve my following problem.
I have a folder it contains many text files. I read them and store all text in STRING. Now I want to to calculate, if any of the word appeared in other files or no. ( I know its not clear let me give an example)
For example i have two documents: Doc A => "brown fox jump" Doc B => "dog not jump" Doc C = > "fox jump dog"
Lets say my program read the first document and now first word is "brown" now my program will check if this word is also appeared in any other document? So the answer would be 0. Now it will check again for 2nd word "fox", it will give output that yes it appeared in (Doc C) so on...... Now it will read Doc B and it will check if dog appeared in other document? Answer would be (Doc C) so on....
Any advice or pseudo code?
Hint: It is also called inverse document frequency ( Idf ). I know what is idf.
To check word count, simply place your cursor into the text box above and start typing. You'll see the number of characters and words increase or decrease as you type, delete, and edit them. You can also copy and paste text from another program over into the online editor above.
Like GregS said, use HashMap. I'm not posting any code, because I think this is a homework and I want to give to you the opportunity to create it on your own, but the outline is:
For example, if you have: DocA: Brown fox jump DocB: Fox jump dog
You would open DocA and traverse its contents. 'brown' is not in your hashmap, so you would add a new element with key 'brown' and value 'DocA'. The same with 'fox' and 'jump'. Then you would open DocB. 'fox' is already in your hashmap, so you would add to its value DocB, (the value would be 'DocA DocB'). Maybe using an ArrayList (in Java) would help.
Hint: HashMap mapping Strings to Lists of files.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With