How to search for multiple strings in a text file

Tags: java, string, algorithm

I am working with text files and want to implement a search algorithm in Java. I have a text file that I need to search.

If I want to find one word, I can put all the text into a HashMap and store each word's occurrences. But is there an algorithm I can use if I want to search for two strings (or maybe more)? Should I hash the strings in pairs?

554

asked Oct 04 '11 12:10

Arjit

1 Answer

It depends a lot on the size of the text file. There are usually several cases you should consider:

Lots of queries on very short documents (web pages, essay-length texts, etc.) whose text is distributed like normal language. A simple algorithm with a quadratic worst case (O(n*m) for a query of length n and a text of length m, loosely O(n^2)) is fine. For a query of length n, take a window of length n and slide it over the text: compare, then move the window, until you find a match. This algorithm does not care about words, so you treat the whole search as one big string (including spaces). This is probably what most browsers do. KMP or Boyer-Moore is not worth the effort here, since the quadratic case is very rare.
Lots of queries on one large document. Preprocess your document and store the preprocessed form. Common storage options are suffix trees and inverted lists. If you have multiple documents, you can build one document from them by concatenating them and storing the document boundaries separately. This is the way to go for document databases where the collection is almost constant.
Several documents with high redundancy, in a collection that changes often. Use KMP or Boyer-Moore. For example, if you want to find certain sequences in DNA data and you often get new sequences to find as well as new DNA from experiments, the O(n^2) worst case of the naive algorithm would kill your running time.
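
For the first case, the sliding-window idea can be sketched roughly as follows. This is only an illustration, not a tuned implementation: the class name NaiveMultiSearch and the helper findAll are made up for this example, and it simply runs one naive scan per query string, which is one possible way to handle several strings at once. It assumes Java 9+ for List.of.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a sketch of the naive sliding-window search.
public class NaiveMultiSearch {

    // Returns the index of the first occurrence of query in text, or -1.
    // The text is treated as one big string, spaces included.
    static int indexOf(String text, String query) {
        int n = text.length();
        int m = query.length();
        for (int start = 0; start + m <= n; start++) {
            int k = 0;
            // Compare the window starting at 'start' with the query.
            while (k < m && text.charAt(start + k) == query.charAt(k)) {
                k++;
            }
            if (k == m) {
                return start; // whole query matched
            }
        }
        return -1;
    }

    // Assumed helper: searches for each query independently.
    static Map<String, Integer> findAll(String text, List<String> queries) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String q : queries) {
            result.put(q, indexOf(text, q));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        // prints {quick brown=4, lazy=35, cat=-1}
        System.out.println(findAll(text, List.of("quick brown", "lazy", "cat")));
    }
}
```

In practice String.indexOf from the standard library does the same job; the loop is spelled out here only to show the window movement.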

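For the second case, a minimal inverted-list sketch might look like the one below. It indexes each word once, rather than hashing pairs of words, and answers multi-word queries by combining the per-word position lists. The class name InvertedIndex, the methods containsAll and containsPhrase, and the tokenization (lowercasing and splitting on non-word characters) are all assumptions made for this example. It assumes Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class; maps each word to the sorted list of word positions
// at which it occurs in the document.
public class InvertedIndex {
    private final Map<String, List<Integer>> positions = new HashMap<>();

    public InvertedIndex(String text) {
        // Assumed tokenization: lowercase, split on non-word characters.
        String[] words = text.toLowerCase().split("\\W+");
        int pos = 0;
        for (String w : words) {
            if (w.isEmpty()) {
                continue; // split can yield a leading empty token
            }
            positions.computeIfAbsent(w, k -> new ArrayList<>()).add(pos);
            pos++;
        }
    }

    // True if every given word occurs somewhere in the document.
    public boolean containsAll(String... words) {
        for (String w : words) {
            if (!positions.containsKey(w.toLowerCase())) {
                return false;
            }
        }
        return true;
    }

    // True if the words occur next to each other, in the given order.
    public boolean containsPhrase(String... words) {
        if (words.length == 0) {
            return true;
        }
        List<Integer> starts = positions.getOrDefault(words[0].toLowerCase(), List.of());
        for (int start : starts) {
            boolean match = true;
            for (int i = 1; i < words.length && match; i++) {
                List<Integer> p = positions.getOrDefault(words[i].toLowerCase(), List.of());
                // Position lists are built in ascending order, so binary search is valid.
                match = Collections.binarySearch(p, start + i) >= 0;
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex("The quick brown fox. The lazy dog!");
        System.out.println(idx.containsAll("quick", "dog"));
        System.out.println(idx.containsPhrase("lazy", "dog"));
    }
}
```

The index is built once and reused for every query, which is what makes this worthwhile when the document rarely changes.
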
There are probably lots more possibilities that need different algorithms and data structures, so you should figure out which one is best in your case.
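
For the case where KMP is suggested above, a compact sketch of Knuth-Morris-Pratt is given here for reference. The class name KmpSearch and the method names are chosen for this example only. The point of the failure table is that the text index never moves backwards, so the search runs in time linear in the text length plus the pattern length even on highly repetitive input such as DNA.

```java
// Hypothetical class name; a sketch of Knuth-Morris-Pratt search.
public class KmpSearch {

    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] failureTable(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] fail = failureTable(pattern);
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {
            // On a mismatch, fall back in the pattern instead of rewinding the text.
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i - k + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // prints 5
        System.out.println(indexOf("ACGTACGTTAGC", "CGTT"));
    }
}
```

To search for several patterns, one option is to build one failure table per pattern and scan the text once for each.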

146

answered Nov 01 '22 00:11

LiKao

