Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster data structure for searching a string

I have this code that determines whether a word (ignoring case) is included in a wordList text file. However, the wordList text file may have 65000++ lines, and to just search a word using my implementation below takes nearly a minute. Could you think of any better implementation?

Thanks!

import java.io.*;
import java.util.*;

public class WordSearch 
{
    LinkedList<String> lxx;
    FileReader fxx;
    BufferedReader bxx;

    public WordSearch(String wordlist) 
        throws IOException
    {
        fxx = new FileReader(wordlist);
        bxx = new BufferedReader(fxx);
        lxx = new LinkedList<String>();
        String word;

        while ( (word = bxx.readLine()) != null) 
            {
            lxx.add(word);
        }

        bxx.close();
    }

    public boolean inTheList (String theWord)
    {
        for(int i =0 ; i < lxx.size(); i++)
            {
            if (theWord.compareToIgnoreCase(lxx.get(i)) == 0)
                    {
                return true;
            }
        }

        return false;
    }
}
like image 999
Mariska Avatar asked Mar 05 '11 02:03

Mariska


People also ask

Which data structure is used for fastest search?

In computer science, a search data structure is any data structure that allows the efficient retrieval of specific items from a set of items, such as a specific record from a database. The simplest, most general, and least efficient search structure is merely an unordered sequential list of all the items.

What kind of data structure can you use to quickly search for a given string?

Use a HashSet into which you put a lowercase version of each word. Checking if a HashSet contains a specified string is, on average, a constant-time (read: extremely fast) operation.

What is the best data structure for fast retrieval of data?

Hash Table. Hashtable is a list of paired values, the first item in the pair is the key, and the second item is the value. With a hash table, you can access objects by the key, so this structure is high-speed for lookups. Hash tables are faster than the arrays for lookups.

Which data structure is best for lookup?

What is the most efficient data structure for this? Looking at complexity analysis, Hashtables seem to be the most efficient for lookups.


2 Answers

Use a HashSet into which you put a lowercase version of each word. Checking if a HashSet contains a specified string is, on average, a constant-time (read: extremely fast) operation.

like image 179
Aasmund Eldhuset Avatar answered Sep 21 '22 15:09

Aasmund Eldhuset


Since you're searching, you may want to consider sorting the list before searching; then you can do binary search which is much faster than linear search. That can help if you'll perform multiple searches on the same list, otherwise the penalty you pay to sort the list isn't worth it for searching only once.

Also, doing linear search on a linked list using "lxx.get(i)" is asking for trouble. LinkedList.get() is O(n). You can either use an Iterator (easy way: for (String s : lxx)) or switch to a list type that has O(1) access time, such as ArrayList.

like image 37
vanza Avatar answered Sep 22 '22 15:09

vanza