
Java: Data structure to store lots of words

I have to store a lot of words (200k+) in a Java program and I want to access them really fast. I just need to know whether a given word belongs to my "dictionary". I don't need a pair like <word, something>. If possible, I'm looking for a solution in the standard library.

PS: Maybe using a data structure is not the best way to do this? Would reading the file containing the words each time be more efficient?

Edit: It's a small project. I have to take both efficiency and memory usage into account.

Last edit: I finally chose HashSet.
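
For readers who want to see what the HashSet approach might look like, here is a minimal sketch. The class name `WordDictionary`, its methods, and the one-word-per-line UTF-8 file format are assumptions made for illustration; they are not from the original post. The file is read once and lookups are then answered from memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; the name and methods are illustrative only.
class WordDictionary {
    private final Set<String> words;

    WordDictionary(Collection<String> source) {
        // Presize the set so that all words fit without repeated rehashing
        // (HashSet's default load factor is 0.75).
        this.words = new HashSet<>((int) (source.size() / 0.75f) + 1);
        for (String w : source) {
            String trimmed = w.trim();
            if (!trimmed.isEmpty()) {
                words.add(trimmed);
            }
        }
    }

    // Reads the file once; assumes one word per line, UTF-8 encoded.
    static WordDictionary fromFile(Path path) throws IOException {
        return new WordDictionary(Files.readAllLines(path, StandardCharsets.UTF_8));
    }

    // Expected constant-time membership test.
    boolean contains(String word) {
        return words.contains(word);
    }

    int size() {
        return words.size();
    }
}
```

Loading the file once and querying the set avoids rereading the file for every lookup, at the cost of keeping all the words in memory.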

DouglasAdams asked Apr 18 '13 10:04



2 Answers

Use a Java Set. A sorted implementation such as TreeSet keeps its elements in order, so a lookup works much like a binary search and runs in O(log n) time, and no element is stored twice.

[Image: structure of Java Sets]

A set also does not allow duplicates, which reduces redundancy and saves memory.
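
As a rough illustration of the sorted-set behaviour described in this answer, the sketch below builds a TreeSet from a small word list. The class name `SortedWordSet` and the sample words are hypothetical; only `TreeSet` itself comes from the standard library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; names are illustrative only.
class SortedWordSet {
    // Builds a TreeSet, which keeps words in natural (lexicographic) order
    // and silently ignores duplicates.
    static TreeSet<String> build(List<String> words) {
        return new TreeSet<>(words);
    }

    public static void main(String[] args) {
        TreeSet<String> set = build(Arrays.asList("pear", "apple", "fig", "apple"));
        System.out.println(set);                  // sorted, with the duplicate dropped
        System.out.println(set.contains("fig"));  // O(log n) lookup in the tree
        System.out.println(set.contains("kiwi"));
    }
}
```

If ordering is not needed, a HashSet gives expected constant-time lookups instead of O(log n).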

If you want to compare the complexities of various search algorithms, see this link:

http://bigocheatsheet.com/

Nikhil Agrawal answered Sep 28 '22 02:09


Use either a trie or a Patricia tree, depending on the distribution of the words. I would personally go with a Patricia tree, as it is more memory-efficient (though harder to implement).
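
To make the trie idea concrete, here is a minimal sketch of a plain (uncompressed) trie. The class `WordTrie` is hypothetical and not part of the standard library, and this is not a Patricia tree: a Patricia tree would additionally merge chains of single-child nodes into one edge to save memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal trie; illustrative only, not a standard-library class.
class WordTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    // Inserts a word, creating one node per character as needed.
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        node.isWord = true;
    }

    // Follows the word character by character; time is proportional to its length.
    boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }
}
```

Words that share a prefix, such as "car" and "cart", share nodes, which is where the memory savings can come from. Using a HashMap per node keeps the sketch short but is itself memory-hungry, so a real implementation would likely choose a more compact child representation.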

Ivaylo Strandjev answered Sep 28 '22 03:09