My problem is not usual. Let's imagine few billions of strings. Strings are usually less then 15 characters. In this list I need to find out the number of the unique elements.
First of all, what object should I use? You shouldn't forget if I add a new element I have to check if it is already existing in the list. It is not a problem in the beginning, but after few millions of words it can really slow down the process.
That's why I thought that Hashtable would be the ideal for this task because checking the list is ideally only log(1). Unfortunately a single object in .net can be only 2GB.
Next step will be to implement a custom hashtable which contains a list of 2GB hashtables.
I am wondering maybe some of you know a better solution. (Computer has extremely high specification.)
You can use the combination of the SUM and COUNTIF functions to count unique values in Excel. The syntax for this combined formula is = SUM(IF(1/COUNTIF(data, data)=1,1,0)). Here the COUNTIF formula counts the number of times each value in the range appears.
Method 1 : To check the status of visited elements create a array of size n and a variable say count_dis=0 to count the distinct element. Run a loop from index 0 to n and check if (visited[i]==1) then skip that element. Check if(arr[i]==arr[j]), then set visited[j]=1.
I would skip the data structures exercise and just use an SQL database. Why write another custom data structure that you have to analyze and debug, just use a database. They are really good at answering queries like this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With