Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient algorithm to check duplicate rows in a matrix

Given a matrix M of integers. Check if two rows are identical in the matrix. Give an optimum approach.

Example:
[{1, 2, 3},
 {3, 4, 5},
 {1, 2, 3}]

In the above matrix, rows 1 and 3 are identical.

Possible Solution:

Given a matrix, we can convert each row in a string (example using to_string()
method of C++ and concatenating each element in a row to a string). We do this
for every row of the matrix, and insert it in a table that is something like
(map<string, int> in C++). And hence, duplicate row can be checked in O(mn) time
for an mxn matrix.

Can I do better than this ? Or, above method has any flaw ?

like image 953
Rahul Sharma Avatar asked Oct 17 '13 23:10

Rahul Sharma


People also ask

How do I remove duplicate rows from a matrix?

Approach : Traverse the entire matrix and for every element, check its corresponding row and column. If there are duplicates, then mark that particular cell in another matrix as true. Finally, all the characters for which there is false value in another matrix are joined to form the string.

Can we print unique rows in a table write syntax use suitable example?

Approach: A simple approach would be to check each row with all processed rows. Print the first row. Now, starting from the second row, for each row, compare the row with already processed rows. If the row matches with any of the processed rows, skip it else print it.


1 Answers

Your method works but you are wrong with the complexity of it.

Firstly, testing if an element is in a std::map has complexity O(log(n) * f), where n is the number of elements in the map and f is an upper bound for the time required to comparing any two elements inserted/searched in the map.

In your case, every string has a length m, so comparing any two elements in the map costs O(m).

So the total complexity of your method is:

O(n * log(n) * m) for inserting n strings in the map.

However, you can speed it up to O(n * m) in expectation, which is asymptotically optimal (because you have to read all the data), using a hash table rather than a map. The reason for this is that a hash table has O(1) average complexity for an insert operation and the hash function for every input string is computed only once.

In C++ you can use the unordered_set for that.

like image 190
pkacprzak Avatar answered Nov 11 '22 22:11

pkacprzak