Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to match one input file with given numbers of file

I had an interview last week. I was stuck in one of the question in algorithm round. I answered that question, but the interviewer did not seem convinced. That's why I am sharing the same.

Please tell me any optimized method for this question, so that it will help me in future interviews.

Question :-

There are 20 text files given, all files are ASCII text files, having size less than 10^9 bytes. There is one input also given, this is also one ASCII file , say, input.txt.

Our task is to strategically match the content of this input file with given 20 files, and print the name of closest matching file. The contents of input file might only match partially

Thanks in advance. Looking for your kind reply.

like image 342
devsda Avatar asked Apr 04 '13 19:04

devsda


2 Answers

diff them and pass through wc -l, or implement Levenshtein distance in C++ treating each line as a single character (or any more appropriate unit condidering the subject domain)

like image 84
bobah Avatar answered Sep 29 '22 03:09

bobah


You can create some kind of indexing (example: trie) to summarize the input file. Then you can check how many indices match across documents.

Eg. Create a trie for input file for length 10. For every string of length 10 (overlapping) in the text files check how many of them match in the trie.

like image 38
ElKamina Avatar answered Sep 29 '22 04:09

ElKamina