
Best way to compare meaning of text documents?

I'm trying to find the best way to compare two text documents using AI and machine learning methods. I've used TF-IDF with cosine similarity and other similarity measures, but these compare the documents only at the word (or n-gram) level.

I'm looking for a method that allows me to compare the meaning of the documents. What is the best way to do that?

Moon_Watcher asked Mar 13 '18 12:03





1 Answer

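You should start by reading about the word2vec model. Use gensim and load Google's pretrained word2vec model. To vectorize a document, you can average the vectors of its words, or train gensim's Doc2Vec model on your own corpus (Doc2Vec is a separate model class, not a function applied to the pretrained word vectors). Once you have a vector for each document, compare them with a distance metric such as cosine distance or Euclidean distance.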
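To make the idea concrete, here is a minimal sketch of the "average the word vectors, then take the cosine" approach. It uses only the standard library, and the `TOY_VECTORS` table is a made-up, three-dimensional stand-in for real pretrained embeddings; the helper names `document_vector` and `cosine_similarity` are my own, not part of gensim. In practice you would look the words up in a pretrained model instead.

```python
import math

# Hypothetical toy embeddings for illustration only.
# Real word2vec vectors have hundreds of dimensions and come from a trained model.
TOY_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.0],
    "fast": [0.7, 0.3, 0.1],
    "quick": [0.65, 0.35, 0.1],
    "banana": [0.0, 0.2, 0.9],
    "ripe": [0.1, 0.3, 0.8],
}


def document_vector(text, vectors):
    """Average the vectors of all known words; unknown words are skipped."""
    known = [vectors[word] for word in text.lower().split() if word in vectors]
    if not known:
        raise ValueError("document contains no words with a known vector")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


doc_a = "fast car"
doc_b = "quick automobile"  # same meaning as doc_a, no shared words
doc_c = "ripe banana"       # unrelated meaning

vec_a = document_vector(doc_a, TOY_VECTORS)
vec_b = document_vector(doc_b, TOY_VECTORS)
vec_c = document_vector(doc_c, TOY_VECTORS)

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)

print(f"similarity(doc_a, doc_b) = {sim_ab:.3f}")
print(f"similarity(doc_a, doc_c) = {sim_ac:.3f}")
```

Note that `doc_a` and `doc_b` share no words, so a TF-IDF comparison would score them as completely dissimilar, while the averaged embeddings place them close together. Averaging ignores word order, so treat it as a baseline rather than a full answer to comparing meaning.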

durjoy answered Sep 28 '22 22:09
