I'm trying to find the best way to compare two text documents using AI and machine learning methods. I've used TF-IDF with cosine similarity and other similarity measures, but these compare the documents at the word (or n-gram) level.
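For contrast, here is a minimal, standard-library-only sketch of the word-level approach described above: TF-IDF vectors compared with cosine similarity. The tokenizer (a lowercase whitespace split), the smoothed IDF formula, and the sample sentences are assumptions chosen for illustration. The example shows the limitation the question describes. Two sentences with the same meaning but different words ("car is fast" vs. "automobile is quick") score lower than two sentences that share words.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts of term -> weight) for a list of documents.
    Tokenization is a naive lowercase whitespace split (an assumption)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, the same formula scikit-learn uses with smooth_idf=True.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

docs = [
    "the car is fast",
    "the automobile is quick",   # same meaning, different words
    "the car is fast and red",   # overlapping words
]
vecs = tfidf_vectors(docs)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))
print(round(cosine_similarity(vecs[0], vecs[2]), 3))
```

Only shared terms contribute to the dot product, so synonyms add nothing to the score. That is why a meaning-level comparison needs a different representation.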
I'm looking for a method that allows me to compare the meaning of the documents. What is the best way to do that?
In Microsoft Word, open one of the two versions of the document that you want to compare. On the Tools menu, point to Track Changes, and then click Compare Documents. In the Original document list, select the original document. In the Revised document list, browse to the other version of the document, and then click OK.
If you need a text or binary comparison, another option is File Compare (the FC command) in Command Prompt. Its output is shown in the Command Prompt window and is not easy to read. For all file formats that Word can open, Word's Compare option is the easiest to use.
Use the diff command to compare text files. It can compare single files or the contents of directories. When run on two regular files, or on text files in different directories, diff reports which lines must be changed in the files to make them match.
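If you want the same line-by-line comparison from inside a program, Python's standard difflib module offers an equivalent. This is a swapped-in tool, not the diff command itself. The sketch below uses difflib.unified_diff, whose output follows the format of diff -u. The helper name line_diff, the file names, and the sample text are hypothetical.

```python
import difflib

def line_diff(old_text, new_text, old_name="original.txt", new_name="revised.txt"):
    """Return a unified diff (list of lines) between two texts, like `diff -u`.
    Lines starting with '-' must be removed and lines starting with '+' added."""
    return list(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))

original = "The quick brown fox\njumps over the lazy dog.\nThe end.\n"
revised = "The quick brown fox\nleaps over the lazy dog.\nThe end.\n"

diff_lines = line_diff(original, revised)
print("".join(diff_lines), end="")
```

Like diff, this works purely on lines of text. It tells you what changed, not whether the meaning changed.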
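You should start by reading about the word2vec model. With gensim, you can load Google's pretrained word2vec model (trained on Google News), which gives you one vector per word. To vectorize a whole document, use gensim's Doc2Vec model, which you train on your own corpus, or average the word vectors of each document. After getting vectors for all your documents, use a distance metric such as cosine distance or Euclidean distance to compare them.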
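The comparison step can be sketched without any external library, assuming you already have word vectors. The tiny 3-dimensional embedding table below is hypothetical and made up for illustration. In practice the vectors would come from a pretrained model (for example, Google's word2vec vectors loaded with gensim's KeyedVectors.load_word2vec_format) or, for whole documents, from a trained Doc2Vec model's infer_vector. The sketch averages word vectors into a document vector and then computes cosine and Euclidean distance.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# Real word2vec vectors (e.g. Google's pretrained model) have 300 dimensions.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "fast": [0.1, 0.9, 0.0],
    "quick": [0.15, 0.85, 0.05],
    "banana": [0.0, 0.1, 0.95],
    "yellow": [0.05, 0.0, 0.9],
}

def document_vector(text, embeddings):
    """Average the vectors of known words; out-of-vocabulary words are skipped."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("no known words in document")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for same direction, larger for less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d1 = document_vector("fast car", TOY_EMBEDDINGS)
d2 = document_vector("quick automobile", TOY_EMBEDDINGS)  # same meaning, no shared words
d3 = document_vector("yellow banana", TOY_EMBEDDINGS)     # unrelated meaning
print(cosine_distance(d1, d2), cosine_distance(d1, d3))
print(euclidean_distance(d1, d2), euclidean_distance(d1, d3))
```

Unlike the TF-IDF comparison, "fast car" and "quick automobile" end up close together because their word vectors are close. Cosine distance ignores vector length, which is often preferred when documents differ in size; Euclidean distance is sensitive to it.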