I am planning to build a plagiarism detector as my final-year Computer Science Engineering project, and I would like your suggestions on how to go about it.
I would appreciate it if you could suggest which areas of CS I need to focus on, and which language would be the most appropriate for the implementation.
The algorithms commonly used in plagiarism detection software include string tiling, the Karp-Rabin algorithm, Heckel's algorithm, k-grams, and string matching [11].
Plagiarism detection software works by identifying content similarity matches. That is, the software breaks the submitted text into components and compares them against the components, or content, of other works held in a database of crawled content.
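To make the similarity-matching idea concrete, here is a minimal sketch that combines two of the techniques named above: k-grams and a Karp-Rabin rolling hash. The function names, the choice of k = 5, the hash base and modulus, and the use of Jaccard similarity as the score are all my own assumptions for illustration, not something taken from a particular tool.

```python
import re


def normalize(text):
    # Lowercase and keep only letters and digits, so that changes in
    # spacing, case, or punctuation do not hide a match.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgram_hashes(text, k=5, base=257, mod=(1 << 61) - 1):
    """Return the set of Karp-Rabin hashes of every k-gram in text."""
    s = normalize(text)
    if len(s) < k:
        return set()
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in s[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = {h}
    for i in range(k, len(s)):
        # Roll the window: drop s[i - k], then append s[i].
        h = ((h - ord(s[i - k]) * high) * base + ord(s[i])) % mod
        hashes.add(h)
    return hashes


def similarity(a, b, k=5):
    """Jaccard similarity of the two texts' k-gram hash sets (0.0 to 1.0)."""
    ha, hb = kgram_hashes(a, k), kgram_hashes(b, k)
    if not ha or not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)
```

Calling `similarity(submitted_text, stored_text)` gives a score between 0.0 and 1.0. A real detector would probably keep only a sample of the hashes per document and index them, rather than compare every pair of documents in full.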
The accuracy depends on the plagiarism checker you use. Per our in-depth research, Scribbr is the most accurate plagiarism checker. Many free plagiarism checkers fail to detect all plagiarism or falsely flag text as plagiarism.
The language is nearly irrelevant. Another question exists that discusses this a bit more. Basically, the method suggested there is to use Google: extract parts of the target text and search for them on Google.
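As a rough sketch of the "extract parts of the target text" step, something like the following could pick a few phrases and wrap them in quotes for an exact-phrase search. The function name, the phrase length of eight words, and the preference for longer sentences are assumptions of mine; the other question does not prescribe them. The actual search call is left out on purpose.

```python
import re


def phrase_queries(text, words_per_phrase=8, max_queries=3):
    """Pick a few word sequences from text and wrap each in double quotes."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Longer sentences tend to be more distinctive, so try those first.
    sentences.sort(key=lambda s: len(s.split()), reverse=True)
    queries = []
    for sentence in sentences:
        words = re.findall(r"[\w'-]+", sentence)
        if len(words) < words_per_phrase:
            continue  # too short to be a distinctive phrase
        queries.append('"' + " ".join(words[:words_per_phrase]) + '"')
        if len(queries) == max_queries:
            break
    return queries
```

Each returned string can then be pasted into a search engine or passed to whatever search API you have access to. A hit on an exact eight-word phrase is a strong hint that the passage was copied.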
I am making a plagiarism checker in Python as a hobby project. These are the steps I follow:
1. Tokenize the document.
2. Remove all the stop words using the NLTK library.
3. Use the GenSim library to find the most relevant words, line by line. This can be done by building an LDA or LSA model of the document.
4. Use the Google Search API to search for those words.
Note: you could instead use the Google API to search for the whole document at once. That works when you are dealing with a small amount of data. However, when building a plagiarism checker for sites and web-scraped data, you need to apply the NLTK preprocessing first.
The Google Search API will return the top articles that contain the words produced by the LDA or LSA model from GenSim.
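Here is a small, standard-library-only sketch of the shape of that pipeline. To keep it runnable without extra installs, it swaps in stand-ins that I chose myself: a tiny hand-written stop word set instead of NLTK's list, and plain term frequency instead of a GenSim LDA or LSA model. The function names are hypothetical, and the search request itself is not shown; the sketch only builds the query strings.

```python
import re
from collections import Counter

# Assumption: a tiny hand-written stop word set for illustration only.
# In the pipeline described above this would come from NLTK instead.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def tokenize(line):
    # Step 1: lowercase the line and split it into word tokens.
    return re.findall(r"[a-z0-9]+", line.lower())


def top_terms(line, n=3):
    # Step 2: drop stop words.
    tokens = [t for t in tokenize(line) if t not in STOP_WORDS]
    # Step 3 (stand-in): term frequency replaces LDA/LSA here.
    return [term for term, _ in Counter(tokens).most_common(n)]


def build_queries(document, n=3):
    # Step 4 (preparation only): one search query per non-empty line.
    queries = []
    for line in document.splitlines():
        terms = top_terms(line, n)
        if terms:
            queries.append(" ".join(terms))
    return queries
```

Each string from `build_queries(document)` would then be sent to the search API, and the returned articles compared against the original line.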
Hope it helped.