
Variable renaming for plagiarism detection for C/C++

I have a couple of simple C++ homework assignments, and I know the students shared code. These are smart students who know how to fool Moss. I'm looking for a tool that renames variables based on their types (the first variable of type int becomes int1, the first int array becomes intptr1, and so on), or that does something similar I haven't thought of. Do you know a quick way to do this?
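To make the renaming scheme concrete, here is a deliberately naive sketch of such a pass. The function name `normalize_identifiers` is hypothetical, and the code assumes only built-in types, ignores scopes, comments, and string literals, and treats any declared name followed by `(` as a function. It is an illustration of the idea, not a replacement for a real parser-based tool.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Rewrites variables declared with a built-in type as <type><n>, or
// <type>ptr<n> for pointers and arrays. Everything else is copied through.
std::string normalize_identifiers(const std::string& src) {
    static const std::set<std::string> types = {
        "bool", "char", "short", "int", "long", "float", "double",
        "unsigned", "signed"};
    static const std::set<std::string> qualifiers = {"const", "volatile", "static"};
    std::map<std::string, std::string> renamed;  // original name -> canonical name
    std::map<std::string, int> counters;         // canonical prefix -> last index used
    std::string out, type;
    bool in_decl = false, expect_name = false, pointer = false;
    int depth = 0, decl_depth = 0;               // bracket nesting

    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isalnum(c) || c == '_') {
            // Read one word: an identifier, keyword, or numeric literal.
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) ++j;
            std::string word = src.substr(i, j - i);
            i = j;
            if (std::isdigit(c)) { out += word; expect_name = false; continue; }
            if (types.count(word)) {             // a declaration starts here
                type = word; in_decl = true; expect_name = true;
                pointer = false; decl_depth = depth;
                out += word; continue;
            }
            if (qualifiers.count(word)) { out += word; continue; }
            auto known = renamed.find(word);
            if (known != renamed.end()) {
                out += known->second;            // later use of a renamed variable
            } else if (in_decl && expect_name) {
                std::size_t k = i;               // peek at the next non-space character
                while (k < src.size() && std::isspace(static_cast<unsigned char>(src[k]))) ++k;
                char next = k < src.size() ? src[k] : '\0';
                if (next == '(') {               // function name: leave it alone
                    out += word; in_decl = false;
                } else {
                    std::string prefix = type + ((pointer || next == '[') ? "ptr" : "");
                    std::string name = prefix + std::to_string(++counters[prefix]);
                    renamed[word] = name;
                    out += name;
                }
            } else {
                out += word;
            }
            expect_name = false;
            continue;
        }
        if (!std::isspace(c)) {
            if (c == '*' && expect_name) pointer = true;
            else if (c != '&') expect_name = false;
            if (c == '(' || c == '[' || c == '{') ++depth;
            else if (c == ')' || c == ']' || c == '}') { if (--depth < decl_depth) in_decl = false; }
            else if (c == ';') in_decl = false;
            else if (c == ',' && in_decl && depth == decl_depth) { expect_name = true; pointer = false; }
        }
        out += static_cast<char>(c);
        ++i;
    }
    return out;
}
```

With this sketch, `int x; int y[3];` comes out as `int int1; int intptr1[3];`, and two submissions that differ only in variable names normalize to the same text, which could then be submitted to Moss.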

Edit: I'm required to use Moss and report a 90% match.

Thanks

perreal asked May 03 '11 22:05



2 Answers

Yep, the tool you're looking for is called a compiler. :)

Seriously, if the submitted programs are exactly the same except for the identifier names, compiling them (without debugging info) should result in exactly the same output.

If you compile with debugging turned on, the compiler may leave metadata in the executable that differs between submissions, hence the comment about making sure it is off. This is also why this won't work for Java programs: that kind of information is present whether or not you compile in debug mode (for the purposes of dynamic introspection).
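One way to act on this, sketched under assumptions: compile every submission with the same compiler and the same flags, without `-g` (for example `g++ -O2 -c`), then compare the resulting object files byte for byte. The helper names `read_bytes` and `same_object_code` below are made up for illustration. Keep in mind that object files usually still carry the names of functions with external linkage and may record the source file name, so renamed functions or differently named files can make otherwise identical code compare unequal; stripping symbols or comparing disassembly may be needed.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file as raw bytes. Returns false if the file cannot be opened.
bool read_bytes(const std::string& path, std::vector<char>& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}

// True only if both files exist and contain exactly the same bytes.
bool same_object_code(const std::string& path_a, const std::string& path_b) {
    std::vector<char> a, b;
    return read_bytes(path_a, a) && read_bytes(path_b, b) && a == b;
}
```

Calling `same_object_code("alice.o", "bob.o")` (file names assumed) on two object files built this way gives a yes/no answer per pair; it says nothing about partial similarity.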

EDIT: I see from the comments added to the question that you're observing some submissions that are different in more than just identifier names. If the programs are still structurally equivalent, this should still work.

EDIT: Given that the use of Moss is a requirement, this probably isn't the way to go. It does seem, though, that Moss has some support for comparing assembly; perhaps compiling to assembler and submitting that to Moss is an option (depending on what compiler you're using).

Mac answered Oct 06 '22 00:10


You can download and try our C CloneDR duplicate code detector. It finds duplicated code even when the variable names have been changed. Multiple changes in the same chunk are treated as just one; if they rename the variables consistently everywhere, you'll get back a report of "one clone" with the precise variable substitution.
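To illustrate what "renamed consistently, with the precise substitution" means, here is a small token-level sketch. It is not how CloneDR works internally; the names `tokenize`, `is_identifier`, and `consistent_renaming` are hypothetical, the keyword list is partial, and comments and string literals are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Splits source text into words (identifiers, keywords, numbers) and
// single punctuation characters, dropping whitespace.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t j = i + 1;
        if (std::isalnum(c) || c == '_') {
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) ++j;
        }
        tokens.push_back(src.substr(i, j - i));
        i = j;
    }
    return tokens;
}

// A word that starts with a letter or underscore and is not a (listed) keyword.
bool is_identifier(const std::string& tok) {
    static const std::set<std::string> keywords = {
        "bool", "break", "char", "const", "continue", "do", "double", "else",
        "float", "for", "if", "int", "long", "return", "short", "signed",
        "unsigned", "void", "while"};
    unsigned char c = static_cast<unsigned char>(tok[0]);
    return (std::isalpha(c) || c == '_') && keywords.count(tok) == 0;
}

// Returns true if b is a, token for token, up to a one-to-one renaming of
// identifiers. On success, substitution holds the names that actually changed.
bool consistent_renaming(const std::string& a, const std::string& b,
                         std::map<std::string, std::string>& substitution) {
    substitution.clear();
    std::vector<std::string> ta = tokenize(a), tb = tokenize(b);
    if (ta.size() != tb.size()) return false;
    std::map<std::string, std::string> forward, backward;
    for (std::size_t i = 0; i < ta.size(); ++i) {
        if (!is_identifier(ta[i]) || !is_identifier(tb[i])) {
            if (ta[i] != tb[i]) return false;   // keywords and punctuation must match
            continue;
        }
        // emplace keeps an existing mapping, so a mismatch shows up below.
        auto f = forward.emplace(ta[i], tb[i]).first;
        auto g = backward.emplace(tb[i], ta[i]).first;
        if (f->second != tb[i] || g->second != ta[i]) return false;
    }
    for (const auto& kv : forward)
        if (kv.first != kv.second) substitution.insert(kv);
    return true;
}
```

Two submissions that differ only by a consistent renaming come back as a single match with one substitution map, while a pair where the same name maps to two different names is rejected.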

Ira Baxter answered Oct 06 '22 00:10