 

Searching for copied homework

Tags:

bash

Sometimes my students try to submit identical files for their homework. If they had done the homework themselves, it would be practically impossible for any two files to be exactly the same.

I put the homework in folders arranged like this: /section/id/

In this way, each section of the course has its own folder, each student has their own folder, and all of the files are within that last level. The student files come in a variety of formats.

  • How can I check if there are any exactly identical files (ignoring file names) within any sub-folder?
Village asked Dec 07 '22 17:12


2 Answers

You can identify exactly identical files among your students' submissions with the following for loop and awk one-liner:

Step 1 - for i in path/to/files/*; do cksum "$i"; done > cksum.txt
Step 2 - awk 'NR==FNR && a[$1]++ { b[$1]; next } $1 in b' cksum.txt cksum.txt
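
In case the one-liner in Step 2 is hard to read, here is a spelled-out sketch of the same two-pass idea. It is not the exact program above: it simply counts every checksum on the first read of the file and prints the repeated ones on the second read, which should give the same lines for a non-empty cksum.txt. The function name print_duplicates is made up for illustration.

```shell
#!/bin/sh
# print_duplicates LISTING
# LISTING holds cksum output ("CRC SIZE NAME" per line).
# Prints every line whose CRC (field 1) appears more than once.
print_duplicates() {
    awk '
        # First pass: NR == FNR is true only while awk reads the first
        # file argument, so this rule just counts each checksum.
        NR == FNR { count[$1]++; next }

        # Second pass: a bare pattern prints the line when it is true,
        # i.e. when this checksum was counted more than once.
        count[$1] > 1
    ' "$1" "$1"   # the same file is passed twice, once per pass
}
```

Usage would be: print_duplicates cksum.txt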

Test:

Some sample files, in which student2 has submitted a file identical to student1's:

[jaypal:~/Temp/homework] ls -lrt
total 32
-rw-r--r--  1 jaypalsingh  staff  10 17 Dec 17:58 student1
-rw-r--r--  1 jaypalsingh  staff  10 17 Dec 17:58 student2
-rw-r--r--  1 jaypalsingh  staff  10 17 Dec 17:58 student3
-rw-r--r--  1 jaypalsingh  staff  10 17 Dec 17:58 student4
[jaypal:~/Temp/homework] cat student1 
homework1
[jaypal:~/Temp/homework] cat student2 
homework1
[jaypal:~/Temp/homework] cat student3 
homework3
[jaypal:~/Temp/homework] cat student4 
homework4

Step 1:

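Create a cksum.txt file using the cksum utility. Because the * glob also matches cksum.txt itself, which is still empty when it is read, it shows up in the listing with a size of 0: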

[jaypal:~/Temp/homework] for i in *; do cksum "$i"; done > cksum.txt
[jaypal:~/Temp/homework] cat cksum.txt 
4294967295 0 cksum.txt
1271506813 10 student1
1271506813 10 student2
1215889011 10 student3
1299429862 10 student4

Step 2:

Use the awk one-liner to identify all files that are the same:

[jaypal:~/Temp/homework] awk 'NR==FNR && a[$1]++ { b[$1]; next } $1 in b' cksum.txt cksum.txt 
1271506813 10 student1
1271506813 10 student2 

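Test 2 (with a fifth file, student5, that is another copy of the same homework; the stu* glob keeps cksum.txt out of the listing):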

[jaypal:~/Temp/homework] for i in stu*; do cksum "$i"; done > cksum.txt
[jaypal:~/Temp/homework] awk 'NR==FNR && a[$1]++ { b[$1]; next } $1 in b' cksum.txt cksum.txt 
1271506813 10 student1
1271506813 10 student2
1271506813 10 student5
[jaypal:~/Temp/homework] cat student5
homework1
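
The tests above use a single flat directory, while the question describes a /section/id/ layout. A minimal sketch of how the same cksum + awk approach might be applied to the whole tree follows. It assumes find is available and that no file name contains a newline; the function name find_identical is hypothetical. It keys on both the CRC and the byte count, and since a CRC is only 32 bits, any reported pair is worth confirming with cmp before accusing anyone.

```shell
#!/bin/sh
# find_identical ROOT
# Lists every file under ROOT whose cksum result (CRC and size)
# matches that of at least one other file, grouped by checksum.
find_identical() {
    root=$1
    list=$(mktemp) || return 1

    # Checksum every regular file at any depth below ROOT.
    # Each output line looks like: CRC SIZE PATH
    find "$root" -type f -exec cksum {} + > "$list"

    # Pass 1 counts each "CRC SIZE" key, pass 2 prints repeated ones.
    awk '
        NR == FNR { count[$1 " " $2]++; next }
        count[$1 " " $2] > 1
    ' "$list" "$list" | sort -k1,1n -k2,2n

    rm -f "$list"
}
```

Usage would be something like: find_identical /path/to/course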
jaypal singh answered Jan 02 '23 23:01


Compute an MD5 hash of every file and insert the hashes into a dictionary; any hash that is already in the dictionary marks a duplicate file.
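
One way this might look in shell, as a rough sketch: it assumes the GNU md5sum command is installed (on some systems the tool is called md5 and prints a different format) and uses an awk associative array as the dictionary. The function name report_md5_duplicates and the wording of the message are invented for the example, and file names containing newlines are not handled.

```shell
#!/bin/sh
# report_md5_duplicates ROOT
# Hashes every regular file under ROOT and reports each file whose
# MD5 was already stored in the dictionary by an earlier file.
report_md5_duplicates() {
    find "$1" -type f -exec md5sum {} + |
    awk '
        {
            hash = $1
            # md5sum prints the hash, a two-character separator, then
            # the path; cut by position so spaces in the path survive.
            path = substr($0, length(hash) + 3)

            if (hash in first_seen)
                print path " is identical to " first_seen[hash]
            else
                first_seen[hash] = path   # dictionary: hash -> first path
        }
    '
}
```

Usage would be something like: report_md5_duplicates /path/to/course. Which of two identical files counts as the "first" depends on the order in which find happens to list them.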

Betty answered Jan 02 '23 21:01