So the base project is this...
I am trying to write a server application that will download and hash files from a website.
The reason is so that I can blacklist particular files that get re-uploaded under different names, or attach further descriptions of what a file really is. The files range from 0.1 KB to 10 MB, and there are many of them. If I could detect, within a reasonable ballpark, that a file has already been hashed, I could return the stored hash rather than download the entire file and hash it again.
My temporary solution is a JavaScript add-on that does the hashing on the spot. It causes temporary freezes and is too redundant for my liking, since every user re-hashes the same files. My goal is to make this good enough to share with the public; the current method is far from it.
My programming skill set is very wide yet not professional or polished in any one language, so a library or examples are highly appreciated.
A snippet of my JavaScript code is this...
// For each row of the table, fetch the linked file and write its MD5
// into the fourth cell of that row.
$('.tablesorter tbody tr').each(function(index) {
    var href = 'http:' + $(this).find("td a:eq(0)").attr('href');
    $.get(href, function(data) {
        // calcMD5 is the client-side MD5 implementation the add-on ships with.
        var MD5 = calcMD5(data);
        $(".tablesorter tbody tr:eq(" + index + ") td:eq(3)").text(MD5);
    });
});
This works great and does what it needs to. However, I'd like a server to do this so that each file only needs to be hashed a single time.
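Conceptually, what I have in mind on the server is something like this Node.js sketch (untested; hashCache and hashUrl are just placeholder names I made up)...
// Sketch of the server side I have in mind (Node.js, untested).
const https = require('https');
const crypto = require('crypto');

const hashCache = new Map(); // url -> MD5 hex digest

function hashUrl(url, callback) {
    if (hashCache.has(url)) {
        // Already hashed once; return the cached digest without downloading.
        return callback(null, hashCache.get(url));
    }
    https.get(url, function (res) {
        const md5 = crypto.createHash('md5');
        res.on('data', function (chunk) { md5.update(chunk); }); // hash while streaming
        res.on('end', function () {
            const digest = md5.digest('hex');
            hashCache.set(url, digest); // remember it so it is only hashed once
            callback(null, digest);
        });
    }).on('error', callback);
}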
Assuming that your problem is that you want to minimize the amount of bandwidth used, you could limit the amount of data you download to, say, the first 100 KB and build your hash over that part. Other information you could use is anything the server sends in its response headers, for example the total file size (Content-Length) and the MIME type (Content-Type).
Obviously this won't work if the files you are expecting to look at differ only at parts later in the file. But it should work with images or other compressed file formats, where differences tend to show up early in the byte stream.
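As a minimal sketch of that idea in Node.js (assuming the remote server honours HTTP Range requests; the 100 KB cut-off and the function name partialHash are arbitrary choices, not a fixed recipe)...
const https = require('https');
const crypto = require('crypto');

function partialHash(url, callback) {
    // Ask for only the first 100 KB; servers that honour Range reply with 206.
    https.get(url, { headers: { Range: 'bytes=0-102399' } }, function (res) {
        const md5 = crypto.createHash('md5');
        res.on('data', function (chunk) { md5.update(chunk); });
        res.on('end', function () {
            callback(null, {
                md5: md5.digest('hex'),
                type: res.headers['content-type'],
                // On a 206 response, Content-Range holds the total size,
                // e.g. "bytes 0-102399/5242880".
                range: res.headers['content-range'],
            });
        });
    }).on('error', callback);
}
Keep in mind that a prefix hash is only a screening step: two different files that happen to share their first 100 KB will collide, so you would want to confirm a match with a full hash before actually blacklisting anything.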