So the base project is this...
I am trying to write a server application that will download and hash files from a website.
The reason is so that I can blacklist particular files that get re-uploaded under different names, or attach further descriptions of what a file really is. The files range from 0.1 KB to 10 MB, and there are many of them. If I could detect, within a reasonable ballpark, that a file has already been hashed, I could return the stored hash rather than download the entire file and hash it again.
My temporary solution is a JavaScript add-on that does the hashing on the spot. It causes temporary freezes and is too redundant for my liking, since every user re-hashes the same files. My goal is to make this good enough to share with the public; the current method is far from it.
My programming skill set is very wide yet not professional or polished in any one language, so a library or examples are highly appreciated.
A snippet of my JavaScript code is this...
// For each row of the table, fetch the linked file and write its MD5
// into the fourth cell of that row.
$('.tablesorter tbody tr').each(function(index) {
    var href = 'http:' + $(this).find("td a:eq(0)").attr('href');
    $.get(href, function(data) {
        // calcMD5 is the client-side MD5 implementation the add-on ships with.
        var MD5 = calcMD5(data);
        $(".tablesorter tbody tr:eq(" + index + ") td:eq(3)").text(MD5);
    });
});
This works great and does what it needs to. However, I'd like a server to do this so that each file only needs to be hashed a single time.
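Conceptually, what I have in mind on the server is something like this Node.js sketch (untested; hashCache and hashUrl are just placeholder names I made up)...
// Sketch of the server side I have in mind (Node.js, untested).
const https = require('https');
const crypto = require('crypto');

const hashCache = new Map(); // url -> MD5 hex digest

function hashUrl(url, callback) {
    if (hashCache.has(url)) {
        // Already hashed once; return the cached digest without downloading.
        return callback(null, hashCache.get(url));
    }
    https.get(url, function (res) {
        const md5 = crypto.createHash('md5');
        res.on('data', function (chunk) { md5.update(chunk); }); // hash while streaming
        res.on('end', function () {
            const digest = md5.digest('hex');
            hashCache.set(url, digest); // remember it so it is only hashed once
            callback(null, digest);
        });
    }).on('error', callback);
}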
Assuming that your problem is that you want to minimize the amount of bandwidth used, you could limit the amount of data you download to, say, the first 100 KB and build your hash over that part. Other information you could use is anything the server sends in its response headers, for example the total file size (Content-Length) and the MIME type (Content-Type).
Obviously this won't work if the files you are expecting to look at differ only at parts later in the file. But it should work with images or other compressed file formats, where differences tend to show up early in the byte stream.
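As a minimal sketch of that idea in Node.js (assuming the remote server honours HTTP Range requests; the 100 KB cut-off and the function name partialHash are arbitrary choices, not a fixed recipe)...
const https = require('https');
const crypto = require('crypto');

function partialHash(url, callback) {
    // Ask for only the first 100 KB; servers that honour Range reply with 206.
    https.get(url, { headers: { Range: 'bytes=0-102399' } }, function (res) {
        const md5 = crypto.createHash('md5');
        res.on('data', function (chunk) { md5.update(chunk); });
        res.on('end', function () {
            callback(null, {
                md5: md5.digest('hex'),
                type: res.headers['content-type'],
                // On a 206 response, Content-Range holds the total size,
                // e.g. "bytes 0-102399/5242880".
                range: res.headers['content-range'],
            });
        });
    }).on('error', callback);
}
Keep in mind that a prefix hash is only a screening step: two different files that happen to share their first 100 KB will collide, so you would want to confirm a match with a full hash before actually blacklisting anything.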