
How to generate an MD5 hash for a file located at an HTTP URL?

Tags:

c#

md5

I am writing a web crawler that searches for files and downloads them. My problem is that I do not want to download files again if they have already been downloaded to the local drive. I know it's possible to compare files by their MD5 hash, but how can I compute the hash of a file at an HTTP URL without downloading it to the local disk?

If this approach is wrong, please advise on a better solution.

kakopappa asked Jul 11 '11 14:07




2 Answers

Unless the web server offers some sort of service that publishes the MD5, then no.

Computing a file's hash requires every byte of the file. That is why changing even a single byte changes the hash, and it is what lets a hash detect modified files.
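To illustrate the point above, here is a minimal sketch that hashes a remote file as it streams in, so nothing is written to disk, although every byte still crosses the network. It assumes `HttpClient` (available from .NET 4.5 onward); the class and method names `RemoteHash`, `Md5Hex`, and `Md5OfUrlAsync` are hypothetical and chosen only for this example.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class RemoteHash
{
    // Hashes a stream by reading it to the end. MD5.ComputeHash(Stream)
    // consumes the stream in chunks, so the file is never fully buffered.
    public static string Md5Hex(Stream stream)
    {
        using (var md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(stream);
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }

    // Requests the URL and hashes the response body as it arrives.
    // ResponseHeadersRead stops HttpClient from buffering the whole body first.
    public static async Task<string> Md5OfUrlAsync(HttpClient client, string url)
    {
        using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (var body = await response.Content.ReadAsStreamAsync())
            {
                return Md5Hex(body);
            }
        }
    }
}
```

The same `Md5Hex` can be applied to a local copy via `File.OpenRead(path)` for comparison. This saves disk writes but not bandwidth, since the whole file is still transferred.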

Neil N answered Oct 21 '22 10:10


To generate a hash you're going to need the data (i.e., you'll need to download it somehow).

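I would suggest that you investigate using the If-Modified-Since HTTP header instead (or maybe ETag/If-None-Match, if the particular server provides it). With either one, a server that supports conditional requests can answer 304 Not Modified and skip sending the body when the file has not changed.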
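A conditional request along these lines might look like the sketch below. It assumes the crawler stored the `ETag` and/or `Last-Modified` values from its earlier download, and that the server honours conditional requests (not all do). The names `ConditionalFetch`, `BuildRequest`, and `IsUnchangedAsync` are hypothetical.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class ConditionalFetch
{
    // Builds a GET carrying the validators saved from the previous download.
    // savedETag is the raw header value including its quotes, e.g. "\"abc123\"".
    public static HttpRequestMessage BuildRequest(string url, string savedETag, DateTimeOffset? savedLastModified)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        if (!string.IsNullOrEmpty(savedETag))
        {
            request.Headers.IfNoneMatch.Add(EntityTagHeaderValue.Parse(savedETag));
        }
        if (savedLastModified.HasValue)
        {
            request.Headers.IfModifiedSince = savedLastModified;
        }
        return request;
    }

    // Returns true when the server answers 304 Not Modified,
    // meaning the local copy is still current and no body was sent.
    public static async Task<bool> IsUnchangedAsync(HttpClient client, string url, string savedETag, DateTimeOffset? savedLastModified)
    {
        using (var request = BuildRequest(url, savedETag, savedLastModified))
        using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
        {
            return response.StatusCode == HttpStatusCode.NotModified;
        }
    }
}
```

In a real crawler it would probably make sense to read the body from the same response when the status is 200 rather than issuing a second request; the boolean helper here only keeps the example short.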

LukeH answered Oct 21 '22 11:10