Rails: Preventing Duplicate Photo Uploads with Paperclip?

Is there any way to throw a validation error if a user tries to upload the same photo twice to a Rails app using Paperclip? Paperclip doesn't seem to offer this functionality...

I'm using Rails 2.3.5 and Paperclip (obviously).


SOLUTION: (or one of them, at least)

Using Beerlington's suggestion, I decided to go with an MD5 Checksum comparison:

require 'digest/md5'

class Photo < ActiveRecord::Base
  #...
  has_attached_file :image #, ...

  before_validation_on_create :generate_md5_checksum
  validate :unique_photo
  #...

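  # Store an MD5 digest of the uploaded file's contents so duplicates can be detected.
  def generate_md5_checksum
    self.md5_checksum = Digest::MD5.hexdigest(image.to_file.read)
  end

  # Reject the upload if this user already has a photo with the same digest.
  def unique_photo
    # Only check new records; on update the photo would otherwise match itself.
    return unless new_record?
    if Photo.exists?(:user_id => user_id, :md5_checksum => md5_checksum)
      errors.add_to_base "You have already uploaded that file!"
    end
  end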

  # ...
end

Then I just added a column to my photos table called md5_checksum, and voila! Now my app throws a validation error if you try to upload the same photo!

No idea how efficient/inefficient this is, so refactoring's welcome!

Thanks!

neezer asked Mar 16 '10 20:03


3 Answers

What about doing an MD5 on the image file? If it is the exact same file, the MD5 hash will be the same for both images.
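A minimal sketch of that comparison in plain Ruby, outside of Rails and Paperclip. The helper names file_checksum and same_file_content? are made up for illustration, and the temp files simply stand in for two uploads:

```ruby
require 'digest'
require 'tempfile'

# Returns the MD5 hex digest of the file at +path+.
# Digest::MD5.file reads the file in chunks rather than loading it all at once.
def file_checksum(path)
  Digest::MD5.file(path).hexdigest
end

# Hypothetical helper: true when both files hold byte-identical content.
def same_file_content?(path_a, path_b)
  file_checksum(path_a) == file_checksum(path_b)
end

# Throwaway files standing in for uploaded images.
original  = Tempfile.new('original')
duplicate = Tempfile.new('duplicate')
different = Tempfile.new('different')

[[original, 'fake image bytes'],
 [duplicate, 'fake image bytes'],
 [different, 'other image bytes']].each do |file, bytes|
  file.binmode
  file.write(bytes)
  file.flush
end

puts same_file_content?(original.path, duplicate.path) # prints "true"
puts same_file_content?(original.path, different.path) # prints "false"
```

Note that this only catches exact copies: changing a single byte of the file, including its metadata, produces a different digest.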

Peter Brown answered Nov 15 '22 19:11


For anyone else trying to do this: Paperclip now has MD5 hashing built in. If your model has an [attachment]_fingerprint column, Paperclip will populate it with the file's MD5 hash.

Since I already had a column named hash_value, I made a 'virtual' attribute called picture_fingerprint that reads from and writes to it:

# Virtual attribute so Paperclip stores the generated MD5 in hash_value
def picture_fingerprint
  self.hash_value
end

def picture_fingerprint=(md5_hash)
  self.hash_value = md5_hash
end

And, with Rails 3, using sexy_validations, I was able to simply add this to the top of my model to ensure that the hash_value is unique before it saves the model:

validates :hash_value, :uniqueness => { :message => "Image has already been uploaded." }

Howler answered Nov 15 '22 19:11


You might run into a problem when your images have amended EXIF metadata. This happened to me, and I had to extract pixel values and calculate MD5s from them in order to ignore changes made by WordPress etc. You can read about it on our blog: http://www.amberbit.com/blog/2013/12/20/similar-images-detection-in-ruby-with-phash/ but essentially you want to get the pixel data out of the image with some tool (like RMagick), concatenate it into a string, and calculate the MD5 of that.
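A rough sketch of that idea, not the code from the blog post. It assumes the rmagick gem is installed and loadable as 'rmagick' (older releases used 'RMagick'); the helper names pixel_bytes_for and pixel_digest are invented here. The gem is required lazily so that the digest helper can be tried without it:

```ruby
require 'digest'

# Hashes raw pixel bytes only, so the result does not depend on file metadata.
def pixel_digest(pixel_bytes)
  Digest::MD5.hexdigest(pixel_bytes)
end

# Assumption: the rmagick gem is available. It is loaded only when this
# method is called, so pixel_digest works without it.
def pixel_bytes_for(path)
  require 'rmagick'
  image = Magick::Image.read(path).first
  # Export every pixel as a packed RGB byte string; EXIF data is not included.
  image.export_pixels_to_str(0, 0, image.columns, image.rows, 'RGB')
end

# Illustration with made-up data: two "files" whose metadata differs
# but whose pixel bytes are the same.
pixels = [255, 0, 0, 0, 255, 0].pack('C*')
file_a = 'META:camera'.b + pixels
file_b = 'META:edited'.b + pixels

puts Digest::MD5.hexdigest(file_a) == Digest::MD5.hexdigest(file_b) # prints "false"
puts pixel_digest(pixels) == pixel_digest(pixels.dup)               # prints "true"
```

With a real image the call would look like pixel_digest(pixel_bytes_for('photo.jpg')), where the path is a placeholder. This only helps when the pixels themselves are untouched; a re-compressed or resized copy yields different pixel bytes and therefore a different digest.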

user3143898 answered Nov 15 '22 20:11