
Paperclip, Delayed Job, S3, Heroku - design for delayed processing of sensitive uploaded files: db or s3?

I need feedback on the design for uploading and delayed processing of a file using Heroku, Paperclip, Delayed Job and, if necessary, S3. Parts of it have been discussed elsewhere, but I couldn't find a complete discussion anywhere.

Task description:

  1. Upload file (using paperclip to s3/db on heroku). File needs to be private as it contains sensitive data.
  2. Queue file for processing (delayed job)
  3. Job gets run in queue
  4. File is retrieved (from s3/db), and processing is completed
  5. File is deleted (from s3/db)
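
The five steps above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `MemoryStore`, `accept_upload` and `work_off` are hypothetical stand-ins for the private S3 bucket (or db table), the upload action and the Delayed Job worker, not the API of any of those gems.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for a private S3 bucket or a db table.
class MemoryStore
  def initialize
    @blobs = {}
  end

  def put(key, data)
    @blobs[key] = data
  end

  def get(key)
    @blobs.fetch(key)
  end

  def delete(key)
    @blobs.delete(key)
  end

  def key?(key)
    @blobs.key?(key)
  end
end

# Steps 1 and 2: store the upload, then queue only its key (not the file body).
def accept_upload(store, queue, data)
  key = SecureRandom.hex(8)
  store.put(key, data)
  queue << key
  key
end

# Steps 3 to 5: take a key off the queue, retrieve the file, process it and
# delete it. The ensure block removes the sensitive file even if processing raises.
def work_off(store, queue)
  results = []
  until queue.empty?
    key = queue.shift
    begin
      results << yield(store.get(key))
    ensure
      store.delete(key)
    end
  end
  results
end

store = MemoryStore.new
queue = []
key = accept_upload(store, queue, "name,amount\nalice,10\n")
line_counts = work_off(store, queue) { |data| data.lines.count }
```

Queuing only the key keeps the job payload small, and deleting in `ensure` means a failed job does not leave the sensitive file behind.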

Since I am using Delayed Job, I have to decide between storing the file in the database or on S3. I am assuming that storing the file on the web server is out of the question, because on Heroku the worker that runs the delayed job does not share a filesystem with the web process. Uploading files to S3 takes a long time, but storing files in the database is more expensive. Ideally, the processing should finish as quickly as possible.

What is the more common design pattern: storing files on S3 or storing files in the db? Are there any recommended gems for retrieving and processing files stored on S3 (aws-s3? s3?)?

user1094320 asked Dec 12 '11 18:12



2 Answers

Heroku has a 30-second timeout on any server request (learnt the hard way), so storing files on S3 is definitely a must.

Try carrierwave (carrierwave railscasts) instead of paperclip. I prefer the added helpers that come on board, and there are a number of great plugins, such as carrierwave_direct for uploading large files to S3, which integrate nicely with carrierwave.

Delayed_job (railscasts - delayed_job) will work nicely for deleting files from s3 and any other background processing that may be required.
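
As a rough sketch of what such a cleanup job could look like: delayed_job can run any object that responds to `perform`, so the job itself is plain Ruby. `FakeBucket`, `UPLOADS` and `DeleteUploadJob` are hypothetical names used for illustration; in the real app the `perform` body would call the S3 client instead.

```ruby
# Hypothetical stand-in for the S3 bucket; only #delete is needed here.
class FakeBucket
  attr_reader :keys

  def initialize(keys)
    @keys = keys
  end

  # Returns true if the key existed and was removed, false otherwise.
  def delete(key)
    !@keys.delete(key).nil?
  end
end

UPLOADS = FakeBucket.new(['42report.csv'])

# The job carries only the key, because job objects are serialized into the
# jobs table. The bucket is looked up inside #perform.
DeleteUploadJob = Struct.new(:key) do
  def perform
    UPLOADS.delete(key)
  end
end

job = DeleteUploadJob.new('42report.csv')

# In the app this would be enqueued rather than run inline, for example:
#   Delayed::Job.enqueue(job, :run_at => 24.hours.from_now)
deleted = job.perform
```

Keeping the job as a small object with a single `perform` method also makes it easy to test without a worker running.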

My gem file includes the following:

gem 'delayed_job'
gem 'aws-s3', :require => 'aws/s3'
gem 'fog'
gem 'carrierwave'
gem 'carrierwave_direct'

The fog gem is a nice way to keep all your account info in a single place, and it sets everything up quite nicely. For the AWS gem, there is a good how-to resource.

Here is a sample controller action for a form that submits an upload (there are definitely better ways of doing this, but it serves for illustrative purposes):

def create
    upload = params[:uploadfile]
    @asset = Asset.new(:description => params[:description], :user_id => session[:id], :question_id => @question.id)
    if @asset.save && @asset.update_attributes(:file_name => sanitize_filename(upload.original_filename, @asset.id))
        # The S3 key is the asset id followed by the sanitized filename
        key = @asset.file_name
        # Store the file privately on S3
        AWS::S3::S3Object.store(key, upload.read, 'bucket_name', :access => :private, :content_type => upload.content_type)
        # Assumption: `object` refers to the stored S3 object, looked up again to check its size
        object = AWS::S3::S3Object.find(key, 'bucket_name')
        if object.content_length.to_i < @question.emailatt.to_i.megabytes && object.content_length.to_i < 5.megabytes
            # URL for the private object (not used further in this sample)
            url = AWS::S3::S3Object.url_for(key, 'bucket_name')
            if @asset.update_attributes(:download_link => 1)
                # Send the notification in the background, 5 minutes from now
                if Usermailer.delay({:run_at => 5.minutes.from_now}).attachment_user_mailer_download_notification(@asset, @question)
                    # Schedule deletion of the file in 24 hours, then redirect
                    process_attachment_user_mailer_download(upload, @asset.id, 24.hours.from_now, @question.id)
                    flash[:notice] = "Thank you for the upload, we will notify this post's author"
                end
            end
        end
    else
        @asset.destroy
        flash[:notice] = "There was an error in processing your upload, please try again"
        redirect_to(:controller => "questions", :action => "show", :id => @question.id)
    end
end


private

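    def sanitize_filename(file_name, id)
        # Strip any directory part, then replace every unsafe character
        # (gsub replaces all matches; sub would only replace the first)
        just_filename = File.basename(file_name).gsub(/[^\w\.\-]/, '_')
        # Prefix with the asset id so that keys are unique per record
        "#{id}#{just_filename}"
    end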

    def delete_process(uploadfile, asset_id, time, question_id)
        asset = Asset.find(:first, :conditions => ["id = ?", asset_id])
        if delete_file(uploadfile, asset_id, time) && asset.destroy
            redirect_to(:controller => "questions", :action => "show", :id => question_id)
        end
    end


    # schedules deletion of the s3 file and resets the download flag at `time`
    def process_attachment_user_mailer_download(uploadfile, asset_id, time, question_id)
        asset = Asset.find(:first, :conditions => ["id = ?", asset_id])
        # use the record looked up above rather than relying on @asset being set
        if delete_file(uploadfile, asset_id, time) && asset.delay({:run_at => time}).update_attributes(:download_link => 0)
            redirect_to(:controller => "questions", :action => "show", :id => question_id)
        end
    end

    #S3 METHODS FOR CREATE ACTION

    #deletes the uploaded file from s3
    def delete_file(uploadfile, asset_id, time)
        AWS::S3::S3Object.delay({:run_at => time}).delete(sanitize_filename(uploadfile.original_filename, asset_id), 'bucket_name')
    end

Lots of unnecessary code, I know (wrote this when I was starting with Rails). Hopefully it will give some idea of the processes involved in writing this type of app. Hope it helps.
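
For the filename step in particular, here is a small standalone sketch that runs outside Rails. `storage_key` is a hypothetical name; it assumes the key should be the record id followed by the base filename, with every character outside letters, digits, `_`, `.` and `-` replaced by an underscore.

```ruby
# Hypothetical helper: builds a storage key from an uploaded filename and a record id.
def storage_key(file_name, id)
  # File.basename drops any directory part, such as "../../".
  base = File.basename(file_name)
  # gsub replaces every unsafe character and returns a new string.
  safe = base.gsub(/[^\w.\-]/, '_')
  "#{id}#{safe}"
end

key = storage_key('../../my report (final).pdf', 42)

# sub only replaces the first match, which is easy to miss:
first_only = 'a b c'.sub(' ', '_')
all_matches = 'a b c'.gsub(' ', '_')
```

Note that both `sub` and `gsub` return a new string and leave the receiver untouched, so the result has to be assigned or returned.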

Hishalv answered Nov 02 '22 23:11



For my part, I'm using:

  • Delayed Job
  • Paperclip
  • Delayed Paperclip, which uploads the original file to S3 and creates a delayed job for the custom post-processing. It can add a column to your model stating that the file is being processed.

Only a few lines to set up. And you can do a lot with paperclip interpolations and generators.
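
To illustrate the "being processed" flag idea, here is a plain-Ruby sketch. It is not Delayed Paperclip's API: `Upload`, `finish_processing!` and `status_label` are hypothetical names, and the `processing` attribute only plays the role of the status column mentioned above.

```ruby
# Hypothetical model; `processing` plays the role of the status column.
class Upload
  attr_reader :processing, :result

  def initialize(data)
    @data = data
    @processing = true # set when the file is saved and the job is queued
    @result = nil
  end

  # What the background job would call once it has done its work.
  def finish_processing!
    @result = @data.strip.upcase
    @processing = false
  end

  # A view can use the flag to show a placeholder until the job is done.
  def status_label
    processing ? 'Processing...' : 'Ready'
  end
end

upload = Upload.new(" report \n")
before = upload.status_label
upload.finish_processing!
after = upload.status_label
```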

rnaud answered Nov 03 '22 01:11
