
Rails ActiveStorage attachment to existing S3 file

I'm building a PDF parser that fires off a Sidekiq worker to OCR a document stored in S3 and parse data from it. After parsing, the data is stored in the Document model.

How do I attach the file that already exists in the S3 bucket via Document.attachment.attach in ActiveStorage, without duplicating the file in S3 (via File.open, etc.)?

Shelby S asked Sep 14 '18 01:09


2 Answers

This can be done with a slight manipulation of the blob after it is created.

config/storage.yml

amazon:
  service: S3
  access_key_id: <%= ENV['AWS_ACCESS_KEY_ID'] %>
  secret_access_key: <%= ENV['AWS_SECRET_ACCESS_KEY'] %>
  region: <%= ENV['AWS_REGION'] %>
  bucket: <%= ENV['S3_BUCKET'] %>

app/models/document.rb

class Document < ApplicationRecord
  has_one_attached :pdf
end

rails console

key = "<S3 Key of the existing file in the same bucket that storage.yml uses>"

# Create an ActiveStorage blob to represent the file that is already on S3
params = {
  filename: "myfile.jpg",
  content_type: "image/jpeg",
  byte_size: 1234,
  checksum: "<Base64 encoding of the MD5 hash of the file's contents>"
}
# This only creates the database record; nothing is uploaded
blob = ActiveStorage::Blob.create_before_direct_upload!(**params)

# By default, the blob's key (the S3 key, in this case) is a secure random token.
# Since the file is already on S3, change the key to match the existing object.
# (update_attributes was removed in Rails 6.1; update! works on all versions)
blob.update!(key: key)

# Now we can create a Document connected to the existing S3 file
d = Document.create!(pdf: blob.signed_id)

# in your view, you can now use
url_for d.pdf

At this point, you can use the pdf attribute of your Document object like any other active storage attachment.
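
The byte_size and checksum placeholders above have to come from the file's actual contents. A minimal sketch of computing them, assuming you have (or can download) a local copy of the file; blob_content_params is a hypothetical helper name, not part of Rails:

```ruby
require "digest"

# Hypothetical helper (not part of Rails): returns the two content-derived
# blob attributes for a file on local disk.
def blob_content_params(path)
  {
    byte_size: File.size(path),
    # Base64 (not hex) encoding of the raw 16-byte MD5 digest
    checksum: Digest::MD5.file(path).base64digest
  }
end
```

The result can be merged into the params hash, e.g. params.merge(blob_content_params("/tmp/myfile.jpg")), where the path is only an example.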

Troy answered Nov 17 '22 18:11


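Troy's answer worked great for me! I also found it helpful to pull the blob's metadata from the S3 object itself instead of hard-coding it. Note that ActiveStorage expects the checksum as a Base64-encoded MD5 digest, while the S3 ETag is hex-encoded (and is only an MD5 digest for objects that were not uploaded in multiple parts), so it has to be converted. Something like:

# Requires the aws-sdk-s3 gem
s3 = Aws::S3::Resource.new(region: "us-west-1")
obj = s3.bucket("my-bucket").object("myfile.jpg")

# Strip the surrounding quotes; what remains is the hex MD5
etag = obj.etag.delete('"')

params = {
  filename: obj.key,
  content_type: obj.content_type,
  byte_size: obj.size,
  checksum: [[etag].pack("H*")].pack("m0") # hex -> raw bytes -> Base64
}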
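
One caveat when deriving the checksum from the ETag: ActiveStorage stores a Base64-encoded MD5, whereas an ETag is hex, and for multipart uploads (or objects encrypted with SSE-KMS or SSE-C) the ETag is not the MD5 of the contents at all. A small sketch of a guarded conversion; etag_to_checksum is a hypothetical helper name, and treating anything other than 32 hex characters as unusable is my assumption:

```ruby
# Hypothetical helper: converts an S3 ETag into the Base64 MD5 format
# that ActiveStorage stores in blob.checksum.
def etag_to_checksum(etag)
  hex = etag.delete('"')
  # Multipart ETags look like "<32 hex chars>-<part count>" and are not
  # the MD5 of the object, so reject anything that is not plain hex MD5.
  unless hex.match?(/\A\h{32}\z/)
    raise ArgumentError, "ETag #{etag.inspect} is not a plain MD5 digest"
  end
  # hex string -> 16 raw bytes -> strict Base64 (no trailing newline)
  [[hex].pack("H*")].pack("m0")
end
```

When the helper raises, the MD5 has to be computed from the object's contents instead. Note that a well-formed ETag on an SSE-KMS object would still pass this check, so the guard only catches the multipart case.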

I only have 46 points so I left this as an answer instead of a comment :/

michaelmedford answered Nov 17 '22 20:11