I'm trying to download a PDF from an email and write the contents to a file. For some reason, I'm getting this error:
An Encoding::UndefinedConversionError occurred in attachments#inbound: "\xE2" from ASCII-8BIT to UTF-8 app/controllers/api/attachments_controller.rb:70:in `write'
Here's my code:
def inbound
  if Rails.env.production? or Rails.env.staging?
    email = Postmark::Mitt.new(request.body.read)
  else
    email = Postmark::Mitt.new(File.binread "#{Rails.root}/app/temp_pdfs/email.json")
  end
  if email.attachments.count == 0
    # notify aidin that we got an inbound email with no attachments
    respond_to do |format|
      format.json { head :no_content }
    end
    return
  end
  attachment = email.attachments.first
  filename = "attachment" + (Time.now.strftime("%Y%m%d%H%M%S")+(rand * 1000000).round.to_s) + ".pdf"
  base_path = "#{Rails.root}/temp_attachments/"
  unless File.directory?(base_path)
    Dir::mkdir(base_path)
  end
  file = File.new base_path + filename, 'w+'
  file.write Base64.decode64(attachment.source['Content'].encode("UTF-16BE", :invalid=>:replace, :replace=>"?").encode("UTF-8"))
  file.close
  write_options = write_options()
  write_options[:metadata] = {:filename => attachment.file_name, :content_type => attachment.content_type, :size => attachment.size }
  obj = s3_object()
  file = File.open file.path
  obj.write(file.read, write_options)
  file.close
  FaxAttach.trigger obj.key.split('/').last
  render :nothing => true, :status => 202 and return
end
I read around and it looked like the way to solve this was:
file.write Base64.decode64(attachment.source['Content'].encode("UTF-16BE", :invalid=>:replace, :replace=>"?").encode("UTF-8"))
but it doesn't seem to work.
You can read any ASCII-encoded document as UTF-8, and it will work. ASCII only uses 7 bits, and UTF-8 uses the otherwise unused eighth bit to mark the bytes of non-ASCII characters.
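A minimal Ruby sketch of that point, using made-up sample strings (the variable names are illustrative only). It checks that ASCII bytes stay valid when relabelled as UTF-8, and that a non-ASCII character such as an en dash is stored as bytes with the eighth bit set, starting with the same 0xE2 byte that appears in the error message:

```ruby
# Plain ASCII text: every byte is below 0x80, so the same bytes are valid UTF-8.
ascii_text = "Hello, PDF".encode("US-ASCII")
ascii_only_low_bytes = ascii_text.bytes.all? { |byte| byte < 0x80 }
reread_as_utf8 = ascii_text.dup.force_encoding("UTF-8")
still_valid = reread_as_utf8.valid_encoding?

# EN DASH (U+2013) takes three bytes in UTF-8, each with the eighth bit set.
dash_bytes = "\u2013".bytes
all_high_bit = dash_bytes.all? { |byte| byte >= 0x80 }

puts ascii_only_low_bytes                               # true
puts still_valid                                        # true
puts dash_bytes.map { |b| format("%02X", b) }.join(" ") # E2 80 93
puts all_high_bit                                       # true
```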
UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. This is the meaning of “UTF”, or “Unicode Transformation Format.”
ASCII is a 7-bit code. That is, it uses seven bits to represent a letter or a punctuation mark, although each character is normally stored in a full byte (eight bits) with the top bit set to zero. A binary value with eight digits, such as 1101 1011 in base 2, can be stored in one byte of computer memory.
There's no difference between ASCII and UTF-8 when storing digits. A tighter packing would be using 4 bits per digit (BCD). If you want to go below that, you need to take advantage of the fact that long sequences of base-10 values can be represented as base-2 (binary) values.
The error is actually raised by the file write, not by your encode/decode of the attachment content, because Ruby tries to apply the default character encoding in file.write. To prevent this, the quickest fix is to add the b flag when you open the file:
file = File.new base_path + filename, 'wb+'
file.write Base64.decode64( attachment.source['Content'] )
That's assuming the incoming attachment is encoded in Base64, as your code implies (I have no way to verify this). The Base64 encoding stored inside attachment.source['Content'] should be the same bytes in ASCII-8BIT and UTF-8, so there is no point converting it inside the call to decode64.
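To see the difference outside Rails, here is a rough standalone sketch. It rests on a few assumptions: raw_bytes is a made-up stand-in for the real PDF data, and the "w:UTF-8" mode is used only to imitate a file that has a UTF-8 external encoding, which is what I believe a Rails app ends up with by default. Treat it as an illustration, not as what your controller does exactly.

```ruby
require "base64"
require "tmpdir"

# Hypothetical stand-in for the decoded PDF: a header plus some non-ASCII bytes.
raw_bytes = "%PDF-1.4\n\xE2\xE3\xCF\xD3\n".b
# Stand-in for attachment.source['Content']: plain ASCII Base64 text.
content = Base64.strict_encode64(raw_bytes)

# decode64 returns a binary (ASCII-8BIT) string.
decoded = Base64.decode64(content)

text_mode_error = nil
round_trip = nil

Dir.mktmpdir do |dir|
  # Text mode with a UTF-8 external encoding: Ruby tries to transcode
  # the binary string to UTF-8 on write and fails on the first high byte.
  begin
    File.open(File.join(dir, "text_mode.pdf"), "w:UTF-8") { |f| f.write(decoded) }
  rescue Encoding::UndefinedConversionError => e
    text_mode_error = e
  end

  # Binary mode: the bytes are written untouched.
  binary_path = File.join(dir, "binary_mode.pdf")
  File.open(binary_path, "wb") { |f| f.write(decoded) }
  round_trip = File.binread(binary_path)
end

puts text_mode_error.class
puts round_trip == raw_bytes
```

The Base64 text itself only contains ASCII characters, which is why re-encoding it before decode64 changes nothing; the bytes that cause trouble only exist after decoding.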