
EntityTooLarge error when uploading a 5G file to Amazon S3

The Amazon S3 file size limit is supposed to be 5 TB according to this announcement, but I am getting the following error when uploading a 5 GB file:

'/mahler%2Fparquet%2Fpageview%2Fall-2014-2000%2F_temporary%2F_attempt_201410112050_0009_r_000221_2222%2Fpart-r-222.parquet' XML Error Message: 
  <?xml version="1.0" encoding="UTF-8"?>
  <Error>
    <Code>EntityTooLarge</Code>
    <Message>Your proposed upload exceeds the maximum allowed size</Message>
    <ProposedSize>5374138340</ProposedSize>
    ...
    <MaxSizeAllowed>5368709120</MaxSizeAllowed>
  </Error>

This makes it seem like S3 only accepts uploads of up to 5 GB. I am using Apache Spark SQL to write out a Parquet data set with the SchemaRDD.saveAsParquetFile method. The full stack trace is:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for '/mahler%2Fparquet%2Fpageview%2Fall-2014-2000%2F_temporary%2F_attempt_201410112050_0009_r_000221_2222%2Fpart-r-222.parquet' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>EntityTooLarge</Code><Message>Your proposed upload exceeds the maximum allowed size</Message><ProposedSize>5374138340</ProposedSize><RequestId>20A38B479FFED879</RequestId><HostId>KxeGsPreQ0hO7mm7DTcGLiN7vi7nqT3Z6p2Nbx1aLULSEzp6X5Iu8Kj6qM7Whm56ciJ7uDEeNn4=</HostId><MaxSizeAllowed>5368709120</MaxSizeAllowed></Error>
        org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:82)
        sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        java.lang.reflect.Method.invoke(Method.java:606)
        org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        org.apache.hadoop.fs.s3native.$Proxy10.storeFile(Unknown Source)
        org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close(NativeS3FileSystem.java:174)
        org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
        org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
        parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:321)
        parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:111)
        parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
        org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:305)
        org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
        org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
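
For context, the job writes the SchemaRDD out roughly like this (a minimal sketch of the Spark 1.x API; the input source, schema, and output path below are illustrative, not the actual job):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Case class defining the Parquet schema; the fields are illustrative.
    case class PageView(project: String, page: String, views: Long)

    object WritePageviewParquet {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("pageview-to-parquet"))
        val sqlContext = new SQLContext(sc)
        // Spark 1.x implicit conversion from an RDD of case classes to a SchemaRDD
        import sqlContext.createSchemaRDD

        val pageviews = sc.textFile("s3n://source-bucket/pageviews/")   // hypothetical input
          .map(_.split("\\s+"))
          .map(f => PageView(f(0), f(1), f(2).toLong))

        // Each partition is written out as one part-r-*.parquet object; the s3n
        // connector stores every object with a single PUT, which is where the
        // 5 GB cap is hit.
        pageviews.saveAsParquetFile("s3n://mahler/parquet/pageview/all-2014-2000")

        sc.stop()
      }
    }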

Is the upload limit still 5 TB? If it is, why am I getting this error, and how do I fix it?

asked Oct 11 '14 by Daniel Mahler

People also ask

What is the largest size file you can transfer to S3?

Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB. The largest object that can be uploaded in a single PUT is 5 GB. For objects larger than 100 MB, customers should consider using the Multipart Upload capability.

How do I upload a file greater than 100 megabytes on Amazon S3?

Instead of using the Amazon S3 console, try uploading the file using the AWS Command Line Interface (AWS CLI) or an AWS SDK. Note: If you use the Amazon S3 console, the maximum file size for uploads is 160 GB. To upload a file that is larger than 160 GB, use the AWS CLI, AWS SDK, or Amazon S3 REST API.
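
For the SDK route, here is a minimal sketch using the AWS SDK for Java's TransferManager from Scala; it splits large files into multipart uploads automatically, avoiding the 5 GB single-PUT limit. The bucket name, key, and local path are placeholders:

    import java.io.File
    import com.amazonaws.services.s3.AmazonS3ClientBuilder
    import com.amazonaws.services.s3.transfer.TransferManagerBuilder

    object MultipartUploadExample {
      def main(args: Array[String]): Unit = {
        val s3 = AmazonS3ClientBuilder.defaultClient()
        val tm = TransferManagerBuilder.standard().withS3Client(s3).build()

        // TransferManager performs a multipart upload under the hood for large files.
        // Bucket, key, and local file path below are placeholders.
        val upload = tm.upload("your-bucket", "backups/massive-file.ova",
          new File("/data/massive-file.ova"))
        upload.waitForCompletion()  // blocks until every part has been uploaded

        tm.shutdownNow()
      }
    }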


1 Answer

If you are using the AWS CLI for the upload, you can use the aws s3 cp command; it handles multipart uploads for you, so no manual splitting is required:

aws s3 cp massive-file.ova s3://<your-bucket>/<prefix>/massive-file.ova
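
Under the hood, the CLI switches to multipart upload automatically once the file exceeds its configurable multipart threshold (8 MB by default), so even multi-gigabyte files are transferred in parts without any extra flags.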
answered Sep 19 '22 by Tomasz Swider