I have an S3 bucket that contains multiple files with colons in their file names.
Example:
s3://my_bucket/my_data/en/2015120/batch:222:111:00000.jl.gz
I am trying to load this into a Spark RDD and access the first line as follows:
my_data = sc.textFile("s3://my_bucket/my_data/en/2015120/batch:222:111:00000.jl.gz")
my_data.take(1)
But this throws:
IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI:
Any suggestions on how to load these files individually or, preferably, as a whole folder?
I got it to work by replacing the colons with their URL-encoded form, i.e. each : is replaced with %3A
To double-check, click on one of the objects in S3 and look at its "link" to see how the file name is encoded.
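The replacement described above could be sketched roughly like this. The helper name encode_s3_path is hypothetical (it is not part of Spark or any AWS library), and the sketch assumes the path has the form scheme://bucket/key. Whether the encoded path resolves may depend on the S3 filesystem connector in use, so treat it as a starting point rather than a guaranteed fix.

```python
from urllib.parse import quote


def encode_s3_path(path):
    """Percent-encode the object key of an S3 path (hypothetical helper).

    Assumes the form scheme://bucket/key. The scheme and bucket are left
    untouched; in the key, "/" is kept and ":" becomes "%3A".
    """
    scheme, rest = path.split("://", 1)
    bucket, _, key = rest.partition("/")
    return "{}://{}/{}".format(scheme, bucket, quote(key, safe="/"))


raw = "s3://my_bucket/my_data/en/2015120/batch:222:111:00000.jl.gz"
encoded = encode_s3_path(raw)
print(encoded)

# With an existing SparkContext named sc, as in the question:
# my_data = sc.textFile(encoded)
# my_data.take(1)
```

For the example file from the question, this prints s3://my_bucket/my_data/en/2015120/batch%3A222%3A111%3A00000.jl.gz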