I am trying to read files from a directory that contains many subdirectories. The data is in S3, and I am trying to do this:
val rdd = sc.newAPIHadoopFile(data_loc,
  classOf[org.apache.hadoop.mapreduce.lib.input.TextInputFormat],
  classOf[org.apache.hadoop.io.LongWritable], // key class: byte offset of each line
  classOf[org.apache.hadoop.io.Text])         // value class: the line contents
This does not seem to work.
I would appreciate any help.
Yes, it works, although it took a while to get the individual blocks/splits. The path basically selects a specific directory in every subdirectory:
s3n://bucket/root_dir/*/data/*/*/*
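For reference, here is a minimal sketch of how a glob path like the one above could be passed to newAPIHadoopFile. The object name and app name are hypothetical, the bucket and layout are the placeholders from this post, and it assumes the job is launched with spark-submit and that S3 credentials are already configured. TextInputFormat produces (LongWritable, Text) pairs, so those are the key and value classes used here.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver object; the name is only for illustration.
object ReadNestedDirs {
  def main(args: Array[String]): Unit = {
    // Master URL is assumed to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("read-nested-dirs"))

    // Each * matches exactly one path component, so this picks the
    // "data" directory under every subdirectory of root_dir.
    val dataLoc = "s3n://bucket/root_dir/*/data/*/*/*"

    val lines = sc
      .newAPIHadoopFile(
        dataLoc,
        classOf[TextInputFormat],
        classOf[LongWritable], // key: byte offset of the line in its file
        classOf[Text])         // value: the line itself
      // Hadoop reuses Writable instances, so copy each value to a String.
      .map { case (_, text) => text.toString }

    println(lines.count())
    sc.stop()
  }
}
```

If only the line contents are needed, sc.textFile(dataLoc) accepts the same kind of glob path and returns the lines as strings directly.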